[ https://issues.apache.org/jira/browse/HDFS-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181031#comment-13181031 ]
Todd Lipcon commented on HDFS-2709:
-----------------------------------

I'm skeptical of the fix -- the question is _why_ we see the wrong log version here. We investigated and it looks like there's a race when a log file is created: it preallocates the file with all 0xFFFFFFFF, and then it goes back and writes the version number. Adding a sleep() after the preallocate() call in EditLogFileOutputStream triggers this reliably. So I think we should file another JIRA to fix that race.

Separately, I agree that we should probably change this to be an exception instead of an assert. But I think LogHeaderCorruptException is probably a better choice.

> HA: Appropriately handle error conditions in EditLogTailer
> ----------------------------------------------------------
>
>                 Key: HDFS-2709
>                 URL: https://issues.apache.org/jira/browse/HDFS-2709
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Todd Lipcon
>            Assignee: Aaron T. Myers
>            Priority: Critical
>         Attachments: HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch,
>                      HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch,
>                      HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch,
>                      HDFS-2709-HDFS-1623.patch
>
>
> Currently, if the edit log tailer experiences an error replaying edits in the
> middle of a file, it goes back and retries from the beginning of the file
> on the next tailing iteration. This is incorrect, since many of the edits will
> have already been replayed, and not all edits are idempotent.
> Instead, we either need to (a) support reading from the middle of a finalized
> file (i.e. skip the edits already applied), or (b) abort the standby if it
> hits an error while tailing. If "a" isn't simple, let's do "b" for now and
> come back to "a" later, since this is a rare circumstance and it is better to
> abort than to be incorrect.
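The comment suggests turning the header check into an exception rather than an assert. One reason this matters is that Java `assert` statements are disabled at runtime unless the JVM is started with `-ea`, so an assert-only check could let a bad header go unnoticed in production. The following is a minimal Java sketch under stated assumptions, not Hadoop's EditLogFileInputStream or EditLogTailer: it assumes a simplified log whose first four bytes are a big-endian layout version. `EditLogTailSketch`, `EXPECTED_VERSION`, `Edit`, and `replaySkippingApplied` are hypothetical names, and the local `LogHeaderCorruptException` is only a stand-in for the Hadoop class of that name. It also sketches option (a) from the description: remember the last applied transaction ID and skip edits at or below it when a file is re-read.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class EditLogTailSketch {

    /** Hypothetical stand-in for the LogHeaderCorruptException named in the comment. */
    static class LogHeaderCorruptException extends IOException {
        LogHeaderCorruptException(String msg) { super(msg); }
    }

    /** Hypothetical expected layout version, chosen only for illustration. */
    static final int EXPECTED_VERSION = -40;

    /** Four 0xFF bytes read back as a signed int (== -1): a header never overwritten. */
    static final int PREALLOCATED_FILLER = 0xFFFFFFFF;

    /** Reads the 4-byte header and throws on any problem instead of relying on an assert. */
    static int readLayoutVersion(byte[] fileBytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileBytes));
        int version;
        try {
            version = in.readInt(); // big-endian 32-bit read
        } catch (EOFException e) {
            throw new LogHeaderCorruptException("file too short to hold a header");
        }
        if (version == PREALLOCATED_FILLER) {
            // The race from the comment: preallocation finished, version not yet written.
            throw new LogHeaderCorruptException("header still holds preallocation filler");
        }
        if (version != EXPECTED_VERSION) {
            throw new LogHeaderCorruptException("unexpected layout version " + version);
        }
        return version;
    }

    /** Hypothetical minimal edit: a transaction ID plus an opaque operation name. */
    static final class Edit {
        final long txId;
        final String op;
        Edit(long txId, String op) { this.txId = txId; this.op = op; }
    }

    /**
     * Option (a): skip edits at or below lastAppliedTxId so that re-reading a
     * finalized file from its start never re-applies non-idempotent edits.
     * Returns the new last-applied transaction ID.
     */
    static long replaySkippingApplied(List<Edit> edits, long lastAppliedTxId, List<String> applied) {
        for (Edit e : edits) {
            if (e.txId <= lastAppliedTxId) {
                continue; // already replayed on an earlier tailing pass
            }
            applied.add(e.op); // stand-in for applying the edit to the namespace
            lastAppliedTxId = e.txId;
        }
        return lastAppliedTxId;
    }

    public static void main(String[] args) throws IOException {
        List<Edit> edits = List.of(new Edit(1, "mkdir"), new Edit(2, "delete"), new Edit(3, "rename"));
        List<String> applied = new ArrayList<>();
        // Pretend an earlier pass applied txids 1..2 and then failed mid-file.
        long last = replaySkippingApplied(edits, 2, applied);
        System.out.println(last + " " + applied); // prints "3 [rename]"
        try {
            readLayoutVersion(new byte[] {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF});
        } catch (LogHeaderCorruptException e) {
            System.out.println("header rejected: " + e.getMessage());
        }
    }
}
```

Option (b) would instead let such an exception propagate and shut down the standby rather than retrying from the start of the file; the sketch does not model that path.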