Todd Lipcon created HDFS-3914:
---------------------------------
Summary: QJM: acceptRecovery should abort current segment
Key: HDFS-3914
URL: https://issues.apache.org/jira/browse/HDFS-3914
Project: Hadoop HDFS
Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Found this bug with randomized testing. The following sequence causes a problem:
- JN is writing a segment starting at txid 1 and has successfully written
txid 1, but nothing after it
- JN becomes partitioned from NN, and a new NN takes over
- new NN is also partitioned for the "prepareRecovery" phase of recovery, but
properly connects for the "acceptRecovery" call
- acceptRecovery copies over a longer log segment (e.g. txns 1-3) from a good
logger
- new NN calls finalizeLogSegment(), but gets the following error:
JournalOutOfSyncException: Trying to finalize in-progress log segment 1 to end at txid 3 but only written up to txid 1
This is because the "syncLog" call (which copies the new segment) isn't
properly aborting the old segment before replacing it.
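
The sequence above can be reproduced in a small toy model. This is only a sketch under stated assumptions: ToyJournal, its fields, and the abortOnAccept flag are hypothetical stand-ins and not the real JournalNode code, and IllegalStateException stands in for JournalOutOfSyncException. It shows how a stale in-progress writer that still remembers "highest written txid = 1" makes finalization of the recovered 1-3 segment fail, and how dropping that writer inside acceptRecovery avoids the failure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a JournalNode's segment state. Every name here is a
// hypothetical stand-in, not the real Hadoop implementation.
class ToyJournal {
    private final boolean abortOnAccept;  // true = behavior this issue asks for
    private List<Long> segmentOnDisk = new ArrayList<>();
    private boolean writerOpen = false;   // is an in-progress writer attached?
    private long curSegmentStartTxId = -1;
    private long highestWrittenTxId = -1;
    private boolean finalized = false;

    ToyJournal(boolean abortOnAccept) {
        this.abortOnAccept = abortOnAccept;
    }

    void startLogSegment(long startTxId) {
        segmentOnDisk = new ArrayList<>();
        curSegmentStartTxId = startTxId;
        highestWrittenTxId = -1;
        writerOpen = true;
        finalized = false;
    }

    void journal(long txId) {
        if (!writerOpen) {
            throw new IllegalStateException("no open segment");
        }
        segmentOnDisk.add(txId);
        highestWrittenTxId = txId;
    }

    // Replace the local segment with one copied from a good logger.
    void acceptRecovery(List<Long> recoveredTxIds) {
        if (abortOnAccept && writerOpen) {
            // Drop the stale writer so its txid bookkeeping cannot leak
            // into the later finalize call.
            writerOpen = false;
            curSegmentStartTxId = -1;
            highestWrittenTxId = -1;
        }
        segmentOnDisk = new ArrayList<>(recoveredTxIds);
    }

    void finalizeLogSegment(long startTxId, long endTxId) {
        if (writerOpen && curSegmentStartTxId == startTxId) {
            // The journal believes this is its own in-progress segment,
            // so it checks against what the old writer wrote.
            if (highestWrittenTxId != endTxId) {
                throw new IllegalStateException(
                    "Trying to finalize in-progress log segment " + startTxId
                    + " to end at txid " + endTxId
                    + " but only written up to txid " + highestWrittenTxId);
            }
            writerOpen = false;
        } else {
            // No writer attached: validate against the segment on disk.
            if (segmentOnDisk.isEmpty()
                    || segmentOnDisk.get(0) != startTxId
                    || segmentOnDisk.get(segmentOnDisk.size() - 1) != endTxId) {
                throw new IllegalStateException("segment on disk does not match");
            }
        }
        finalized = true;
    }

    boolean isFinalized() {
        return finalized;
    }
}

class AcceptRecoveryDemo {
    public static void main(String[] args) {
        for (boolean abort : new boolean[] {false, true}) {
            ToyJournal jn = new ToyJournal(abort);
            jn.startLogSegment(1);
            jn.journal(1);                               // wrote txid 1, nothing more
            jn.acceptRecovery(Arrays.asList(1L, 2L, 3L)); // longer segment from a good logger
            try {
                jn.finalizeLogSegment(1, 3);
                System.out.println("abortOnAccept=" + abort + ": finalized");
            } catch (IllegalStateException e) {
                System.out.println("abortOnAccept=" + abort + ": " + e.getMessage());
            }
        }
    }
}
```

The prepareRecovery phase is left out of the model on purpose, since in the failing sequence the new NN never reaches this JN for that call.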