Todd Lipcon created HDFS-3914:
---------------------------------

             Summary: QJM: acceptRecovery should abort current segment
                 Key: HDFS-3914
                 URL: https://issues.apache.org/jira/browse/HDFS-3914
             Project: Hadoop HDFS
          Issue Type: Sub-task
    Affects Versions: QuorumJournalManager (HDFS-3077)
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon


Found this bug with randomized testing. The following sequence causes a problem:

- A JN is writing a segment starting at txid 1 and successfully writes txid 1, 
but nothing more
- The JN becomes partitioned from the NN, and a new NN takes over
- The new NN is also partitioned from the JN during the "prepareRecovery" phase 
of recovery, but connects properly for the "acceptRecovery" call
- acceptRecovery copies over a longer log segment (e.g. txns 1-3) from a good 
logger
- The new NN calls finalizeLogSegment(), but gets the following error: 
JournalOutOfSyncException: Trying to finalize in-progress log segment 1 to end 
at txid 3 but only written up to txid 1

This happens because the "syncLog" call (which copies the new segment) does 
not abort the old in-progress segment before replacing it, so the JN still 
believes it has only written up to txid 1.
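
The sequence above can be replayed against a toy model. What follows is only a 
minimal sketch, not the real JournalNode code: the class JournalSketch, its 
fields, and the abortBeforeReplace flag are all hypothetical names invented for 
illustration, and the model assumes the failure comes from a stale in-memory 
writer record that survives the segment copy.

```java
// Hypothetical, simplified model of one JournalNode's segment state.
// None of these names are the real Hadoop classes, fields, or signatures.
class JournalSketch {
    /** Stand-in for the out-of-sync error reported in this issue. */
    static class OutOfSyncException extends RuntimeException {
        OutOfSyncException(String msg) { super(msg); }
    }

    // In-memory record of the open writer; -1 means "no segment open".
    private long curSegmentStartTxId = -1;
    private long highestWrittenTxId = -1;
    // Last txid contained in the on-disk segment file.
    private long onDiskLastTxId = -1;
    // Assumption: toggles the proposed fix (abort before replacing the file).
    private final boolean abortBeforeReplace;

    JournalSketch(boolean abortBeforeReplace) {
        this.abortBeforeReplace = abortBeforeReplace;
    }

    void startLogSegment(long txid) {
        curSegmentStartTxId = txid;
        highestWrittenTxId = txid - 1;
        onDiskLastTxId = txid - 1;
    }

    void journal(long txid) {
        highestWrittenTxId = txid;
        onDiskLastTxId = txid;
    }

    /** Models acceptRecovery: copy segment [startTxId, endTxId] from a good logger. */
    void acceptRecovery(long startTxId, long endTxId) {
        if (abortBeforeReplace && curSegmentStartTxId == startTxId) {
            // Drop the stale writer record so it cannot contradict the new file.
            curSegmentStartTxId = -1;
            highestWrittenTxId = -1;
        }
        // The "syncLog" step: the on-disk segment is replaced by the copy.
        onDiskLastTxId = endTxId;
    }

    void finalizeLogSegment(long startTxId, long endTxId) {
        // If a writer is still open for this segment, its record must match.
        if (curSegmentStartTxId == startTxId && highestWrittenTxId != endTxId) {
            throw new OutOfSyncException("Trying to finalize in-progress log segment "
                + startTxId + " to end at txid " + endTxId
                + " but only written up to txid " + highestWrittenTxId);
        }
        if (onDiskLastTxId != endTxId) {
            throw new OutOfSyncException("On-disk segment ends at txid " + onDiskLastTxId);
        }
        curSegmentStartTxId = -1; // segment is now finalized
    }

    /** Replays the sequence from this report; returns true if finalize succeeds. */
    static boolean replay(boolean abortBeforeReplace) {
        JournalSketch jn = new JournalSketch(abortBeforeReplace);
        jn.startLogSegment(1);
        jn.journal(1);            // JN wrote txid 1, then was partitioned
        jn.acceptRecovery(1, 3);  // new NN copies txns 1-3 from a good logger
        try {
            jn.finalizeLogSegment(1, 3);
            return true;
        } catch (OutOfSyncException e) {
            System.out.println("finalize failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("without abort: " + replay(false));
        System.out.println("with abort:    " + replay(true));
    }
}
```

In this model, replay(false) hits the stale writer check and fails, while 
replay(true) finalizes cleanly because the old writer record is discarded 
before the copied segment is put in place.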

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
