[
https://issues.apache.org/jira/browse/HDFS-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052970#comment-13052970
]
Todd Lipcon commented on HDFS-2093:
-----------------------------------
bq. doTestCrashRecoveryEmptyLog assumes the cluster should not start, even if
just one of the dirs has a corrupted log, shouldn't the cluster start as long
as only one of the in progress logs was truncated?
The two different variants of this test are:
a) inBothDirs=false:
- one dir has edits_1-2 and edits_inprogress_3 truncated
- the other dir just has edits_1-2
b) inBothDirs=true:
- both dirs have edits_1-2 and edits_inprogress_3 truncated
In the first case, startup should fail because the NN can tell that the
shutdown was unclean: there is a log starting at txid 3, even though it's
corrupt. In the second case, it fails because both of its logs are truncated.
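To make the two variants concrete, here is a minimal Java sketch of the rule
described above. StorageDir and shouldStart are hypothetical names used only
for illustration; they are not taken from the HDFS-1073 branch, and the rule
is a paraphrase of this comment rather than the actual recovery code.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these names come from the HDFS code base.
class CrashRecoverySketch {
    /** One storage directory, reduced to the state of its in-progress log. */
    static final class StorageDir {
        final boolean hasInProgress;        // does edits_inprogress_3 exist here?
        final boolean inProgressTruncated;  // is that log truncated/corrupt?

        StorageDir(boolean hasInProgress, boolean inProgressTruncated) {
            this.hasInProgress = hasInProgress;
            this.inProgressTruncated = inProgressTruncated;
        }
    }

    /**
     * Assumed rule: an in-progress log in any directory signals an unclean
     * shutdown, so startup proceeds only if some copy of it is readable.
     */
    static boolean shouldStart(List<StorageDir> dirs) {
        boolean sawInProgress = false;
        boolean sawReadableCopy = false;
        for (StorageDir d : dirs) {
            if (d.hasInProgress) {
                sawInProgress = true;
                if (!d.inProgressTruncated) {
                    sawReadableCopy = true;
                }
            }
        }
        return !sawInProgress || sawReadableCopy;
    }

    public static void main(String[] args) {
        StorageDir truncated = new StorageDir(true, true);       // edits_1-2 + truncated edits_inprogress_3
        StorageDir finalizedOnly = new StorageDir(false, false); // edits_1-2 only

        // a) inBothDirs=false
        System.out.println(shouldStart(Arrays.asList(truncated, finalizedOnly))); // false
        // b) inBothDirs=true
        System.out.println(shouldStart(Arrays.asList(truncated, truncated)));     // false
    }
}
```

Under this model both variants refuse to start, which is what the test
expects; a clean pair of directories with only edits_1-2 would start.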
I guess the comments on the test cases aren't clear. I'll improve those, and
also address the nit, and upload a new patch.
> 1073: Handle case where an entirely empty log is left during NN crash
> ---------------------------------------------------------------------
>
> Key: HDFS-2093
> URL: https://issues.apache.org/jira/browse/HDFS-2093
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: name-node
> Affects Versions: Edit log branch (HDFS-1073)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: hdfs-2093.txt, hdfs-2093.txt, hdfs-2093.txt
>
>
> In fault-testing the HDFS-1073 branch, I saw the following situation:
> - NN has two storage directories, but one is in failed state
> - NN starts to roll edits logs to edits_inprogress_5160285
> - NN then crashes
> - on restart, it detects the truncated log, but since it has 0 txns, it
> finalizes it to the nonsense log name edits_5160285-5160284.
> - It then starts logs again at edits_inprogress_5160285.
> - After this point, no checkpoints or future NN startups succeed since there
> are two logs starting with the same txid
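
The arithmetic behind the nonsense name in the description can be sketched in
Java as follows. The helper names (inProgressName, finalizedName, canFinalize)
are hypothetical; only the edits_<first>-<last> and edits_inprogress_<first>
patterns come from the issue text. The guard at the end is one possible check,
not necessarily what the attached patch does.

```java
// Illustrative only: these helpers are not methods from the HDFS code base.
class EditLogNameSketch {
    /** Name of a log that is still being written, starting at firstTxId. */
    static String inProgressName(long firstTxId) {
        return "edits_inprogress_" + firstTxId;
    }

    /** Name a log would get if finalized with numTxns transactions. */
    static String finalizedName(long firstTxId, long numTxns) {
        long lastTxId = firstTxId + numTxns - 1; // equals firstTxId - 1 when numTxns == 0
        return "edits_" + firstTxId + "-" + lastTxId;
    }

    /** Assumed guard: a log with zero transactions has no sensible finalized name. */
    static boolean canFinalize(long numTxns) {
        return numTxns > 0;
    }

    public static void main(String[] args) {
        long firstTxId = 5160285L;
        // Finalizing the empty log yields a range whose end precedes its start.
        System.out.println(finalizedName(firstTxId, 0)); // edits_5160285-5160284
        // The NN then reopens a log at the same txid, so two logs start at 5160285.
        System.out.println(inProgressName(firstTxId));   // edits_inprogress_5160285
        System.out.println(canFinalize(0));              // false
    }
}
```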