[ 
https://issues.apache.org/jira/browse/HDFS-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052970#comment-13052970
 ] 

Todd Lipcon commented on HDFS-2093:
-----------------------------------

bq. doTestCrashRecoveryEmptyLog assumes the cluster should not start, even if 
just one of the dirs has a corrupted log. Shouldn't the cluster start as long 
as only one of the in-progress logs was truncated?

The two different variants of this test are:
a) inBothDirs=false:
- one dir has edits_1-2 and edits_inprogress_3 truncated
- the other dir just has edits_1-2

b) inBothDirs=true:
- both dirs have edits_1-2 and edits_inprogress_3 truncated

In the first case, startup should fail because the NN can tell that the 
previous shutdown was unclean: there is a log starting at txid 3, even though 
that log is corrupt. In the second case, startup fails because both dirs hold 
the log starting at txid 3, and both copies are truncated.
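
As a rough illustration of the reasoning above (not the actual NN code), here 
is a minimal Python sketch. It assumes a hypothetical rule: startup fails if 
some log starts at a txid for which no directory holds a readable copy. The 
function name, the `truncated` set, and the list-of-file-names model are all 
made up for the example.

```python
import re

# File name patterns taken from the names used in this comment.
FINALIZED = re.compile(r"^edits_(\d+)-(\d+)$")
IN_PROGRESS = re.compile(r"^edits_inprogress_(\d+)$")


def startup_should_fail(dirs, truncated):
    """Hypothetical check, not the real NameNode logic.

    dirs: list of storage directories, each a list of log file names.
    truncated: set of (dir_index, file_name) pairs whose contents are unreadable.
    Returns True if some start txid has only unreadable logs.
    """
    readable_starts = set()
    unreadable_starts = set()
    for index, names in enumerate(dirs):
        for name in names:
            match = FINALIZED.match(name) or IN_PROGRESS.match(name)
            if match is None:
                continue  # not an edit log file
            start_txid = int(match.group(1))
            if (index, name) in truncated:
                unreadable_starts.add(start_txid)
            else:
                readable_starts.add(start_txid)
    # A log that was started but cannot be read from any dir signals an
    # unclean shutdown that this sketch does not try to recover from.
    return bool(unreadable_starts - readable_starts)


# Variant a) inBothDirs=false
variant_a = [["edits_1-2", "edits_inprogress_3"], ["edits_1-2"]]
print(startup_should_fail(variant_a, {(0, "edits_inprogress_3")}))

# Variant b) inBothDirs=true
variant_b = [["edits_1-2", "edits_inprogress_3"]] * 2
print(startup_should_fail(
    variant_b, {(0, "edits_inprogress_3"), (1, "edits_inprogress_3")}))
```

Under this assumed rule both variants are rejected, which matches what the 
test expects.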

I guess the comments on the test cases aren't clear. I'll improve those, and 
also address the nit, and upload a new patch.


> 1073: Handle case where an entirely empty log is left during NN crash
> ---------------------------------------------------------------------
>
>                 Key: HDFS-2093
>                 URL: https://issues.apache.org/jira/browse/HDFS-2093
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: hdfs-2093.txt, hdfs-2093.txt, hdfs-2093.txt
>
>
> In fault-testing the HDFS-1073 branch, I saw the following situation:
> - NN has two storage directories, but one is in failed state
> - NN starts to roll edits logs to edits_inprogress_5160285
> - NN then crashes
> - on restart, it detects the truncated log, but since it has 0 txns, it 
> finalizes it to the nonsense log name edits_5160285-5160284.
> - It then starts logs again at edits_inprogress_5160285.
> - After this point, no checkpoints or future NN startups succeed since there 
> are two logs starting with the same txid
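
The arithmetic behind the nonsense name in the description above can be 
sketched as follows. This is a minimal Python illustration, not Hadoop code: 
the naming rule (last txid = first txid + number of txns - 1) is an assumption 
inferred from the file name reported in the issue, and both helper names are 
hypothetical.

```python
from collections import Counter


def finalized_name(first_txid, num_txns):
    # Assumed rule: the finalized name records the first and last txid.
    # With zero txns, the "last" txid ends up one below the first.
    last_txid = first_txid + num_txns - 1
    return f"edits_{first_txid}-{last_txid}"


def duplicate_start_txids(names):
    """Return the start txids that more than one log file name claims."""
    starts = Counter(
        int(name.rsplit("_", 1)[1].split("-")[0]) for name in names
    )
    return sorted(txid for txid, count in starts.items() if count > 1)


# The empty in-progress log from the issue, finalized with 0 txns.
bad_name = finalized_name(5160285, 0)
print(bad_name)

# After restart the NN opens a new in-progress log at the same txid.
print(duplicate_start_txids([bad_name, "edits_inprogress_5160285"]))
```

Two files then claim the same starting txid, which is the state the 
description says breaks later checkpoints and startups.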

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
