[ https://issues.apache.org/jira/browse/HDFS-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-2093:
------------------------------

    Attachment: hdfs-2093.txt

The attached patch treats such logs as corrupt at startup time. Thus, in the
situation above, where the only log we have is this corrupted one, the NN will
refuse to start, with a clear message explaining that the log starting at this
txid is corrupt and contains no transactions. The operator can then
double-check whether a different storage drive, which possibly went missing,
has better logs before starting the NN.
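
To make the startup rule concrete, here is a minimal Java sketch, not the patch itself. The `Segment` record, the `selectSegments` method and the directory names are hypothetical. The sketch also assumes that when several directories hold a log starting at the same txid, any non-empty copy may be used (it picks the one with the most transactions). If every copy starting at a txid is empty, startup is refused with an explanatory message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EmptyLogCheck {
    // Hypothetical view of one edit log copy: which storage dir it lives in,
    // the txid it starts at, and how many transactions could be read from it.
    record Segment(String dir, long firstTxId, long numTxns) {}

    // Groups copies by starting txid. For each txid, a non-empty copy is chosen
    // (assumption: the one with the most txns). If every copy at that txid is
    // empty, the log is treated as corrupt and startup is refused.
    static List<Segment> selectSegments(List<Segment> candidates) {
        Map<Long, List<Segment>> byStart = new TreeMap<>();
        for (Segment s : candidates) {
            byStart.computeIfAbsent(s.firstTxId(), k -> new ArrayList<>()).add(s);
        }
        List<Segment> chosen = new ArrayList<>();
        for (Map.Entry<Long, List<Segment>> e : byStart.entrySet()) {
            Segment best = null;
            for (Segment s : e.getValue()) {
                if (s.numTxns() > 0 && (best == null || s.numTxns() > best.numTxns())) {
                    best = s;
                }
            }
            if (best == null) {
                throw new IllegalStateException("Edit logs starting at txid " + e.getKey()
                    + " are corrupt: they contain no transactions. Check whether another"
                    + " storage directory holds a valid copy before starting the NameNode.");
            }
            chosen.add(best);
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Only one copy exists and it is empty: startup is refused.
        try {
            selectSegments(List.of(new Segment("dir1", 5160285L, 0L)));
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
        // A second (hypothetical) directory has a non-empty copy: that one is used.
        List<Segment> withGood = List.of(
            new Segment("dir1", 5160285L, 0L),
            new Segment("dir2", 5160285L, 42L));
        System.out.println("using " + selectSegments(withGood).get(0).dir());
    }
}
```

Refusing to start, rather than silently skipping the empty copy, leaves the decision to the operator, which matches the intent described above.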

> 1073: Handle case where an entirely empty log is left during NN crash
> ---------------------------------------------------------------------
>
>                 Key: HDFS-2093
>                 URL: https://issues.apache.org/jira/browse/HDFS-2093
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: hdfs-2093.txt
>
>
> In fault-testing the HDFS-1073 branch, I saw the following situation:
> - NN has two storage directories, but one is in failed state
> - NN starts to roll the edit log to edits_inprogress_5160285
> - NN then crashes
> - on restart, it detects the truncated log, but since it has 0 txns, it 
> finalizes it to the nonsense log name edits_5160285-5160284.
> - It then starts logs again at edits_inprogress_5160285.
> - After this point, no checkpoints or future NN startups succeed, since there
> are two logs starting with the same txid.
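
The naming arithmetic behind this failure can be sketched in Java. The file name patterns edits_inprogress_N and edits_N-M come from the report. The sketch assumes that a finalized name encodes the first and last txid, with last = first + count - 1, so zero transactions yield an inverted range. The class and method names are hypothetical and are not taken from the HDFS codebase.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EditLogNames {
    private static final Pattern FINALIZED = Pattern.compile("edits_(\\d+)-(\\d+)");
    private static final Pattern IN_PROGRESS = Pattern.compile("edits_inprogress_(\\d+)");

    // Name a finalized segment from its first txid and transaction count.
    // With zero transactions the "last" txid is first - 1, which is how a
    // name like edits_5160285-5160284 arises.
    static String finalizedName(long firstTxId, long numTxns) {
        long lastTxId = firstTxId + numTxns - 1;
        return "edits_" + firstTxId + "-" + lastTxId;
    }

    // Returns the starting txid encoded in an edit log file name, or -1 if
    // the name matches neither pattern.
    static long startTxId(String name) {
        Matcher m = FINALIZED.matcher(name);
        if (m.matches()) return Long.parseLong(m.group(1));
        m = IN_PROGRESS.matcher(name);
        if (m.matches()) return Long.parseLong(m.group(1));
        return -1;
    }

    // Reports finalized segments with an inverted (empty) range and any txid
    // at which more than one segment starts.
    static List<String> findProblems(List<String> names) {
        List<String> problems = new ArrayList<>();
        Map<Long, String> seen = new HashMap<>();
        for (String name : names) {
            Matcher m = FINALIZED.matcher(name);
            if (m.matches() && Long.parseLong(m.group(2)) < Long.parseLong(m.group(1))) {
                problems.add("empty segment: " + name);
            }
            long start = startTxId(name);
            if (start < 0) continue;
            String prev = seen.putIfAbsent(start, name);
            if (prev != null) {
                problems.add("duplicate start txid " + start + ": " + prev + ", " + name);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String crashed = finalizedName(5160285L, 0L);
        System.out.println(crashed); // edits_5160285-5160284
        // After restart, a new in-progress log starts at the same txid.
        findProblems(List.of(crashed, "edits_inprogress_5160285"))
            .forEach(System.out::println);
    }
}
```

For the scenario above, both checks fire: the finalized log has an inverted range, and it shares its starting txid with the new in-progress log.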

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
