[ 
https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453222#comment-13453222
 ] 

Colin Patrick McCabe commented on HDFS-3540:
--------------------------------------------

bq. From what I understand based on previous comments, it allows an operator to 
continue with corrupt editlog or abort. Not sure if abort is really a choice. 
What would one do after abort?

The first thing to try is moving aside the edit log directory that had the 
problem and seeing if you can reload with another one of the directories.  If 
it's a random I/O corruption, normally only one of the copies of the edit log 
stored on disk will be bad.  Since there's no edit log failover in branch-1, 
you have to do it yourself.  If all the copies are corrupt, it may be necessary 
to use a hex editor on the edit log, or a similar technique.  The offset of the 
failure is provided so you can check it out manually.
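
As a rough illustration of the manual triage described above, here is a minimal Python sketch. It is not part of HDFS. The function names are hypothetical, and it assumes that healthy copies of the edit log in different storage directories are byte-identical, which may not hold if one directory dropped out of service earlier. It groups the copies by checksum so an odd one out stands out, and it dumps the bytes around the reported failure offset as a lightweight stand-in for a hex editor.

```python
import hashlib
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large edit logs are not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_copies_by_digest(paths):
    """Map each SHA-256 digest to the list of edit log copies that produce it.

    Assumption: intact copies are byte-identical, so a copy that sits alone
    in its group is the likely candidate to move aside.
    """
    groups = defaultdict(list)
    for p in paths:
        groups[sha256_of(p)].append(str(p))
    return dict(groups)


def hexdump_around(path, offset, context=32):
    """Return hex-dump rows covering roughly `context` bytes either side of `offset`."""
    start = max(0, offset - context)
    start -= start % 16  # align the first row to a 16-byte boundary
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(offset + context - start)
    rows = []
    for i in range(0, len(data), 16):
        row = data[i:i + 16]
        hex_part = " ".join(f"{b:02x}" for b in row)
        rows.append(f"{start + i:08x}  {hex_part}")
    return rows
```

Run this only against copies of the files made while the NameNode is stopped, never against the live storage directories.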

bq. Perhaps we should consider printing more information during recovery to 
help an admin understand the state of the editlog. Is that possible?

Nicholas mentioned earlier that it might be helpful to print out how many bytes 
are left in the log.  Even though this can be computed from the information 
provided, being explicit about it could save the admin a step.
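
For reference, the computation is just the file length minus the reported offset. The sketch below uses a hypothetical helper name and is not HDFS code. Note that the result counts raw bytes on disk, so if the file was preallocated or padded, some of those bytes may not be real transactions.

```python
import os


def bytes_remaining(edit_log_path, failure_offset):
    """Return how many bytes follow `failure_offset` in the edit log file."""
    size = os.path.getsize(edit_log_path)
    if not 0 <= failure_offset <= size:
        raise ValueError(
            f"offset {failure_offset} is outside the file (length {size})"
        )
    return size - failure_offset
```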

There may be other information that can be printed out too-- I'll take a look.
                
> Further improvement on recovery mode and edit log toleration in branch-1
> ------------------------------------------------------------------------
>
>                 Key: HDFS-3540
>                 URL: https://issues.apache.org/jira/browse/HDFS-3540
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 1.2.0
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>
> *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1.  However, the 
> recovery mode feature in branch-1 is dramatically different from the recovery 
> mode in trunk since the edit log implementations in these two branches are 
> different.  For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not 
> in trunk.
> *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy 
> UNCHECKED_REGION_LENGTH and to tolerate edit log corruption.
> There are overlaps between these two features.  We study potential further 
> improvement in this issue.
