[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218881#comment-13218881 ]

Eli Collins commented on HDFS-3004:
-----------------------------------

Hey Colin,

Nice writeup.  Worth mentioning that the focus is the edit logs, since we've 
never seen a corrupt image (and it now has an associated checksum). Also worth 
mentioning the common cases where we've seen this happen, e.g. the NN disk 
volume filling up (HDFS-1594), multiple 2NNs running (HDFS-2305), a buggy 
fsync, or buggy disk firmware.

Per Suresh, there are two core cases. I'd start with something simple:
1. If the last edit is corrupted, indicate this and discard it.
2. If an intermediate edit is corrupt (rare, but we've seen it multiple 
times), skip it and keep going. Either a configurable number of subsequent 
edits will fail to apply, in which case we can stop at the offset we skipped 
(and truncate there), or we'll finish loading.

At the end of both of these we'll have a namespace that we can try to load and 
that an admin can poke around in (to examine the metadata).
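
To make those two cases concrete, here's a rough sketch of the recovery loop. 
The Edit/EditLogStream/Namespace interfaces and the failure threshold below 
are made up for illustration; the real NameNode edit log classes look 
different.

import java.io.IOException;

class RecoverySketch {
  /** Hypothetical: one deserialized edit from the log. */
  interface Edit { }

  /** Hypothetical reader: returns null at end of log, throws on a corrupt record. */
  interface EditLogStream {
    Edit readEdit() throws IOException;
    long position();
    void skipToNextRecord() throws IOException;  // resync after a bad record
    void truncate(long offset) throws IOException;
  }

  /** Hypothetical in-memory namespace being rebuilt from the edits. */
  interface Namespace {
    void apply(Edit e) throws IOException;
  }

  /** Case 2 threshold: give up after this many consecutive failed applies (configurable). */
  static final int MAX_CONSECUTIVE_FAILURES = 10;

  static void recover(EditLogStream log, Namespace ns) throws IOException {
    long lastBadOffset = -1;
    int consecutiveFailures = 0;
    while (true) {
      long offset = log.position();
      Edit edit;
      try {
        edit = log.readEdit();
      } catch (IOException corrupt) {
        // Both cases start here: remember where the bad record began,
        // skip past it, and keep reading.
        lastBadOffset = offset;
        log.skipToNextRecord();
        continue;
      }
      if (edit == null) {
        // End of log. If the corrupt record was the very last one (case 1),
        // having discarded it is all we need to do.
        break;
      }
      try {
        ns.apply(edit);
        consecutiveFailures = 0;
      } catch (IOException applyFailure) {
        // Case 2: edits after a skipped record may no longer make sense.
        // After a configurable number of consecutive failures, stop at the
        // offset we skipped and truncate the log there.
        if (lastBadOffset >= 0 && ++consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
          log.truncate(lastBadOffset);
          break;
        }
      }
    }
  }
}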

Per Todd, I'd make it run interactively; by default it would just display 
info, with Y/N/YA/NA options.
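
Something along these lines for the prompting (class and method names here are 
made up for illustration, not the real tool's API):

import java.util.Scanner;

class RecoveryPrompt {
  private Boolean answerForAll = null;  // set once the operator picks YA or NA
  private final Scanner in = new Scanner(System.in);

  /** Print the proposed action and return true if the operator approves it. */
  boolean confirm(String action) {
    if (answerForAll != null) {
      return answerForAll;
    }
    while (true) {
      System.out.println(action);
      System.out.print("Apply this fix? (Y/N/YA/NA): ");
      String answer = in.nextLine().trim().toUpperCase();
      if (answer.equals("Y"))  return true;
      if (answer.equals("N"))  return false;
      if (answer.equals("YA")) { answerForAll = true;  return true; }
      if (answer.equals("NA")) { answerForAll = false; return false; }
      System.out.println("Please answer Y, N, YA, or NA.");
    }
  }
}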

Once this is working we can get fancier on the individual edits (e.g. try to 
fix them up instead of skipping them), but it seems like we'd get 90% of the 
value just from the above.

Thanks,
Eli
                
> Create Offline NameNode recovery tool
> -------------------------------------
>
>                 Key: HDFS-3004
>                 URL: https://issues.apache.org/jira/browse/HDFS-3004
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: tools
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-3004__namenode_recovery_tool.txt
>
>
> We've been talking about creating a tool which can process NameNode edit logs 
> and image files offline.
> This tool would be similar to a fsck for a conventional filesystem.  It would 
> detect inconsistencies and malformed data.  In cases where it was possible, 
> and the operator asked for it, it would try to correct the inconsistency.
> It's probably better to call this "nameNodeRecovery" or similar, rather than 
> "fsck," since we already have a separate and unrelated mechanism which we 
> refer to as fsck.
> The use case here is that the NameNode data is corrupt for some reason, and 
> we want to fix it.  Obviously, we would prefer never to end up in this situation.  In 
> a perfect world, we never would.  However, bad data on disk can happen from 
> time to time, because of hardware errors or misconfigurations.  In the past 
> we have had to correct it manually, which is time-consuming and which can 
> result in downtime.
> I would like to reuse as much code as possible from the NameNode in this 
> tool.  Hopefully, the effort that is spent developing this will also make the 
> NameNode editLog and image processing even more robust than it already is.
> Another approach that we have discussed is NOT having an offline tool, but 
> just having a switch supplied to the NameNode, like "--auto-fix" or 
> "--force-fix".  In that case, the NameNode would attempt to "guess" when data 
> was missing or incomplete in the EditLog or Image, rather than aborting as 
> it does now.  Like the proposed fsck tool, this switch could be used to get 
> users back on their feet quickly after a problem developed.  I am not in 
> favor of this approach, because there is a danger that users could supply 
> this flag in cases where it is not appropriate.  This risk does not exist for 
> an offline fsck tool, since it would have to be run explicitly.  However, I 
> wanted to mention this proposal here for completeness.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

