[ 
https://issues.apache.org/jira/browse/HDFS-13314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409860#comment-16409860
 ] 

Daryn Sharp commented on HDFS-13314:
------------------------------------

bq. Is there anything you suggest doing differently?

Yes, no config option.  Detected corruption = unconditional hard stop.

bq. Once we get to this point, the metadata is already corrupt. Writing out a 
new FsImage doesn't make it any worse because replaying the prior image and 
edits would lead to the same state.

The in-memory state is corrupt, but the edit stream (hopefully) isn't.  Which 
is easier: hacking up the NN to attempt to load the bad image, or replaying a 
partial edit stream, perhaps without the snapshot removal?  If you agree it's 
the latter, then as Rushabh pointed out, _not_ halting the NN risks purging the 
only good image.  With the defaults, the last good image survives at most 2 
hours (2 images retained, 1h checkpoint interval, less if the max-edits 
threshold triggers a checkpoint sooner).
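
To make the 2-hour figure concrete, here is a rough sketch of the arithmetic. 
The class and method names are hypothetical and not part of Hadoop. The two 
inputs are assumed to correspond to dfs.namenode.num.checkpoints.retained and 
dfs.namenode.checkpoint.period; the transaction-count trigger is ignored, so 
the result is an upper bound.

```java
// Hypothetical helper, not part of Hadoop: estimates how long the last good
// FsImage can survive if a NameNode keeps checkpointing from corrupt state.
final class CheckpointWindow {

    // Each checkpoint writes one new image and purges the oldest beyond the
    // retention count, so after `imagesRetained` checkpoints no image written
    // before the corruption remains. Checkpoints triggered early by the
    // transaction threshold would only shorten this window.
    static long maxSafeWindowSeconds(int imagesRetained, long checkpointPeriodSeconds) {
        if (imagesRetained < 1 || checkpointPeriodSeconds < 1) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        return Math.multiplyExact((long) imagesRetained, checkpointPeriodSeconds);
    }

    public static void main(String[] args) {
        // Defaults cited in the comment: 2 images retained, 1h interval.
        long window = maxSafeWindowSeconds(2, 3600L);
        System.out.println(window / 3600 + " hours");
    }
}
```

Raising the retention count would widen the window, but only halting on 
detection keeps the good image from being purged at all.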

Running in the corrupted state risks data loss.  As cited above, the original 
report of this bug resulted in the NN causing *9300 missing blocks*.

> NameNode should optionally exit if it detects FsImage corruption
> ----------------------------------------------------------------
>
>                 Key: HDFS-13314
>                 URL: https://issues.apache.org/jira/browse/HDFS-13314
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>            Priority: Major
>         Attachments: HDFS-13314.01.patch, HDFS-13314.02.patch, 
> HDFS-13314.03.patch, HDFS-13314.04.patch
>
>
> The NameNode should optionally exit after writing an FsImage if it detects 
> the following kinds of corruptions:
> # INodeReference pointing to non-existent INode
> # Duplicate entries in snapshot deleted diff list.
> This behavior is controlled via an undocumented configuration setting, and 
> disabled by default.


