[ 
https://issues.apache.org/jira/browse/HDFS-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531213#comment-13531213
 ] 

Daryn Sharp commented on HDFS-4288:
-----------------------------------

Non-23 branches appear to have a rudimentary fix for this issue.  However, it 
appears (I haven't tested this) that if a node registers in safemode with 500 
blocks, then bounces and re-registers with only 100 blocks, perhaps because a 
disk died, the NN will ignore the second BR if it is still in safemode.  The NN 
may then leave safemode without realizing it doesn't have all the replicas it 
needs, and it probably won't schedule re-replication until the next full BR is 
sent, which increases the possibility of data loss.
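
To make the suspected sequence concrete, here is a toy model of it.  This is 
only a sketch of the behaviour described above, not the NN code: the class 
ToyNameNode, its fields, and the rule "only the first full BR per DN counts 
while in safemode" are all assumptions made for illustration, and the block 
counts are the 500/100 from the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only; none of these names come from the Hadoop code base. */
public class SafeModeReportSketch {

    /** Hypothetical stand-in for the NN's per-DN bookkeeping. */
    static final class ToyNameNode {
        boolean inSafeMode = true;

        // Number of blocks the NN currently believes each DN holds.
        final Map<String, Integer> believedBlocks = new HashMap<>();

        /**
         * Handles a full block report.
         * Assumed rule: while in safemode, only the first report from a DN
         * is processed; later ones are ignored, even after a re-registration.
         *
         * @return true if the report was processed, false if it was ignored
         */
        boolean fullBlockReport(String dn, int blockCount) {
            if (inSafeMode && believedBlocks.containsKey(dn)) {
                return false;
            }
            believedBlocks.put(dn, blockCount);
            return true;
        }
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();

        // DN registers in safemode and reports 500 blocks.
        nn.fullBlockReport("dn1", 500);

        // DN bounces, loses a disk, and reports only 100 blocks.
        boolean processed = nn.fullBlockReport("dn1", 100);

        System.out.println("second report processed: " + processed);
        System.out.println("NN believes dn1 holds: " + nn.believedBlocks.get("dn1"));
    }
}
```

In this model the second report is dropped and the NN keeps the stale count of 
500, which is the situation that could let it leave safemode with fewer 
replicas than it thinks it has.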
                
> NN accepts incremental BR as IBR in safemode
> --------------------------------------------
>
>                 Key: HDFS-4288
>                 URL: https://issues.apache.org/jira/browse/HDFS-4288
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>
> If a DN is ready to send an incremental BR and the NN goes down, the DN will 
> repeatedly try to reconnect.  The NN will then process the DN's incremental 
> BR as an initial BR.  The NN now thinks the DN has only a few blocks, and 
> will ignore all subsequent BRs from that DN until out of safemode -- which it 
> may never do because of all the "missing" blocks on the affected DNs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
