[ 
https://issues.apache.org/jira/browse/HDFS-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-4288:
------------------------------

    Attachment: HDFS-4288.branch-23.patch

I think the second issue I mentioned, a bounced DN's BR not being 
processed, can be solved by having updateRegInfo reset the flag that 
short-circuits safemode BR processing.  I originally tried tracking the 
timestamp of the registration, but I think this is much simpler.  It'll be 
trivial to tweak the patch for the other branches.
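
To make the idea concrete, here is a rough, self-contained toy model of the 
behavior.  It is not the patch and does not use the real Hadoop classes: 
ToyNameNode, DatanodeState, fullReportProcessed and processFullReport are 
hypothetical names made up for illustration.  Only updateRegInfo is a name 
taken from the discussion above, and its body here is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only; none of these types are the actual HDFS classes. */
public class SafemodeReportSketch {

    /** Hypothetical stand-in for the NN's per-DN bookkeeping. */
    static class DatanodeState {
        final Set<Long> blocks = new HashSet<>();
        // Hypothetical flag: set once a full BR has been processed for the
        // current registration; used to short-circuit later BRs in safemode.
        boolean fullReportProcessed = false;

        /** Called on every (re-)registration. The reset is the proposed fix. */
        void updateRegInfo() {
            fullReportProcessed = false;
        }
    }

    /** Hypothetical stand-in for the NN side of block report handling. */
    static class ToyNameNode {
        final Map<String, DatanodeState> datanodes = new HashMap<>();
        boolean inSafemode = true;

        void register(String dnId) {
            datanodes.computeIfAbsent(dnId, k -> new DatanodeState())
                     .updateRegInfo();
        }

        /** Returns true if the report was processed, false if skipped. */
        boolean processFullReport(String dnId, Set<Long> reported) {
            DatanodeState dn = datanodes.get(dnId);
            if (dn == null) {
                throw new IllegalStateException("unregistered DN: " + dnId);
            }
            if (inSafemode && dn.fullReportProcessed) {
                return false; // short-circuit: a BR was already seen
            }
            dn.blocks.clear();
            dn.blocks.addAll(reported);
            dn.fullReportProcessed = true;
            return true;
        }
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        nn.register("dn1");
        nn.processFullReport("dn1", new HashSet<>(Arrays.asList(1L, 2L, 3L)));

        // The DN bounces, loses block 3, and re-registers while the NN is
        // still in safemode. Because registration reset the flag, the new
        // BR is processed instead of being short-circuited.
        nn.register("dn1");
        boolean processed =
            nn.processFullReport("dn1", new HashSet<>(Arrays.asList(1L, 2L)));
        System.out.println("processed=" + processed
            + " blocks=" + nn.datanodes.get("dn1").blocks);
    }
}
```

Without the reset in updateRegInfo, the second report in this model would be 
skipped and the NN would keep believing the DN still holds the removed block.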

Aaron, if this is a reasonable fix, would you please help write some unit 
tests?  I'm having difficulty figuring out how to introduce a mock, or how to 
manipulate a mini-cluster to force the sequence of events needed to reproduce 
this (i.e., sync out a few blocks, stop the NN, finalize the last block, bring 
the NN up in safemode and trick it into staying in safemode, ensure the block 
update is received followed by the block report, ensure the block manager 
knows of all blocks; then stop the DN, remove blocks, re-register in safemode 
and ensure the NN forgets the removed blocks).  Plus I'm at a conference and 
don't have many cycles.
                
> NN accepts incremental BR as IBR in safemode
> --------------------------------------------
>
>                 Key: HDFS-4288
>                 URL: https://issues.apache.org/jira/browse/HDFS-4288
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-4288.branch-23.patch
>
>
> If a DN is ready to send an incremental BR and the NN goes down, the DN will 
> repeatedly try to reconnect.  The NN will then process the DN's incremental 
> BR as an initial BR.  The NN now thinks the DN has only a few blocks, and 
> will ignore all subsequent BRs from that DN until out of safemode -- which it 
> may never do because of all the "missing" blocks on the affected DNs.

