[ https://issues.apache.org/jira/browse/HDFS-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-1260:
------------------------------

    Attachment: simultaneous-recoveries.txt

After months of running this test, I ran into the failure attached above. One 
of the DNs somehow ends up with multiple meta files for the same block, each 
at a different generation stamp.

I think the issue is in the implementation of DataNode.updateBlock(). The block 
passed in doesn't have a wildcard generation stamp, but we ignore that: we go 
and find the block on disk without matching generation stamps. I think this is 
OK per the validation logic - we still only move blocks forward in GS-time, and 
don't revert length. However, when we then call updateBlockMap(), it doesn't 
use a wildcard generation stamp, so the block can be left in the block map 
under the old generation stamp. I think this inconsistency cascades into the 
sort of failure seen in the attached log.
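To make the suspected inconsistency concrete, here's a toy sketch (plain Java with a simplified, hypothetical BlockKey type - not the real FSDataset/volumeMap code) of how an update keyed on the exact old genstamp can miss the map entry even though the meta file on disk has already been renamed to the new genstamp:

import java.util.HashMap;
import java.util.Map;

// Toy model of the suspected inconsistency -- simplified types, not the real
// FSDataset/volumeMap code. The map is keyed by (blockId, genStamp), so an
// update keyed on a stale genstamp leaves the old entry in place even though
// the on-disk meta file has already been renamed to the new genstamp.
public class StaleGenStampDemo {

  // Hypothetical stand-in for the block-map key.
  static final class BlockKey {
    final long blockId;
    final long genStamp;
    BlockKey(long blockId, long genStamp) {
      this.blockId = blockId;
      this.genStamp = genStamp;
    }
    @Override public boolean equals(Object o) {
      if (!(o instanceof BlockKey)) return false;
      BlockKey b = (BlockKey) o;
      return blockId == b.blockId && genStamp == b.genStamp;
    }
    @Override public int hashCode() {
      return (int) (blockId ^ genStamp);
    }
  }

  public static void main(String[] args) {
    Map<BlockKey, String> blockMap = new HashMap<BlockKey, String>();
    long blockId = 7739687463244048122L;

    // The DN currently holds the block at genstamp 7093 (as in the attached log).
    blockMap.put(new BlockKey(blockId, 7093), "replica info");

    // A recovery arrives with oldGS=7091, newGS=7094. The on-disk lookup matched
    // by block ID only, so the meta file gets renamed to _7094, but the map
    // update below keys on the exact (blockId, oldGS) pair...
    BlockKey oldKey = new BlockKey(blockId, 7091);
    if (blockMap.containsKey(oldKey)) {           // false: map has 7093, not 7091
      blockMap.put(new BlockKey(blockId, 7094), blockMap.remove(oldKey));
    }

    // ...so the block map still says _7093 while the meta file on disk says _7094.
    for (BlockKey k : blockMap.keySet()) {
      System.out.println("block map genstamp: " + k.genStamp);  // prints 7093
    }
  }
}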

I *think* the solution is:
  - Change updateBlock to call updateBlockMap with a wildcard generation stamp 
key (rough sketch below)
  - Change the interruption code to use a wildcard GS block when interrupting 
concurrent writers

I will make these changes and see if the rest of the unit tests still pass, 
then see if I can come up with a regression test.
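
For the first bullet above, here's the rough sketch of the wildcard-keyed update I have in mind (again with a simplified, hypothetical BlockKey rather than the real updateBlockMap types): find the map entry by block ID alone, ignoring whatever genstamp it currently carries, and re-key it under the new genstamp.

import java.util.HashMap;
import java.util.Map;

// Rough sketch of the wildcard-genstamp update -- simplified types, not the
// actual FSDataset.updateBlockMap code: locate the entry by block ID alone
// (wildcard genstamp), then re-key it under the new generation stamp.
public class WildcardUpdateSketch {

  static final class BlockKey {
    final long blockId;
    final long genStamp;
    BlockKey(long blockId, long genStamp) {
      this.blockId = blockId;
      this.genStamp = genStamp;
    }
    @Override public boolean equals(Object o) {
      if (!(o instanceof BlockKey)) return false;
      BlockKey b = (BlockKey) o;
      return blockId == b.blockId && genStamp == b.genStamp;
    }
    @Override public int hashCode() {
      return (int) (blockId ^ genStamp);
    }
  }

  /** Re-key the entry for blockId regardless of its current genstamp. */
  static <V> boolean updateWithWildcardGS(Map<BlockKey, V> map, long blockId, long newGS) {
    BlockKey current = null;
    for (BlockKey k : map.keySet()) {
      if (k.blockId == blockId) {   // match on ID only: wildcard generation stamp
        current = k;
        break;
      }
    }
    if (current == null) {
      return false;                 // no replica for this block ID at any genstamp
    }
    V info = map.remove(current);
    map.put(new BlockKey(blockId, newGS), info);
    return true;
  }

  public static void main(String[] args) {
    Map<BlockKey, String> blockMap = new HashMap<BlockKey, String>();
    blockMap.put(new BlockKey(7739687463244048122L, 7093), "replica info");

    // An update with a stale oldGS (7091) still moves the entry forward to 7094.
    updateWithWildcardGS(blockMap, 7739687463244048122L, 7094);
    for (BlockKey k : blockMap.keySet()) {
      System.out.println("block map genstamp: " + k.genStamp);  // prints 7094
    }
  }
}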

> 0.20: Block lost when multiple DNs trying to recover it to different genstamps
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-1260
>                 URL: https://issues.apache.org/jira/browse/HDFS-1260
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20-append
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.20-append
>
>         Attachments: hdfs-1260.txt, hdfs-1260.txt, simultaneous-recoveries.txt
>
>
> Saw this issue on a cluster where some ops people were doing network changes 
> without shutting down DNs first. So, recovery ended up getting started at 
> multiple different DNs at the same time, and some race condition occurred 
> that caused a block to get permanently stuck in recovery mode. What seems to 
> have happened is the following:
> - FSDataset.tryUpdateBlock called with old genstamp 7091, new genstamp 7094, 
> while the block in the volumeMap (and on filesystem) was genstamp 7093
> - we find the block file and meta file based on block ID only, without 
> comparing gen stamp
> - we rename the meta file to the new genstamp _7094
> - in updateBlockMap, we do comparison in the volumeMap by oldblock *without* 
> wildcard GS, so it does *not* update volumeMap
> - validateBlockMetaData now fails with "blk_7739687463244048122_7094 does not 
> exist in blocks map"
> After this point, all future recovery attempts on that node fail in 
> getBlockMetaDataInfo: getStoredBlock finds the _7094 gen stamp (because the 
> meta file was renamed above), and validateBlockMetadata then fails because 
> _7094 isn't in the volumeMap.
> Making a unit test for this is probably going to be difficult, but doable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
