[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15482454#comment-15482454
 ] 

Arpit Agarwal commented on HDFS-10301:
--------------------------------------

IIUC we need to fix this logic not just for pruning storages but also for 
deciding when to remove the block report lease.

From BPServiceActor.java, we can assume at line 399 that the storage report 
just sent was processed successfully by the NameNode, i.e. the DataNode getting 
back success is sufficient to conclude the report was successfully processed.
{code}
 393         for (int r = 0; r < reports.length; r++) {
 394           StorageBlockReport singleReport[] = { reports[r] };
 395           DatanodeCommand cmd = bpNamenode.blockReport(
 396               bpRegistration, bpos.getBlockPoolId(), singleReport,
 397               new BlockReportContext(reports.length, r, reportId,
 398                   fullBrLeaseId, true));
 399           blockReportSizes.add(
 400               calculateBlockReportPBSize(useBlocksBuffer, singleReport));
 401           numReportsSent++;
 402           numRPCs++;
 403           if (cmd != null) {
 404             cmds.add(cmd);
 405           }
{code}

The DN can include a flag in the last RPC message, i.e. when {{r == 
reports.length - 1}}, that tells the NameNode it is the last report in this 
batch and that all previous ones were successfully processed. On seeing that 
flag, the NameNode can safely drop the lease and prune zombies.
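
A minimal sketch of that idea, not the actual patch. Everything here is a hypothetical stand-in: {{ToyNameNode}}, {{Context}}, the {{lastInBatch}} field and {{failOn}} are illustrative names and do not correspond to the real {{BlockReportContext}} or NameNode code. It only models the one-RPC-per-storage loop from the excerpt above and assumes a failed RPC surfaces as an exception on the DN side.

```java
import java.util.ArrayList;
import java.util.List;

public class LastReportFlagSketch {

  /** Hypothetical per-RPC context; lastInBatch is the proposed flag. */
  public static final class Context {
    public final int totalRpcs;
    public final int curRpc;
    public final long reportId;
    public final boolean lastInBatch;

    public Context(int totalRpcs, int curRpc, long reportId, boolean lastInBatch) {
      this.totalRpcs = totalRpcs;
      this.curRpc = curRpc;
      this.reportId = reportId;
      this.lastInBatch = lastInBatch;
    }
  }

  /** Toy stand-in for the NameNode side of the block report RPC. */
  public static final class ToyNameNode {
    public boolean leaseHeld = true;
    public int pruneCalls = 0;
    public String failOn = null; // storage ID whose report should fail (test hook)
    public final List<String> processed = new ArrayList<>();

    public void blockReport(String storageId, Context ctx) {
      if (storageId.equals(failOn)) {
        throw new IllegalStateException("simulated failure for " + storageId);
      }
      processed.add(storageId);
      if (ctx.lastInBatch) {
        // The DN only reaches its final RPC after every earlier RPC in this
        // batch returned successfully, so cleanup is tied to the flag.
        leaseHeld = false;
        pruneCalls++;
      }
    }
  }

  /** Toy stand-in for the DN loop: one RPC per storage, flag on the last one. */
  public static void sendReports(ToyNameNode nn, String[] storages, long reportId) {
    for (int r = 0; r < storages.length; r++) {
      boolean last = (r == storages.length - 1);
      nn.blockReport(storages[r], new Context(storages.length, r, reportId, last));
    }
  }

  public static void main(String[] args) {
    ToyNameNode nn = new ToyNameNode();
    sendReports(nn, new String[] {"DS-1", "DS-2", "DS-3"}, 42L);
    System.out.println("processed=" + nn.processed
        + " leaseHeld=" + nn.leaseHeld + " pruneCalls=" + nn.pruneCalls);
  }
}
```

If any earlier RPC fails, the loop aborts before the flagged RPC is sent, so in this toy model the lease stays in place and nothing is pruned until a later attempt completes the whole batch.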

Also +1 for [~daryn]'s idea to ban single-RPC reports, as this approach cannot 
be used for single-RPC reports.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10301
>                 URL: https://issues.apache.org/jira/browse/HDFS-10301
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.1
>            Reporter: Konstantin Shvachko
>            Assignee: Vinitha Reddy Gankidi
>            Priority: Critical
>             Fix For: 2.7.4
>
>         Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.branch-2.7.patch, 
> HDFS-10301.branch-2.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can time out sending a block report. Then it 
> sends the block report again. The NameNode, while processing these two reports 
> at the same time, can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



