[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15482454#comment-15482454 ]
Arpit Agarwal commented on HDFS-10301:
--------------------------------------

IIUC we need to fix this logic not just for pruning storages but also for deciding when to remove the block report lease.

From BPServiceActor.java, we can assume at line 399 that the storage report just sent was processed successfully by the NameNode, i.e. the DataNode getting back success is sufficient to conclude that the report was processed.

{code}
393      for (int r = 0; r < reports.length; r++) {
394        StorageBlockReport singleReport[] = { reports[r] };
395        DatanodeCommand cmd = bpNamenode.blockReport(
396            bpRegistration, bpos.getBlockPoolId(), singleReport,
397            new BlockReportContext(reports.length, r, reportId,
398                fullBrLeaseId, true));
399        blockReportSizes.add(
400            calculateBlockReportPBSize(useBlocksBuffer, singleReport));
401        numReportsSent++;
402        numRPCs++;
403        if (cmd != null) {
404          cmds.add(cmd);
405        }
{code}

The DN can include a flag in the last RPC message, i.e. when {{r == reports.length - 1}}, that tells the NameNode this is the last report in the batch and that all previous ones were processed successfully. At that point it is safe to drop the lease and prune zombies.

Also +1 for [~daryn]'s idea to ban single-RPC reports, as this approach cannot be used for single-RPC reports.
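To make the proposal concrete, here is a rough, self-contained sketch of how the NameNode side might act on such a flag. It is not taken from any patch on this issue: {{ReportContext}}, {{NameNodeSketch}}, {{lastInBatch}} and {{sendReports}} are hypothetical names invented for illustration, and real storage and lease tracking in the NameNode is considerably more involved. The sketch assumes one storage per RPC, as in the loop above.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative sketch only; none of these types exist in HDFS.
class LastReportFlagSketch {

    /** Hypothetical stand-in for the per-RPC context (not the real BlockReportContext). */
    static final class ReportContext {
        final int totalRpcs;
        final int curRpc;
        final long reportId;
        final boolean lastInBatch; // assumed new flag: set only on the final RPC

        ReportContext(int totalRpcs, int curRpc, long reportId, boolean lastInBatch) {
            this.totalRpcs = totalRpcs;
            this.curRpc = curRpc;
            this.reportId = reportId;
            this.lastInBatch = lastInBatch;
        }
    }

    /** Hypothetical NameNode-side state for a single DataNode. */
    static final class NameNodeSketch {
        private final Map<String, Long> lastReportIdByStorage = new TreeMap<>();
        private boolean leaseHeld = false;

        void acquireLease() {
            leaseHeld = true;
        }

        boolean leaseHeld() {
            return leaseHeld;
        }

        Set<String> knownStorages() {
            return lastReportIdByStorage.keySet();
        }

        /** Processes one storage report; prunes and drops the lease only on the flagged RPC. */
        void processReport(String storageId, ReportContext ctx) {
            lastReportIdByStorage.put(storageId, ctx.reportId);
            if (ctx.lastInBatch) {
                // The DN only sends the flagged RPC after every earlier RPC returned
                // success, so any storage not stamped with this reportId is a zombie.
                lastReportIdByStorage.values().removeIf(id -> id != ctx.reportId);
                leaseHeld = false;
            }
        }
    }

    /** DataNode side: one RPC per storage, flag set when r == storages.length - 1. */
    static void sendReports(NameNodeSketch nn, String[] storages, long reportId) {
        nn.acquireLease();
        for (int r = 0; r < storages.length; r++) {
            boolean last = (r == storages.length - 1);
            nn.processReport(storages[r],
                new ReportContext(storages.length, r, reportId, last));
        }
    }

    public static void main(String[] args) {
        NameNodeSketch nn = new NameNodeSketch();
        sendReports(nn, new String[] {"s1", "s2", "s3"}, 1L);
        sendReports(nn, new String[] {"s1", "s2"}, 2L); // s3 is no longer reported
        System.out.println(nn.knownStorages());
    }
}
```

In this sketch the NameNode never infers completion by counting RPCs or comparing report IDs across storages as they arrive; it waits for the DataNode's explicit signal, which the DataNode only sends after every earlier RPC has returned. A report that packs all storages into one RPC has no earlier successful reply to rely on, which is presumably why the approach does not carry over to single-RPC reports.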
> BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10301
>                 URL: https://issues.apache.org/jira/browse/HDFS-10301
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.1
>            Reporter: Konstantin Shvachko
>            Assignee: Vinitha Reddy Gankidi
>            Priority: Critical
>             Fix For: 2.7.4
>
>         Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block report. It then sends the block report again. The NameNode, while processing these two reports at the same time, can interleave the processing of storages from different reports. This screws up the blockReportId field, which makes the NameNode think that some storages are zombies. Replicas from zombie storages are immediately removed, causing missing blocks.