[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380310#comment-15380310 ]
Konstantin Shvachko commented on HDFS-10301:
--------------------------------------------

Reviewed the latest patch. Got a few nits:
# In {{BlockManager.removeZombieStorages()}} you should add a check {{if(node == null)}}. The node could have been deleted while we were not holding {{writeLock}}.
# The {{DatanodeDescriptor.removeZombieStorages()}} method does not need to be public. It should be package-private.
# Remove the empty-line change in {{BPServiceActor.blockReport()}}. Also, the comment there is confusing; you might want to clarify it.
# The checkstyle warning says that either {{STORAGE_REPORT}} should be declared {{final}} or it should not be all-capital. I think {{final}} makes sense.

Also, I think that [~cmccabe]'s veto, formulated as ??I am -1 on a patch which adds extra RPCs.??, is fully addressed now. The storage report was added to the last RPC representing a single block report, so the latest patch does not add extra RPCs.
So I plan to commit this three days from today, provided, of course, that the nits above are fixed.

> BlockReport retransmissions may lead to storages falsely being declared
> zombie if storage report processing happens out of order
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10301
>                 URL: https://issues.apache.org/jira/browse/HDFS-10301
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.1
>            Reporter: Konstantin Shvachko
>            Assignee: Vinitha Reddy Gankidi
>            Priority: Critical
>         Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch,
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch,
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.01.patch,
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report.
> Then it sends the block report again.
> Then the NameNode, processing these two reports at the same time, can
> interleave processing storages from different reports. This screws up the
> blockReportId field, which makes the NameNode think that some storages are
> zombie. Replicas from zombie storages are immediately removed, causing
> missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
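The race in the description can be illustrated with a simplified, hypothetical Java model. This is not the actual BlockManager code. It assumes three things:
* the NameNode stamps each storage with the ID of the last block report that processed it;
* each storage is sent in its own RPC;
* after the last RPC of a report, the NameNode flags every storage carrying a different ID as zombie.

All class and method names below are made up for illustration. The last method sketches a further assumption: that the storage report in the last RPC lists every storage the DataNode currently has. Under that assumption, zombies are judged only by absence from the report, and the false positive disappears.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified model of the HDFS-10301 race; not real HDFS code. */
public class ZombieRaceSketch {
    // NameNode-side view: storage ID -> ID of the last block report that processed it.
    final Map<String, Long> lastReportId = new LinkedHashMap<>();

    void register(String storageId) {
        lastReportId.put(storageId, 0L);
    }

    // Processing one storage of a block report stamps it with that report's ID.
    void processStorage(String storageId, long reportId) {
        lastReportId.put(storageId, reportId);
    }

    // ID-based check, run after the last RPC of a report: every storage
    // not stamped with this report's ID is treated as a zombie.
    Set<String> zombiesById(long reportId) {
        Set<String> zombies = new TreeSet<>();
        for (Map.Entry<String, Long> e : lastReportId.entrySet()) {
            if (e.getValue() != reportId) {
                zombies.add(e.getKey());
            }
        }
        return zombies;
    }

    // Assumed alternative: only storages missing from the DataNode's
    // storage report (sent with the last RPC) are treated as zombies.
    Set<String> zombiesByStorageReport(Set<String> reportedStorages) {
        Set<String> zombies = new TreeSet<>(lastReportId.keySet());
        zombies.removeAll(reportedStorages);
        return zombies;
    }

    public static void main(String[] args) {
        ZombieRaceSketch nn = new ZombieRaceSketch();
        nn.register("S1");
        nn.register("S2");
        // Report 1 times out, so the DataNode resends it as report 2.
        nn.processStorage("S1", 1); // report 1, first RPC
        nn.processStorage("S1", 2); // report 2 overtakes report 1
        nn.processStorage("S2", 2); // report 2, last RPC
        System.out.println("after report 2: " + nn.zombiesById(2)); // []
        nn.processStorage("S2", 1); // delayed last RPC of report 1
        System.out.println("after report 1: " + nn.zombiesById(1)); // [S1]
        System.out.println("storage report: "
                + nn.zombiesByStorageReport(Set.of("S1", "S2"))); // []
    }
}
```

In this model, S1 is healthy but is flagged only because the delayed report 1 finishes after report 2 has re-stamped S1. Its replicas would then be dropped.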
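The first review nit reflects a general rule: after (re)acquiring a lock, look up shared objects again instead of reusing a reference obtained earlier, because they may have been removed in the meantime. Below is a minimal hypothetical sketch. It uses java.util.concurrent.locks.ReentrantReadWriteLock and a plain map standing in for the datanode registry. RelockRecheckSketch, pruneZombies, and unlockedWork are illustrative names, not HDFS signatures.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: re-validate shared state after re-acquiring a lock. */
public class RelockRecheckSketch {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Hypothetical registry: datanode ID -> its storage IDs, guarded by `lock`.
    final Map<String, Set<String>> datanodes = new HashMap<>();

    void addNode(String nodeId, Set<String> storages) {
        lock.writeLock().lock();
        try {
            datanodes.put(nodeId, new HashSet<>(storages));
        } finally {
            lock.writeLock().unlock();
        }
    }

    void removeNode(String nodeId) {
        lock.writeLock().lock();
        try {
            datanodes.remove(nodeId);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /**
     * Removes zombie storages from a node. `unlockedWork` runs while the write
     * lock is NOT held, so other operations (such as removing the node) can
     * interleave. Returns false if the node disappeared in the meantime.
     */
    boolean pruneZombies(String nodeId, Set<String> zombies, Runnable unlockedWork) {
        unlockedWork.run(); // stands in for work done outside the lock
        lock.writeLock().lock();
        try {
            // Look the node up again under the lock: it may have been
            // deleted while we were not holding the write lock.
            Set<String> storages = datanodes.get(nodeId);
            if (storages == null) {
                return false; // the equivalent of the {{if(node == null)}} check
            }
            storages.removeAll(zombies);
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Returning early on a missing node is a design choice for this sketch. Whether the real code should log, return, or do something else is up to the patch.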