[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299313#comment-15299313 ]
Konstantin Shvachko commented on HDFS-10301:
--------------------------------------------

Hey Colin, let's decide on the way to move forward.
I do not see a point in making this change in two steps.
* Your changes will essentially be completely removed by Vinitha's patch.
* I do not see her patch introducing incompatible changes. So it can and should be backported through to branch 2.6.

A thorough review is needed and will be quite helpful.
I think the [004 patch|https://issues.apache.org/jira/secure/attachment/12805798/HDFS-10301.004.patch] covers
* the upgrade case, that is, it works consistently for both old (pre-patch) and new (patched) DataNodes' block reports
* the case when the entire block report is sent in a single RPC and
* the case when block reports are split into multiple RPCs
* the leases

So apart from the failed test I do not see any issues. It would be good if you could take a fresh look and see if any corner cases were missed.

> BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10301
>                 URL: https://issues.apache.org/jira/browse/HDFS-10301
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.1
>            Reporter: Konstantin Shvachko
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>         Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.01.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy a DataNode can time out sending a block report. Then it
> sends the block report again. The NameNode, while processing these two reports
> at the same time, can then interleave processing storages from different reports.
> This screws up the blockReportId field, which makes the NameNode think that some
> storages are zombie.
> Replicas from zombie storages are immediately removed, causing missing blocks.
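
To make the failure mode in the issue description concrete, here is a minimal Java sketch. It is not the NameNode code: the class ZombieStorageSketch and all of its methods are hypothetical names invented for illustration, and the zombie check it uses (a storage counts as zombie when its last recorded report ID differs from the ID of the report that just finished) is an assumption inferred from the description, not taken from any of the attached patches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the race described in HDFS-10301.
// None of these names exist in Hadoop; they are for illustration only.
public class ZombieStorageSketch {
    // Storage ID -> ID of the last block report that touched that storage.
    private final Map<String, Long> lastReportIdByStorage = new LinkedHashMap<>();

    ZombieStorageSketch(String... storageIds) {
        for (String id : storageIds) {
            lastReportIdByStorage.put(id, 0L);
        }
    }

    // Processing one storage of a report stamps the storage with the report ID.
    void processStorageReport(long reportId, String storageId) {
        lastReportIdByStorage.put(storageId, reportId);
    }

    // Assumed zombie check: after a report completes, every storage that does
    // not carry that report's ID is treated as no longer present.
    List<String> findZombies(long completedReportId) {
        List<String> zombies = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastReportIdByStorage.entrySet()) {
            if (e.getValue() != completedReportId) {
                zombies.add(e.getKey());
            }
        }
        return zombies;
    }

    // One report, processed start to finish: no storage is flagged.
    static List<String> inOrder() {
        ZombieStorageSketch nn = new ZombieStorageSketch("s1", "s2");
        nn.processStorageReport(1L, "s1");
        nn.processStorageReport(1L, "s2");
        return nn.findZombies(1L);
    }

    // Original report (ID 1) and its retransmission (ID 2) interleave.
    static List<String> interleaved() {
        ZombieStorageSketch nn = new ZombieStorageSketch("s1", "s2");
        nn.processStorageReport(1L, "s1"); // original report, storage s1
        nn.processStorageReport(2L, "s1"); // retransmission overwrites s1's ID
        nn.processStorageReport(1L, "s2"); // original report finishes with s2
        return nn.findZombies(1L);         // s1 now carries ID 2, so it is flagged
    }

    public static void main(String[] args) {
        System.out.println("in order:    " + inOrder());     // prints []
        System.out.println("interleaved: " + interleaved()); // prints [s1]
    }
}
```

In the interleaved run, storage s1 is healthy and was reported twice, yet it is flagged because the retransmitted report overwrote its ID before the original report completed. That is the false zombie declaration the description refers to.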