[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Walter Su updated HDFS-10301: ----------------------------- Attachment: HDFS-10301.01.patch > Blocks removed by thousands due to falsely detected zombie storages > ------------------------------------------------------------------- > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode > Affects Versions: 2.6.1 > Reporter: Konstantin Shvachko > Priority: Critical > Attachments: HDFS-10301.01.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. Then NameNode while process these two reports > at the same time can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)