[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

Konstantin Shvachko (JIRA) Fri, 15 Jul 2016 16:42:07 -0700

    [ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380310#comment-15380310 ]


Konstantin Shvachko commented on HDFS-10301:
--------------------------------------------

Reviewed the latest patch. Got a few nits:
# In {{BlockManager.removeZombieStorages()}} you should add a check
{{if (node == null)}}. The node could have been deleted while we were not
holding {{writeLock}}.
# The {{DatanodeDescriptor.removeZombieStorages()}} method does not need to be
public. It should be package-private.
# Remove the empty-line change in {{BPServiceActor.blockReport()}}.
Also, the comment there is confusing. You might want to clarify it.
# The checkstyle warning says that {{STORAGE_REPORT}} should either be declared
{{final}} or not be written in all capitals. I think {{final}} makes sense.
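
To illustrate the first nit, here is a minimal sketch of re-checking a node after the write lock has been reacquired. It is not the HDFS code: {{NodeSketch}}, {{ManagerSketch}} and all of their members are hypothetical stand-ins, and the only point is that the lookup and the null check happen inside the locked section.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a datanode descriptor; not the HDFS class.
class NodeSketch {
  int zombieStorages;

  NodeSketch(int zombieStorages) {
    this.zombieStorages = zombieStorages;
  }
}

// Hypothetical stand-in for the manager that owns the node map.
class ManagerSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, NodeSketch> nodes = new HashMap<>();

  void register(String id, NodeSketch node) {
    lock.writeLock().lock();
    try {
      nodes.put(id, node);
    } finally {
      lock.writeLock().unlock();
    }
  }

  void unregister(String id) {
    lock.writeLock().lock();
    try {
      nodes.remove(id);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /**
   * Returns the number of zombie storages removed, or -1 if the node
   * was deleted before this call obtained the write lock.
   */
  int removeZombieStorages(String id) {
    lock.writeLock().lock();
    try {
      // Look the node up again under the lock; it may be gone by now.
      NodeSketch node = nodes.get(id);
      if (node == null) {
        return -1;
      }
      int removed = node.zombieStorages;
      node.zombieStorages = 0;
      return removed;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

In this sketch a caller that unregisters the node between two calls gets -1 from the second call instead of a {{NullPointerException}}.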

Also I think that [~cmccabe]'s veto, formulated as
??I am -1 on a patch which adds extra RPCs.??
is fully addressed now. The storage report was added to the last of the RPCs
that make up a single block report. The latest patch does not add extra RPCs.
So I plan to commit this three days from today, provided of course that the
nits above are fixed.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10301
>                 URL: https://issues.apache.org/jira/browse/HDFS-10301
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.1
>            Reporter: Konstantin Shvachko
>            Assignee: Vinitha Reddy Gankidi
>            Priority: Critical
>         Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.01.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block
> report. It then sends the block report again. While processing these two
> reports at the same time, the NameNode can interleave the processing of
> storages from the different reports. This screws up the blockReportId field,
> which makes the NameNode think that some storages are zombie. Replicas from
> zombie storages are immediately removed, causing missing blocks.
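
The interleaving described above can be modeled with a simplified sketch. It assumes that each storage remembers the id of the last block report that touched it and that, once a report finishes, every storage carrying a different id is treated as zombie. The class {{ZombieInterleavingSketch}} and its methods are hypothetical and are not the NameNode implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, single-threaded model of the bookkeeping; not HDFS code.
class ZombieInterleavingSketch {
  // storage id -> id of the last block report that processed this storage
  private final Map<String, Long> lastReportId = new LinkedHashMap<>();

  // Record that one storage from the given report has been processed.
  void processStorage(String storageId, long reportId) {
    lastReportId.put(storageId, reportId);
  }

  // Storages whose last seen report id differs from the finished report.
  List<String> findZombies(long finishedReportId) {
    List<String> zombies = new ArrayList<>();
    for (Map.Entry<String, Long> e : lastReportId.entrySet()) {
      if (e.getValue() != finishedReportId) {
        zombies.add(e.getKey());
      }
    }
    return zombies;
  }
}
```

Suppose report 1 is the original and report 2 is the retransmission, both covering storages s1 and s2. If the calls arrive as (s1, 1), (s1, 2), (s2, 2), (s2, 1) and report 1 is the one that finishes last, {{findZombies(1)}} in this model returns s1, although s1 was present in both reports.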



