[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955157#comment-14955157
 ] 

Kihwal Lee commented on HDFS-8676:
----------------------------------

We have since learned that this bug not only causes heartbeat expiration but 
also makes writes fail. Because the actor thread performs the deletion 
synchronously, incremental block reports are blocked while the deletion is in 
progress. If the deletion takes a long time, file closes and block additions 
fail.
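
For illustration only, here is a minimal sketch of moving the trash deletion 
off the calling thread, so that a heartbeat or incremental block report loop 
would not have to wait for it. This is not the DataNode code and not the 
attached patch: the class AsyncTrashCleaner, the method clearTrashAsync, and 
the thread name are hypothetical, and the sketch uses only java.nio and 
java.util.concurrent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

/** Hypothetical helper; an assumption for illustration, not an HDFS class. */
public class AsyncTrashCleaner {
    // One daemon thread, so deletions never run on the caller's thread.
    private final ExecutorService deleter = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "trash-deleter");
        t.setDaemon(true);
        return t;
    });

    /** Queues the recursive delete and returns without waiting for it. */
    public Future<?> clearTrashAsync(Path trashDir) {
        return deleter.submit(() -> deleteRecursively(trashDir));
    }

    private static void deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order removes children before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.deleteIfExists(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller in the position of the actor thread would invoke clearTrashAsync(dir) 
and go straight back to its loop; the returned Future is only needed if 
something wants to observe completion or failure of the deletion.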

> Delayed rolling upgrade finalization can cause heartbeat expiration
> -------------------------------------------------------------------
>
>                 Key: HDFS-8676
>                 URL: https://issues.apache.org/jira/browse/HDFS-8676
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Walter Su
>            Priority: Critical
>         Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
