[ https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955197#comment-14955197 ]
Hudson commented on HDFS-8676:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2466 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2466/])
HDFS-8676. Delayed rolling upgrade finalization can cause heartbeat (kihwal: rev 5b43db47a313decccdcca8f45c5708aab46396df)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Delayed rolling upgrade finalization can cause heartbeat expiration and write failures
> --------------------------------------------------------------------------------------
>
>                 Key: HDFS-8676
>                 URL: https://issues.apache.org/jira/browse/HDFS-8676
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Walter Su
>            Priority: Critical
>         Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks
> can pile up in the datanode trash directories until an upgrade is finalized.
> When it is finally finalized, the deletion of trash is done in the service
> actor thread's context synchronously. This blocks the heartbeat and can
> cause heartbeat expiration.
> We have seen a namenode losing hundreds of nodes after a delayed upgrade
> finalization. The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
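The issue asks for trash deletion to be made asynchronous so that finalization no longer blocks the heartbeat. The Java sketch below shows only the general idea. It is not the committed HDFS-8676 patch. `AsyncTrashDeleter`, `finalizeTrash`, and the `.deleting.<nanoTime>` naming are hypothetical. In this sketch the calling thread performs one cheap directory rename and then queues the recursive delete on a dedicated background thread. A heartbeat loop that calls it would therefore not stall while millions of block files are removed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

// Hypothetical helper (not Hadoop API): moves trash deletion off the caller's thread.
public class AsyncTrashDeleter {
    // One daemon thread performs all deletions in submission order.
    private final ExecutorService deleter = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "trash-deleter");
        t.setDaemon(true);
        return t;
    });

    /**
     * Only a rename runs on the caller's thread, so the caller (for example a
     * heartbeat/service thread) returns quickly. The returned Future completes
     * once the renamed directory has been fully deleted.
     */
    public Future<?> finalizeTrash(Path trashDir) throws IOException {
        if (!Files.exists(trashDir)) {
            return CompletableFuture.completedFuture(null);
        }
        // A unique sibling name avoids clashing with a leftover from an earlier run.
        Path toDelete = trashDir.resolveSibling(
                trashDir.getFileName() + ".deleting." + System.nanoTime());
        Files.move(trashDir, toDelete);
        return deleter.submit(() -> {
            deleteRecursively(toDelete);
            return null;
        });
    }

    // Deletes a directory tree; reverse path order visits children before parents.
    static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            for (Path p : (Iterable<Path>) walk.sorted(Comparator.reverseOrder())::iterator) {
                Files.delete(p);
            }
        }
    }

    public void shutdown() {
        deleter.shutdown();
    }
}
```

Renaming before deleting means the original trash path is gone as soon as `finalizeTrash` returns, even though the bulk deletion may still be in progress. A real datanode would also need to clean up leftover `.deleting.*` directories after a restart, and this sketch does not handle that.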