[ https://issues.apache.org/jira/browse/HDFS-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873185#comment-16873185 ]
He Xiaoqiao commented on HDFS-12703:
------------------------------------

Re-submitted a patch identical to [^HDFS-12703.001.patch] via [~xuel1] to trigger Jenkins again.

> Exceptions are fatal to decommissioning monitor
> -----------------------------------------------
>
>                 Key: HDFS-12703
>                 URL: https://issues.apache.org/jira/browse/HDFS-12703
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.0
>            Reporter: Daryn Sharp
>            Assignee: Xue Liu
>            Priority: Critical
>         Attachments: HDFS-12703.001.patch, HDFS-12703.002.patch
>
>
> The {{DecommissionManager.Monitor}} runs as an executor scheduled task. If
> an exception occurs, all decommissioning ceases until the NN is restarted.
> Per javadoc for {{executor#scheduleAtFixedRate}}: *If any execution of the
> task encounters an exception, subsequent executions are suppressed*. The
> monitor thread is alive but blocked waiting for an executor task that will
> never come. The code currently disposes of the future, so the actual
> exception that aborted the task is gone.
> Failover is insufficient since the task is also likely dead on the standby.
> Replication queue init after the transition to active will fix the under
> replication of blocks on currently decommissioning nodes, but future nodes
> never decommission. The standby must be bounced prior to failover, and
> hopefully the error condition does not reoccur.
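
For readers unfamiliar with the {{scheduleAtFixedRate}} behaviour quoted in the description, here is a minimal standalone sketch of it. It is not taken from either attached patch and does not use any HDFS code: the class {{MonitorSketch}}, the helper {{guarded}}, and the simulated failure are all hypothetical names chosen for illustration. It assumes that catching and logging inside the task is an acceptable way to keep the schedule alive, which is one possible approach rather than necessarily the one the patch takes.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class MonitorSketch {

    /**
     * Hypothetical wrapper (not an HDFS API): catches and logs any exception
     * so that it never reaches the executor and the periodic schedule survives.
     */
    static Runnable guarded(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (Exception e) {
                System.err.println("monitor run failed, will run again next period: " + e);
            }
        };
    }

    /**
     * Schedules a task that throws on its first run only, waits waitMs
     * milliseconds, and returns how many times the task was started.
     */
    static int countRuns(boolean guard, long waitMs) throws InterruptedException {
        AtomicInteger runs = new AtomicInteger();
        Runnable flaky = () -> {
            if (runs.incrementAndGet() == 1) {
                throw new IllegalStateException("simulated failure");
            }
        };
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        try {
            // The returned future is discarded here, as the description says
            // the current code does, so an unguarded exception is lost.
            executor.scheduleAtFixedRate(guard ? guarded(flaky) : flaky,
                    0, 20, TimeUnit.MILLISECONDS);
            Thread.sleep(waitMs);
        } finally {
            executor.shutdownNow();
        }
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Unguarded: the first run throws and later runs are suppressed.
        System.out.println("unguarded runs: " + countRuns(false, 300));
        // Guarded: the first failure is logged and the task keeps running.
        System.out.println("guarded runs:   " + countRuns(true, 300));
    }
}
```

In the unguarded case the task runs exactly once and nothing is logged, which matches the "alive but blocked" symptom described above. Holding on to the {{ScheduledFuture}} and calling {{get()}} on it would be another way to recover the original exception.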