[jira] [Updated] (HDFS-7374) Allow decommissioning of dead DataNodes
[ https://issues.apache.org/jira/browse/HDFS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-7374:
------------------------------
       Resolution: Fixed
    Fix Version/s: 2.7.0
           Status: Resolved  (was: Patch Available)

Thanks for the final sign-off ATM, I've committed this to trunk and branch-2. Thanks Zhe for the patch and Ming for the helpful advice throughout.

> Allow decommissioning of dead DataNodes
> ---------------------------------------
>
>                 Key: HDFS-7374
>                 URL: https://issues.apache.org/jira/browse/HDFS-7374
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>             Fix For: 2.7.0
>
>         Attachments: HDFS-7374-001.patch, HDFS-7374-002.patch, HDFS-7374.003.patch
>
>
> We have seen the use case of decommissioning DataNodes that are already dead
> or unresponsive, and not expected to rejoin the cluster.
> The logic introduced by HDFS-6791 will mark those nodes as
> {{DECOMMISSION_INPROGRESS}}, with a hope that they can come back and finish
> the decommission work. If an upper layer application is monitoring the
> decommissioning progress, it will hang forever.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HDFS-7374) Allow decommissioning of dead DataNodes
[ https://issues.apache.org/jira/browse/HDFS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-7374:
------------------------------
    Attachment: HDFS-7374.003.patch

I went ahead and made the small change required in DatanodeManager#registerDatanode to make this work. The registration steps were happening in a different order for the "known DN" vs "unknown DN" cases, so I just reordered things to make sure an unknown DN is marked as alive before we check decom state.
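To illustrate why the ordering matters, here is a minimal sketch in plain Java. It is not the code from HDFS-7374.003.patch and it does not use Hadoop classes: {{RegistrationOrderSketch}}, {{Node}}, {{register}} and the exclude set are hypothetical names, and the rule that a dead node is decommissioned immediately is an assumption taken from the issue description. Under that assumption, a newly registering node that is on the exclude list has to be flagged alive first, otherwise the decommission check would treat it as dead and skip the in-progress phase.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model; none of these types exist in Hadoop.
final class RegistrationOrderSketch {
    enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

    static final class Node {
        boolean alive;
        AdminState adminState = AdminState.NORMAL;
    }

    private final Map<String, Node> knownNodes = new HashMap<>();
    private final Set<String> excluded;

    RegistrationOrderSketch(Set<String> excluded) {
        this.excluded = excluded;
    }

    // Same order for known and unknown nodes: mark alive, then check decom state.
    Node register(String name) {
        Node node = knownNodes.computeIfAbsent(name, n -> new Node());
        node.alive = true;                 // step 1: the node has just contacted us
        if (excluded.contains(name)) {     // step 2: only now look at decom state
            startDecommission(node);
        }
        return node;
    }

    // Assumed rule: dead nodes finish immediately, live nodes go in progress.
    private static void startDecommission(Node node) {
        if (node.adminState == AdminState.DECOMMISSIONED) {
            return;
        }
        if (!node.alive) {
            node.adminState = AdminState.DECOMMISSIONED;
        } else if (node.adminState == AdminState.NORMAL) {
            node.adminState = AdminState.DECOMMISSION_INPROGRESS;
        }
    }

    public static void main(String[] args) {
        RegistrationOrderSketch manager =
                new RegistrationOrderSketch(Collections.singleton("dn1"));
        // dn1 is unknown and excluded; it registers as alive, so it goes in progress.
        System.out.println(manager.register("dn1").adminState);
    }
}
```

If step 1 and step 2 were swapped in this model, the first registration of "dn1" would end up in DECOMMISSIONED without its blocks ever being considered for re-replication.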
[jira] [Updated] (HDFS-7374) Allow decommissioning of dead DataNodes
[ https://issues.apache.org/jira/browse/HDFS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhe Zhang updated HDFS-7374:
    Attachment: HDFS-7374-002.patch

Thanks [~andrew.wang] for the review! The previous patch missed the {DEAD, DECOMM_IN_PROGRESS} -> {DEAD, DECOMMED} case. And I agree with moving the logic to {{startDecommission}}. The revised patch has also addressed the comments on unit testing code.
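The state transitions discussed above can be sketched roughly as follows. This is a simplified stand-in written for illustration, not the patch itself: {{DeadNodeDecomSketch}}, {{Node}} and the {{alive}} flag are hypothetical, and only the method name {{startDecommission}} and the state names come from this thread. The assumption is that a dead node has no decommission work it can finish, so it moves straight to the decommissioned state, including when it was already stuck in progress.

```java
// Hypothetical, simplified model; none of these types exist in Hadoop.
final class DeadNodeDecomSketch {
    enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

    static final class Node {
        final String name;
        boolean alive;
        AdminState adminState = AdminState.NORMAL;

        Node(String name, boolean alive) {
            this.name = name;
            this.alive = alive;
        }
    }

    // Stand-in for the logic the thread places in startDecommission.
    static void startDecommission(Node node) {
        if (node.adminState == AdminState.DECOMMISSIONED) {
            return; // nothing left to do
        }
        if (!node.alive) {
            // {DEAD, NORMAL} and {DEAD, DECOMM_IN_PROGRESS} -> {DEAD, DECOMMED}
            node.adminState = AdminState.DECOMMISSIONED;
        } else if (node.adminState == AdminState.NORMAL) {
            // {LIVE, NORMAL} -> {LIVE, DECOMM_IN_PROGRESS}
            node.adminState = AdminState.DECOMMISSION_INPROGRESS;
        }
        // A live node that is already in progress keeps its state.
    }

    public static void main(String[] args) {
        Node stuck = new Node("dn-dead", false);
        stuck.adminState = AdminState.DECOMMISSION_INPROGRESS; // the case the first patch missed
        startDecommission(stuck);
        System.out.println(stuck.name + " -> " + stuck.adminState);

        Node live = new Node("dn-live", true);
        startDecommission(live);
        System.out.println(live.name + " -> " + live.adminState);
    }
}
```

In this model a monitor polling for the decommissioned state terminates for dead nodes instead of waiting on a node that will never report back.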
[jira] [Updated] (HDFS-7374) Allow decommissioning of dead DataNodes
[ https://issues.apache.org/jira/browse/HDFS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhe Zhang updated HDFS-7374:
    Attachment: HDFS-7374-001.patch

Thanks [~mingma] again for the comment. This patch implements option #2. It also moves a utility function to {{DFSTestUtil}} so it's accessible in the new unit test.
[jira] [Updated] (HDFS-7374) Allow decommissioning of dead DataNodes
[ https://issues.apache.org/jira/browse/HDFS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhe Zhang updated HDFS-7374:
    Status: Patch Available  (was: Open)