[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5579: Resolution: Fixed Fix Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2. Thanks for the contribution [~zhaoyunjiong]! Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Fix For: 2.4.0 Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch We noticed that some times decommission DataNodes takes very long time, even exceeds 100 hours. After check the code, I found that in BlockManager:computeReplicationWorkForBlocks(ListListBlock blocksToReplicate) it won't replicate blocks which belongs to under construction files, however in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), if there is block need replicate no matter whether it belongs to under construction or not, the decommission progress will continue running. That's the reason some time the decommission takes very long time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5579: --- Attachment: HDFS-5579-branch-1.2.patch HDFS-5579.patch Good point. Thanks Jing. Update patches to fix this problem. Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579-branch-1.2.patch, HDFS-5579.patch, HDFS-5579.patch We noticed that some times decommission DataNodes takes very long time, even exceeds 100 hours. After check the code, I found that in BlockManager:computeReplicationWorkForBlocks(ListListBlock blocksToReplicate) it won't replicate blocks which belongs to under construction files, however in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), if there is block need replicate no matter whether it belongs to under construction or not, the decommission progress will continue running. That's the reason some time the decommission takes very long time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5579: --- Attachment: (was: HDFS-5579-branch-1.2.patch) Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch We noticed that some times decommission DataNodes takes very long time, even exceeds 100 hours. After check the code, I found that in BlockManager:computeReplicationWorkForBlocks(ListListBlock blocksToReplicate) it won't replicate blocks which belongs to under construction files, however in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), if there is block need replicate no matter whether it belongs to under construction or not, the decommission progress will continue running. That's the reason some time the decommission takes very long time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5579: --- Attachment: (was: HDFS-5579.patch) Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch We noticed that some times decommission DataNodes takes very long time, even exceeds 100 hours. After check the code, I found that in BlockManager:computeReplicationWorkForBlocks(ListListBlock blocksToReplicate) it won't replicate blocks which belongs to under construction files, however in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), if there is block need replicate no matter whether it belongs to under construction or not, the decommission progress will continue running. That's the reason some time the decommission takes very long time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5579: --- Attachment: HDFS-5579.patch Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch We noticed that some times decommission DataNodes takes very long time, even exceeds 100 hours. After check the code, I found that in BlockManager:computeReplicationWorkForBlocks(ListListBlock blocksToReplicate) it won't replicate blocks which belongs to under construction files, however in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), if there is block need replicate no matter whether it belongs to under construction or not, the decommission progress will continue running. That's the reason some time the decommission takes very long time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5579: --- Attachment: (was: HDFS-5579.patch) Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch We noticed that some times decommission DataNodes takes very long time, even exceeds 100 hours. After check the code, I found that in BlockManager:computeReplicationWorkForBlocks(ListListBlock blocksToReplicate) it won't replicate blocks which belongs to under construction files, however in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), if there is block need replicate no matter whether it belongs to under construction or not, the decommission progress will continue running. That's the reason some time the decommission takes very long time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5579: Status: Patch Available (was: Open) Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.2.0, 1.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch We noticed that some times decommission DataNodes takes very long time, even exceeds 100 hours. After check the code, I found that in BlockManager:computeReplicationWorkForBlocks(ListListBlock blocksToReplicate) it won't replicate blocks which belongs to under construction files, however in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), if there is block need replicate no matter whether it belongs to under construction or not, the decommission progress will continue running. That's the reason some time the decommission takes very long time. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5579: --- Attachment: HDFS-5579.patch HDFS-5579-branch-1.2.patch Update patch, added test case for trunk. Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579-branch-1.2.patch, HDFS-5579.patch, HDFS-5579.patch We noticed that some times decommission DataNodes takes very long time, even exceeds 100 hours. After check the code, I found that in BlockManager:computeReplicationWorkForBlocks(ListListBlock blocksToReplicate) it won't replicate blocks which belongs to under construction files, however in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), if there is block need replicate no matter whether it belongs to under construction or not, the decommission progress will continue running. That's the reason some time the decommission takes very long time. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5579: --- Attachment: (was: HDFS-5579.patch) Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch We noticed that some times decommission DataNodes takes very long time, even exceeds 100 hours. After check the code, I found that in BlockManager:computeReplicationWorkForBlocks(ListListBlock blocksToReplicate) it won't replicate blocks which belongs to under construction files, however in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), if there is block need replicate no matter whether it belongs to under construction or not, the decommission progress will continue running. That's the reason some time the decommission takes very long time. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5579: --- Attachment: (was: HDFS-5579-branch-1.2.patch) Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch We noticed that some times decommission DataNodes takes very long time, even exceeds 100 hours. After check the code, I found that in BlockManager:computeReplicationWorkForBlocks(ListListBlock blocksToReplicate) it won't replicate blocks which belongs to under construction files, however in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), if there is block need replicate no matter whether it belongs to under construction or not, the decommission progress will continue running. That's the reason some time the decommission takes very long time. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5579: --- Attachment: HDFS-5579-branch-1.2.patch HDFS-5579.patch Thanks Vinay. Update patch as your comments. Except: getLastBlock do throws IOException, I deleted it in this patch. Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579-branch-1.2.patch, HDFS-5579.patch, HDFS-5579.patch We noticed that some times decommission DataNodes takes very long time, even exceeds 100 hours. After check the code, I found that in BlockManager:computeReplicationWorkForBlocks(ListListBlock blocksToReplicate) it won't replicate blocks which belongs to under construction files, however in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), if there is block need replicate no matter whether it belongs to under construction or not, the decommission progress will continue running. That's the reason some time the decommission takes very long time. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5579: --- Attachment: (was: HDFS-5579-branch-1.2.patch) Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch We noticed that some times decommission DataNodes takes very long time, even exceeds 100 hours. After check the code, I found that in BlockManager:computeReplicationWorkForBlocks(ListListBlock blocksToReplicate) it won't replicate blocks which belongs to under construction files, however in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), if there is block need replicate no matter whether it belongs to under construction or not, the decommission progress will continue running. That's the reason some time the decommission takes very long time. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5579: --- Attachment: (was: HDFS-5579.patch) Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch We noticed that some times decommission DataNodes takes very long time, even exceeds 100 hours. After check the code, I found that in BlockManager:computeReplicationWorkForBlocks(ListListBlock blocksToReplicate) it won't replicate blocks which belongs to under construction files, however in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), if there is block need replicate no matter whether it belongs to under construction or not, the decommission progress will continue running. That's the reason some time the decommission takes very long time. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5579: --- Attachment: HDFS-5579.patch HDFS-5579-branch-1.2.patch This patch let NameNode can replicate blocks belongs to under construction files but not the last block. And if the decommissioning DataNodes only have some blocks which are the last blocks of under construction files and have more than 1 live replicates left behind, then NameNode could set it to DECOMMISSIONED. Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch We noticed that some times decommission DataNodes takes very long time, even exceeds 100 hours. After check the code, I found that in BlockManager:computeReplicationWorkForBlocks(ListListBlock blocksToReplicate) it won't replicate blocks which belongs to under construction files, however in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), if there is block need replicate no matter whether it belongs to under construction or not, the decommission progress will continue running. That's the reason some time the decommission takes very long time. -- This message was sent by Atlassian JIRA (v6.1#6144)