[jira] [Work logged] (HDFS-15634) Invalidate block on decommissioning DataNode after replication
[ https://issues.apache.org/jira/browse/HDFS-15634?focusedWorklogId=501431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501431 ]

ASF GitHub Bot logged work on HDFS-15634:
- Author: ASF GitHub Bot
- Created on: 16/Oct/20 06:05
- Start Date: 16/Oct/20 06:05
- Worklog Time Spent: 10m

Work Description: fengnanli commented on pull request #2388:
URL: https://github.com/apache/hadoop/pull/2388#issuecomment-709817743

Will put another patch with a UT soon.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 501431)
Time Spent: 1h (was: 50m)

> Invalidate block on decommissioning DataNode after replication
> --
>
> Key: HDFS-15634
> URL: https://issues.apache.org/jira/browse/HDFS-15634
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Reporter: Fengnan Li
> Assignee: Fengnan Li
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Right now, when a DataNode starts decommissioning, the NameNode marks it as decommissioning and its blocks are replicated over to different DataNodes; the node is then marked as decommissioned. These blocks are not touched afterwards, since they are not counted as live replicas.
> Proposal: Invalidate these blocks once they are replicated and there are enough live replicas in the cluster.
> Reason: A recent shutdown of decommissioned DataNodes to finish the flow caused a NameNode latency spike, since the NameNode needs to remove all of those blocks from its memory and this step requires holding the write lock. If we had gradually invalidated these blocks, the deletion would be much easier and faster.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
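The proposal above boils down to a single decision: a replica parked on a decommissioned or decommissioning DataNode may be invalidated only once the live replicas alone satisfy the file's redundancy. A minimal, self-contained sketch of that predicate follows; the class and method names are illustrative, not the actual `BlockManager` API, and the real logic lives in `addStoredBlock` in hadoop-hdfs.

```java
// Simplified model of the invalidation decision described in the JIRA.
// Names are hypothetical stand-ins for the HDFS BlockManager internals.
public final class DecomInvalidationCheck {

    /**
     * Returns true when the live replicas alone meet the expected
     * redundancy and there are still replicas parked on
     * decommissioned/decommissioning DataNodes left to clean up.
     */
    public static boolean shouldInvalidate(int liveReplicas,
                                           int decomReplicas,
                                           int expectedRedundancy) {
        return liveReplicas >= expectedRedundancy && decomReplicas > 0;
    }

    public static void main(String[] args) {
        // 3 live replicas, replication factor 3, 1 stale copy -> invalidate
        System.out.println(shouldInvalidate(3, 1, 3)); // true
        // Only 2 live replicas -> keep the decommissioning copy for safety
        System.out.println(shouldInvalidate(2, 1, 3)); // false
    }
}
```

Invalidating eagerly this way spreads the block deletions over the course of decommissioning, instead of deferring them all to the moment the decommissioned node is removed while the write lock is held.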
[jira] [Work logged] (HDFS-15634) Invalidate block on decommissioning DataNode after replication
[ https://issues.apache.org/jira/browse/HDFS-15634?focusedWorklogId=501429&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501429 ]

ASF GitHub Bot logged work on HDFS-15634:
- Author: ASF GitHub Bot
- Created on: 16/Oct/20 06:04
- Start Date: 16/Oct/20 06:04
- Worklog Time Spent: 10m

Work Description: fengnanli commented on a change in pull request #2388:
URL: https://github.com/apache/hadoop/pull/2388#discussion_r506070581

## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java

## @@ -3512,7 +3512,11 @@ private Block addStoredBlock(final BlockInfo block,
     int numUsableReplicas = num.liveReplicas() +
         num.decommissioning() + num.liveEnteringMaintenanceReplicas();
-    if(storedBlock.getBlockUCState() == BlockUCState.COMMITTED &&
+
+    // if block is still under construction, then done for now
+    if (!storedBlock.isCompleteOrCommitted()) {

Review comment: I felt quite confused by the original structure, since the early return was put after the statements it is trying to avoid. I can make it a single return, no big deal.

Issue Time Tracking
---
Worklog Id: (was: 501429)
Time Spent: 40m (was: 0.5h)
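The exchange above is about a guard-clause refactor: bail out early for blocks that are still under construction, instead of burying the check after the statements it is meant to skip. A minimal illustration of the pattern, with a simplified stand-in for the HDFS `BlockUCState` logic (all names here are hypothetical):

```java
// Illustration of the guard-clause style discussed in the review above.
// The enum and methods are simplified stand-ins, not the HDFS types.
public final class GuardClauseExample {

    enum BlockState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

    static boolean isCompleteOrCommitted(BlockState s) {
        return s == BlockState.COMMITTED || s == BlockState.COMPLETE;
    }

    /** The early return keeps the main path unindented and easy to follow. */
    static String process(BlockState state) {
        if (!isCompleteOrCommitted(state)) {
            // Block is still under construction: nothing more to do yet.
            return "deferred";
        }
        // ... replica accounting, invalidation checks, etc. ...
        return "processed";
    }

    public static void main(String[] args) {
        System.out.println(process(BlockState.UNDER_CONSTRUCTION)); // deferred
        System.out.println(process(BlockState.COMPLETE));           // processed
    }
}
```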
[jira] [Work logged] (HDFS-15634) Invalidate block on decommissioning DataNode after replication
[ https://issues.apache.org/jira/browse/HDFS-15634?focusedWorklogId=501430&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501430 ]

ASF GitHub Bot logged work on HDFS-15634:
- Author: ASF GitHub Bot
- Created on: 16/Oct/20 06:04
- Start Date: 16/Oct/20 06:04
- Worklog Time Spent: 10m

Work Description: fengnanli commented on a change in pull request #2388:
URL: https://github.com/apache/hadoop/pull/2388#discussion_r506070741

## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java

## @@ -3559,9 +3558,26 @@ private Block addStoredBlock(final BlockInfo block,
     if ((corruptReplicasCount > 0) && (numLiveReplicas >= fileRedundancy)) {
       invalidateCorruptReplicas(storedBlock, reportedBlock, num);
     }
+    if (shouldInvalidateDecommissionedRedundancy(num, fileRedundancy)) {
+      for (DatanodeStorageInfo storage : blocksMap.getStorages(block)) {
+        final DatanodeDescriptor datanode = storage.getDatanodeDescriptor();
+        if (datanode.isDecommissioned()
+            || datanode.isDecommissionInProgress()) {
+          addToInvalidates(storedBlock, datanode);
+        }
+      }
+    }
     return storedBlock;
   }

+  // If there are enough live replicas, start invalidating
+  // decommissioned + decommissioning replicas
+  private boolean shouldInvalidateDecommissionedRedundancy(NumberReplicas num,

Review comment: Good idea.

Issue Time Tracking
---
Worklog Id: (was: 501430)
Time Spent: 50m (was: 40m)
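The patch hunk above walks every storage holding the block and queues the copies on decommissioned or decommissioning nodes for deletion. A standalone sketch of that loop follows, with plain collections standing in for `DatanodeStorageInfo`/`DatanodeDescriptor`; the record and enum here are illustrative assumptions, not the HDFS types.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the replica-invalidation loop from the patch above.
// DataNode and AdminState are simplified stand-ins for the HDFS classes.
public final class InvalidateDecomReplicas {

    enum AdminState { NORMAL, DECOMMISSION_IN_PROGRESS, DECOMMISSIONED }

    record DataNode(String name, AdminState state) {}

    /** Returns the nodes whose copy of the block should be invalidated. */
    static List<DataNode> replicasToInvalidate(List<DataNode> holders) {
        List<DataNode> toInvalidate = new ArrayList<>();
        for (DataNode dn : holders) {
            // Mirrors isDecommissioned() || isDecommissionInProgress()
            if (dn.state() == AdminState.DECOMMISSIONED
                || dn.state() == AdminState.DECOMMISSION_IN_PROGRESS) {
                toInvalidate.add(dn);
            }
        }
        return toInvalidate;
    }

    public static void main(String[] args) {
        List<DataNode> holders = List.of(
            new DataNode("dn1", AdminState.NORMAL),
            new DataNode("dn2", AdminState.DECOMMISSIONED),
            new DataNode("dn3", AdminState.DECOMMISSION_IN_PROGRESS));
        System.out.println(replicasToInvalidate(holders).size()); // 2
    }
}
```

Note that the guard (`shouldInvalidateDecommissionedRedundancy` in the patch) runs before this loop, so the copy on dn1 is never touched and the block stays at its expected redundancy throughout.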
[jira] [Work logged] (HDFS-15634) Invalidate block on decommissioning DataNode after replication
[ https://issues.apache.org/jira/browse/HDFS-15634?focusedWorklogId=501363&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501363 ]

ASF GitHub Bot logged work on HDFS-15634:
- Author: ASF GitHub Bot
- Created on: 16/Oct/20 02:13
- Start Date: 16/Oct/20 02:13
- Worklog Time Spent: 10m

Work Description: hadoop-yetus commented on pull request #2388:
URL: https://github.com/apache/hadoop/pull/2388#issuecomment-709687903

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 28m 59s | | Docker mode activated. |

_ Prechecks _
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |

_ trunk Compile Tests _
| +1 :green_heart: | mvninstall | 29m 41s | | trunk passed |
| +1 :green_heart: | compile | 1m 16s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 :green_heart: | compile | 1m 9s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 :green_heart: | checkstyle | 0m 51s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 18s | | trunk passed |
| +1 :green_heart: | shadedclient | 16m 8s | | branch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 0m 54s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 :green_heart: | javadoc | 1m 26s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +0 :ok: | spotbugs | 3m 2s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 3m 0s | | trunk passed |

_ Patch Compile Tests _
| +1 :green_heart: | mvninstall | 1m 13s | | the patch passed |
| +1 :green_heart: | compile | 1m 7s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 :green_heart: | javac | 1m 7s | | the patch passed |
| +1 :green_heart: | compile | 1m 5s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 :green_heart: | javac | 1m 5s | | the patch passed |
| +1 :green_heart: | checkstyle | 0m 42s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 11s | | the patch passed |
| +1 :green_heart: | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 :green_heart: | shadedclient | 14m 11s | | patch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 0m 47s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 :green_heart: | javadoc | 1m 21s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 :green_heart: | findbugs | 2m 58s | | the patch passed |

_ Other Tests _
| -1 :x: | unit | 95m 11s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2388/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 45s | | The patch does not generate ASF License warnings. |
| | | 207m 14s | | |

| Reason | Tests |
|---:|:--|
| Failed junit tests | hadoop.hdfs.TestFileChecksum |
| | hadoop.hdfs.TestDecommissionWithStriped |
| | hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks |
| | hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor |
| | hadoop.hdfs.server.namenode.TestFsck |
| | hadoop.hdfs.TestFileChecksumCompositeCrc |
| | hadoop.hdfs.TestStripedFileAppend |
| | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2388/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/2388 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 2bbb452b3bd4 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Perso
[jira] [Work logged] (HDFS-15634) Invalidate block on decommissioning DataNode after replication
[ https://issues.apache.org/jira/browse/HDFS-15634?focusedWorklogId=501332&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501332 ]

ASF GitHub Bot logged work on HDFS-15634:
- Author: ASF GitHub Bot
- Created on: 15/Oct/20 23:51
- Start Date: 15/Oct/20 23:51
- Worklog Time Spent: 10m

Work Description: goiri commented on a change in pull request #2388:
URL: https://github.com/apache/hadoop/pull/2388#discussion_r505926806

## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java

## @@ -3512,7 +3512,11 @@ private Block addStoredBlock(final BlockInfo block,
     int numUsableReplicas = num.liveReplicas() +
         num.decommissioning() + num.liveEnteringMaintenanceReplicas();
-    if(storedBlock.getBlockUCState() == BlockUCState.COMMITTED &&
+
+    // if block is still under construction, then done for now
+    if (!storedBlock.isCompleteOrCommitted()) {

Review comment: Why do we move this block here? BTW, we can leave it as a single if with a return.

## @@ -3559,9 +3558,26 @@ private Block addStoredBlock(final BlockInfo block,
     if ((corruptReplicasCount > 0) && (numLiveReplicas >= fileRedundancy)) {
       invalidateCorruptReplicas(storedBlock, reportedBlock, num);
     }
+    if (shouldInvalidateDecommissionedRedundancy(num, fileRedundancy)) {
+      for (DatanodeStorageInfo storage : blocksMap.getStorages(block)) {
+        final DatanodeDescriptor datanode = storage.getDatanodeDescriptor();
+        if (datanode.isDecommissioned()
+            || datanode.isDecommissionInProgress()) {
+          addToInvalidates(storedBlock, datanode);
+        }
+      }
+    }
     return storedBlock;
   }

+  // If there are enough live replicas, start invalidating
+  // decommissioned + decommissioning replicas
+  private boolean shouldInvalidateDecommissionedRedundancy(NumberReplicas num,

Review comment: It makes sense. Maybe we should include some of the JIRA description in this method to explain at a high level what we are doing.

Issue Time Tracking
---
Worklog Id: (was: 501332)
Time Spent: 20m (was: 10m)
[jira] [Work logged] (HDFS-15634) Invalidate block on decommissioning DataNode after replication
[ https://issues.apache.org/jira/browse/HDFS-15634?focusedWorklogId=501325&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501325 ]

ASF GitHub Bot logged work on HDFS-15634:
- Author: ASF GitHub Bot
- Created on: 15/Oct/20 22:45
- Start Date: 15/Oct/20 22:45
- Worklog Time Spent: 10m

Work Description: fengnanli opened a new pull request #2388:
URL: https://github.com/apache/hadoop/pull/2388

… Datanodes

## NOTICE

Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HADOOP-X. Fix a typo in YYY.) For more details, please see https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute

Issue Time Tracking
---
Worklog Id: (was: 501325)
Remaining Estimate: 0h
Time Spent: 10m