[ https://issues.apache.org/jira/browse/HDFS-17690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906668#comment-17906668 ]
ASF GitHub Bot commented on HDFS-17690:
---------------------------------------

hadoop-yetus commented on PR #7231:
URL: https://github.com/apache/hadoop/pull/7231#issuecomment-2550837174

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 6m 40s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 23m 14s | | trunk passed |
| +1 :green_heart: | compile | 0m 43s | | trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | compile | 0m 40s | | trunk passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga |
| +1 :green_heart: | checkstyle | 0m 39s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 44s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 44s | | trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 6s | | trunk passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga |
| +1 :green_heart: | spotbugs | 1m 40s | | trunk passed |
| +1 :green_heart: | shadedclient | 19m 54s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 35s | | the patch passed |
| +1 :green_heart: | compile | 0m 36s | | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javac | 0m 36s | | the patch passed |
| +1 :green_heart: | compile | 0m 37s | | the patch passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga |
| +1 :green_heart: | javac | 0m 37s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 30s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7231/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 131 unchanged - 0 fixed = 135 total (was 131) |
| +1 :green_heart: | mvnsite | 0m 36s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 30s | | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 0s | | the patch passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga |
| +1 :green_heart: | spotbugs | 1m 35s | | the patch passed |
| +1 :green_heart: | shadedclient | 19m 38s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 192m 39s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7231/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 30s | | The patch does not generate ASF License warnings. |
| | | | 274m 32s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.TestDecommission |
| | hadoop.hdfs.server.blockmanagement.TestDatanodeManager |
| | hadoop.hdfs.TestDecommissionWithBackoffMonitor |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7231/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/7231 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 9f99b4ac917c 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 867767877811a6649b79ebca798af815fca59111 |
| Default Java | Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7231/1/testReport/ |
| Max. process+thread count | 4866 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7231/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

> Avoid redundant EC reconstruction tasks after pending reconstruction timeouts
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-17690
>                 URL: https://issues.apache.org/jira/browse/HDFS-17690
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: ec
>    Affects Versions: 3.4.1
>            Reporter: Junegunn Choi
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2024-12-18-13-50-11-978.png, image-2024-12-18-13-51-33-552.png
>
>
> h2. Problem
> We are running HDFS clusters with the RS-6-3-1024k policy enabled.
> When datanodes go down, HDFS reconstructs the under-replicated EC blocks as expected, and the number of low-redundancy blocks decreases. However, after some time, we observe a lot of wasted effort reconstructing EC blocks that have long since been fully reconstructed.
> !image-2024-12-18-13-50-11-978.png|width=1000!
> * We shut down two datanodes in an HDFS cluster with 21 datanodes.
> * Each datanode holds around 2 TB of data on SSD devices.
> * It took 10 hours for the low redundancy metric to reach 0.
> * Lots of (17K+) failed reconstruction tasks wasted disk and network resources during the period.
> ** (There was no other activity on the cluster.)
> These are the log messages for {{blk_-9223372036851532480_352520}}; we can see repeated {{"Failed to reconstruct"}} messages on different datanodes.
> {noformat}
> @timestamp,server,loglevel,role,data
> Dec 18, 2024 @ 00:56:07.000,s19,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 01:03:41.000,s04,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 04:56:16.000,s20,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 05:18:37.000,s20,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 07:18:31.000,s20,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 07:38:34.000,s20,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 08:18:28.000,s19,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 08:18:40.000,s19,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 08:19:07.000,s19,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 08:23:08.000,s18,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 08:23:38.000,s18,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> {noformat}
> The repeated failures were caused by {{ReplicaAlreadyExistsException}}, which means we were performing redundant tasks.
> {noformat}
> 24/12/18 08:23:59 INFO datanode.DataNode: DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY
> 24/12/18 08:23:59 INFO datanode.DataNode: Receiving BP-1483963022-k1-1732859582319:blk_-9223372036851532476_352520 src: /s18:35816 dest: /s18:1004
> 24/12/18 08:23:59 INFO datanode.DataNode: opWriteBlock BP-1483963022-k1-1732859582319:blk_-9223372036851532476_352520 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1483963022-k1-1732859582319:blk_-9223372036851532476_352520 already exists in state FINALIZED and thus cannot be created.
> 24/12/18 08:23:59 INFO datanode.DataNode: s18:1004:DataXceiver error processing WRITE_BLOCK operation src: /s18:35816 dst: /s18:1004; org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1483963022-k1-1732859582319:blk_-9223372036851532476_352520 already exists in state FINALIZED and thus cannot be created.
> 24/12/18 08:23:59 WARN datanode.DataNode: Broken pipe
> 24/12/18 08:23:59 WARN datanode.DataNode: Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> java.io.IOException: Transfer failed for all targets.
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:118)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:63)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
> {{hdfs fsck -blockId blk_-9223372036851532480}} also shows that this block doesn't need reconstruction.
> {noformat}
> No. of Expected Replica: 9
> No. of live Replica: 9
> {noformat}
> h2. Workaround
> Because this happens only after several {{PendingReconstructionMonitor timed out}} messages, I tried increasing {{dfs.namenode.reconstruction.pending.timeout-sec}} so that no timeout occurs, and the problem disappeared. (A minimal configuration sketch appears at the end of this message.)
> h2. Cause
> When an EC reconstruction task for a block is scheduled, the {{BlockECReconstructionInfo}} for the block is added to the {{erasurecodeBlocks}} queue of the {{DatanodeDescriptor}} for the target node. However, it is not removed from the queue when the reconstruction task times out and is recreated and rescheduled, so we end up with multiple redundant {{BlockECReconstructionInfo}} objects on the queues of several datanodes.
> This explains the large amount of wasted effort after multiple {{PendingReconstructionMonitor}} timeouts.
> You're more likely to experience this problem on a relatively small cluster with a small {{dfs.namenode.replication.max-streams}} value, where block reconstruction for failed datanodes takes a considerable amount of time and triggers many pending reconstruction timeouts. This was exactly the case for us; our cluster has only 21 datanodes, and {{dfs.namenode.replication.max-streams}} is set to 1 because we're running HBase on the same cluster and don't want eager reconstruction to affect HBase's performance.
> h2. Suggested fix
> Avoid processing redundant {{BlockECReconstructionInfo}} entries from the queue: check again whether the task is really needed just before dispatching it to the datanodes. (A toy sketch of this check follows at the end of this message.)
> h2. Result
> !image-2024-12-18-13-51-33-552.png|width=1000!
> * Took down 2 datanodes
> * Reconstruction completed in only 3.5 hours
> * 10K+ {{"doesn't need reconstruction"}} messages in the namenode log, showing that we successfully avoided dispatching redundant tasks
> * No {{org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException}} errors
> h2. Alternative approach considered
> We could instead remove the {{BlockECReconstructionInfo}} from the queue when the task times out, but this would require more bookkeeping and be more error-prone.
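> The following {{hdfs-site.xml}} fragment is a minimal sketch of that workaround. The value of 86400 seconds is an illustrative assumption chosen to comfortably exceed our 10-hour recovery window, not a tuned recommendation; pick a value larger than the time a full recovery takes on your cluster.
> {code:xml}
> <!-- Sketch of the workaround: raise the pending-reconstruction timeout so
>      in-flight EC reconstruction tasks are not timed out and rescheduled.
>      86400 (one day) is an illustrative value, not a default. -->
> <property>
>   <name>dfs.namenode.reconstruction.pending.timeout-sec</name>
>   <value>86400</value>
> </property>
> {code}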
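> To make the cause and the suggested fix concrete, here is a self-contained toy model. None of the names below ({{ReconstructionInfo}}, {{pollTasksToDispatch}}, {{lowRedundancyGroups}}, ...) are Hadoop's real classes or APIs; the actual change proposed in PR #7231 may differ in detail. The point is only the mechanism: a pending-reconstruction timeout re-queues the block without removing the stale entry, and a re-check at dispatch time drops entries that are no longer needed.
> {code:java}
> import java.util.ArrayDeque;
> import java.util.ArrayList;
> import java.util.HashSet;
> import java.util.List;
> import java.util.Queue;
> import java.util.Set;
>
> /** Toy model of the cause and the suggested fix; illustrative names only. */
> public class EcReconstructionQueueSketch {
>
>   /** Stand-in for BlockECReconstructionInfo. */
>   static class ReconstructionInfo {
>     final long blockGroupId;
>     ReconstructionInfo(long blockGroupId) { this.blockGroupId = blockGroupId; }
>   }
>
>   /** Stand-in for a datanode's erasurecodeBlocks queue. */
>   private final Queue<ReconstructionInfo> erasurecodeBlocks = new ArrayDeque<>();
>
>   /** Block groups that currently have fewer live internal blocks than expected. */
>   private final Set<Long> lowRedundancyGroups = new HashSet<>();
>
>   void markLowRedundancy(long id) { lowRedundancyGroups.add(id); }
>   void markFullyReconstructed(long id) { lowRedundancyGroups.remove(id); }
>
>   /** Cause: a pending timeout re-queues the block, leaving the stale entry behind. */
>   void schedule(long blockGroupId) {
>     erasurecodeBlocks.add(new ReconstructionInfo(blockGroupId));
>   }
>
>   /** Fix: re-check, just before dispatch, that the block still needs reconstruction. */
>   List<ReconstructionInfo> pollTasksToDispatch(int maxTasks) {
>     List<ReconstructionInfo> tasks = new ArrayList<>();
>     ReconstructionInfo info;
>     while (tasks.size() < maxTasks && (info = erasurecodeBlocks.poll()) != null) {
>       if (lowRedundancyGroups.contains(info.blockGroupId)) {
>         tasks.add(info); // still needed: hand it to the datanode
>       }
>       // else: stale duplicate left over from a pending-timeout reschedule; drop it
>     }
>     return tasks;
>   }
>
>   public static void main(String[] args) {
>     EcReconstructionQueueSketch nn = new EcReconstructionQueueSketch();
>     nn.markLowRedundancy(352520L);
>     nn.schedule(352520L);                                         // initial task
>     nn.schedule(352520L);                                         // timeout -> duplicate queued
>     List<ReconstructionInfo> first = nn.pollTasksToDispatch(1);   // dispatches 1 task
>     nn.markFullyReconstructed(352520L);                           // that task completes
>     List<ReconstructionInfo> second = nn.pollTasksToDispatch(10); // stale duplicate dropped
>     System.out.println(first.size() + " then " + second.size());  // prints "1 then 0"
>   }
> }
> {code}
> Without the dispatch-time check, the second queued entry would also be sent out and fail on the datanode with {{ReplicaAlreadyExistsException}}, which is exactly the redundant work observed above.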