[ https://issues.apache.org/jira/browse/HDFS-17690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906668#comment-17906668 ]
ASF GitHub Bot commented on HDFS-17690:
---------------------------------------

hadoop-yetus commented on PR #7231:
URL: https://github.com/apache/hadoop/pull/7231#issuecomment-2550837174

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 6m 40s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 23m 14s | | trunk passed |
| +1 :green_heart: | compile | 0m 43s | | trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | compile | 0m 40s | | trunk passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga |
| +1 :green_heart: | checkstyle | 0m 39s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 44s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 44s | | trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 6s | | trunk passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga |
| +1 :green_heart: | spotbugs | 1m 40s | | trunk passed |
| +1 :green_heart: | shadedclient | 19m 54s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 35s | | the patch passed |
| +1 :green_heart: | compile | 0m 36s | | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javac | 0m 36s | | the patch passed |
| +1 :green_heart: | compile | 0m 37s | | the patch passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga |
| +1 :green_heart: | javac | 0m 37s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 30s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7231/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 131 unchanged - 0 fixed = 135 total (was 131) |
| +1 :green_heart: | mvnsite | 0m 36s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 30s | | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 0s | | the patch passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga |
| +1 :green_heart: | spotbugs | 1m 35s | | the patch passed |
| +1 :green_heart: | shadedclient | 19m 38s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 192m 39s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7231/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 30s | | The patch does not generate ASF License warnings. |
| | | | 274m 32s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.TestDecommission |
| | hadoop.hdfs.server.blockmanagement.TestDatanodeManager |
| | hadoop.hdfs.TestDecommissionWithBackoffMonitor |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7231/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/7231 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 9f99b4ac917c 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 867767877811a6649b79ebca798af815fca59111 |
| Default Java | Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7231/1/testReport/ |
| Max. process+thread count | 4866 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7231/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

> Avoid redundant EC reconstruction tasks after pending reconstruction timeouts
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-17690
>                 URL: https://issues.apache.org/jira/browse/HDFS-17690
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: ec
>    Affects Versions: 3.4.1
>            Reporter: Junegunn Choi
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2024-12-18-13-50-11-978.png, image-2024-12-18-13-51-33-552.png
>
>
> h2. Problem
> We are running HDFS clusters with the RS-6-3-1024k policy enabled.
> When datanodes go down, HDFS reconstructs the under-replicated EC blocks as expected, and the number of low-redundancy blocks decreases. However, after some time, we observe a lot of wasted effort reconstructing EC blocks that have long since been fully reconstructed.
> !image-2024-12-18-13-50-11-978.png|width=1000!
> * We shut down two datanodes in an HDFS cluster with 21 datanodes.
> * Each datanode holds around 2 TB of data on SSD devices.
> * It took 10 hours for the low redundancy metric to reach 0.
> * Lots of (17K+) failed reconstruction tasks wasted disk and network resources during the period.
> ** (There was no other activity on the cluster.)
> These are the log messages for {{blk_-9223372036851532480_352520}}; we can see repeated {{"Failed to reconstruct"}} messages on different datanodes.
> {noformat}
> @timestamp,server,loglevel,role,data
> Dec 18, 2024 @ 00:56:07.000,s19,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 01:03:41.000,s04,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 04:56:16.000,s20,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 05:18:37.000,s20,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 07:18:31.000,s20,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 07:38:34.000,s20,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 08:18:28.000,s19,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 08:18:40.000,s19,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 08:19:07.000,s19,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 08:23:08.000,s18,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> Dec 18, 2024 @ 08:23:38.000,s18,WARN,datanode,Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> {noformat}
> The repeated failures were caused by {{ReplicaAlreadyExistsException}}, which means we were performing redundant tasks.
> {noformat}
> 24/12/18 08:23:59 INFO datanode.DataNode: DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY
> 24/12/18 08:23:59 INFO datanode.DataNode: Receiving BP-1483963022-k1-1732859582319:blk_-9223372036851532476_352520 src: /s18:35816 dest: /s18:1004
> 24/12/18 08:23:59 INFO datanode.DataNode: opWriteBlock BP-1483963022-k1-1732859582319:blk_-9223372036851532476_352520 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1483963022-k1-1732859582319:blk_-9223372036851532476_352520 already exists in state FINALIZED and thus cannot be created.
> 24/12/18 08:23:59 INFO datanode.DataNode: s18:1004:DataXceiver error processing WRITE_BLOCK operation src: /s18:35816 dst: /s18:1004; org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1483963022-k1-1732859582319:blk_-9223372036851532476_352520 already exists in state FINALIZED and thus cannot be created.
> 24/12/18 08:23:59 WARN datanode.DataNode: Broken pipe
> 24/12/18 08:23:59 WARN datanode.DataNode: Failed to reconstruct striped block: BP-1483963022-k1-1732859582319:blk_-9223372036851532480_352520
> java.io.IOException: Transfer failed for all targets.
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:118)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:63)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
> {{hdfs fsck -blockId blk_-9223372036851532480}} also shows that this block doesn't need reconstruction.
> {noformat}
> No. of Expected Replica: 9
> No. of live Replica: 9
> {noformat}
> h2. Workaround
> Because this happens only after several {{PendingReconstructionMonitor timed out}} messages, I tried increasing {{dfs.namenode.reconstruction.pending.timeout-sec}} so that no timeout occurs, and the problem disappeared. (A minimal configuration sketch appears at the end of this message.)
> h2. Cause
> When an EC reconstruction task for a block is scheduled, the {{BlockECReconstructionInfo}} for the block is added to the {{erasurecodeBlocks}} queue of the {{DatanodeDescriptor}} for the target node. However, it is not removed from the queue when the reconstruction task times out and is recreated and rescheduled, so we end up with multiple redundant {{BlockECReconstructionInfo}} objects on the queues of several datanodes.
> This explains the large amount of wasted effort after multiple {{PendingReconstructionMonitor}} timeouts.
> You're more likely to experience this problem on a relatively small cluster with a small {{dfs.namenode.replication.max-streams}} value, where block reconstruction for failed datanodes takes a considerable amount of time and triggers many pending reconstruction timeouts. This was exactly the case for us; our cluster has only 21 datanodes, and {{dfs.namenode.replication.max-streams}} is set to 1 because we're running HBase on the same cluster and don't want eager reconstruction to affect HBase's performance.
> h2. Suggested fix
> Avoid processing redundant {{BlockECReconstructionInfo}} entries from the queue: check again whether the task is really needed just before dispatching it to the datanodes. (A toy sketch of this check follows at the end of this message.)
> h2. Result
> !image-2024-12-18-13-51-33-552.png|width=1000!
> * Took down 2 datanodes
> * Reconstruction completed in only 3.5 hours
> * 10K+ {{"doesn't need reconstruction"}} messages in the namenode log, showing that we successfully avoided dispatching redundant tasks
> * No {{org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException}} errors
> h2. Alternative approach considered
> We could instead remove the {{BlockECReconstructionInfo}} from the queue when the task times out, but this would require more bookkeeping and be more error-prone.
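> The following {{hdfs-site.xml}} fragment is a minimal sketch of that workaround. The value of 86400 seconds is an illustrative assumption chosen to comfortably exceed our 10-hour recovery window, not a tuned recommendation; pick a value larger than the time a full recovery takes on your cluster.
> {code:xml}
> <!-- Sketch of the workaround: raise the pending-reconstruction timeout so
>      in-flight EC reconstruction tasks are not timed out and rescheduled.
>      86400 (one day) is an illustrative value, not a default. -->
> <property>
>   <name>dfs.namenode.reconstruction.pending.timeout-sec</name>
>   <value>86400</value>
> </property>
> {code}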
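> To make the cause and the suggested fix concrete, here is a self-contained toy model. None of the names below ({{ReconstructionInfo}}, {{pollTasksToDispatch}}, {{lowRedundancyGroups}}, ...) are Hadoop's real classes or APIs; the actual change proposed in PR #7231 may differ in detail. The point is only the mechanism: a pending-reconstruction timeout re-queues the block without removing the stale entry, and a re-check at dispatch time drops entries that are no longer needed.
> {code:java}
> import java.util.ArrayDeque;
> import java.util.ArrayList;
> import java.util.HashSet;
> import java.util.List;
> import java.util.Queue;
> import java.util.Set;
>
> /** Toy model of the cause and the suggested fix; illustrative names only. */
> public class EcReconstructionQueueSketch {
>
>   /** Stand-in for BlockECReconstructionInfo. */
>   static class ReconstructionInfo {
>     final long blockGroupId;
>     ReconstructionInfo(long blockGroupId) { this.blockGroupId = blockGroupId; }
>   }
>
>   /** Stand-in for a datanode's erasurecodeBlocks queue. */
>   private final Queue<ReconstructionInfo> erasurecodeBlocks = new ArrayDeque<>();
>
>   /** Block groups that currently have fewer live internal blocks than expected. */
>   private final Set<Long> lowRedundancyGroups = new HashSet<>();
>
>   void markLowRedundancy(long id) { lowRedundancyGroups.add(id); }
>   void markFullyReconstructed(long id) { lowRedundancyGroups.remove(id); }
>
>   /** Cause: a pending timeout re-queues the block, leaving the stale entry behind. */
>   void schedule(long blockGroupId) {
>     erasurecodeBlocks.add(new ReconstructionInfo(blockGroupId));
>   }
>
>   /** Fix: re-check, just before dispatch, that the block still needs reconstruction. */
>   List<ReconstructionInfo> pollTasksToDispatch(int maxTasks) {
>     List<ReconstructionInfo> tasks = new ArrayList<>();
>     ReconstructionInfo info;
>     while (tasks.size() < maxTasks && (info = erasurecodeBlocks.poll()) != null) {
>       if (lowRedundancyGroups.contains(info.blockGroupId)) {
>         tasks.add(info); // still needed: hand it to the datanode
>       }
>       // else: stale duplicate left over from a pending-timeout reschedule; drop it
>     }
>     return tasks;
>   }
>
>   public static void main(String[] args) {
>     EcReconstructionQueueSketch nn = new EcReconstructionQueueSketch();
>     nn.markLowRedundancy(352520L);
>     nn.schedule(352520L);                                         // initial task
>     nn.schedule(352520L);                                         // timeout -> duplicate queued
>     List<ReconstructionInfo> first = nn.pollTasksToDispatch(1);   // dispatches 1 task
>     nn.markFullyReconstructed(352520L);                           // that task completes
>     List<ReconstructionInfo> second = nn.pollTasksToDispatch(10); // stale duplicate dropped
>     System.out.println(first.size() + " then " + second.size());  // prints "1 then 0"
>   }
> }
> {code}
> Without the dispatch-time check, the second queued entry would also be sent out and fail on the datanode with {{ReplicaAlreadyExistsException}}, which is exactly the redundant work observed above.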