[ https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812335#comment-17812335 ]
ASF GitHub Bot commented on HDFS-17358: --------------------------------------- hadoop-yetus commented on PR #6509: URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1916966515 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |:----:|----------:|--------:|:--------:|:-------:| | +0 :ok: | reexec | 0m 36s | | Docker mode activated. | |||| _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | |||| _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 48m 10s | | trunk passed | | +1 :green_heart: | compile | 1m 28s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 1m 16s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 18s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 33s | | trunk passed | | +1 :green_heart: | javadoc | 1m 11s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 38s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 47s | | trunk passed | | +1 :green_heart: | shadedclient | 40m 55s | | branch has no errors when building and testing our client artifacts. | |||| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 17s | | the patch passed | | +1 :green_heart: | compile | 1m 21s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 1m 21s | | the patch passed | | +1 :green_heart: | compile | 1m 11s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 1m 11s | | the patch passed | | -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/10/artifact/out/blanks-eol.txt) | The patch has 4 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply | | +1 :green_heart: | checkstyle | 1m 8s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 23s | | the patch passed | | +1 :green_heart: | javadoc | 0m 58s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 28s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 16s | | the patch passed | | +1 :green_heart: | shadedclient | 34m 18s | | patch has no errors when building and testing our client artifacts. | |||| _ Other Tests _ | | -1 :x: | unit | 223m 16s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/10/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 41s | | The patch does not generate ASF License warnings. | | | | 372m 36s | | | | Reason | Tests | |-------:|:------| | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner | | Subsystem | Report/Notes | |----------:|:-------------| | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/10/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6509 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux c0a7f19e0fa7 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 6c1081a5fb506f9a96078569ee11a6a242bf7552 | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/10/testReport/ | | Max. process+thread count | 3981 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/10/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. > EC: infinite lease recovery caused by the length of RWR equals to zero. > ----------------------------------------------------------------------- > > Key: HDFS-17358 > URL: https://issues.apache.org/jira/browse/HDFS-17358 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec > Reporter: farmmamba > Assignee: farmmamba > Priority: Major > Labels: pull-request-available > > Recently, there is a strange case happened on our ec production cluster. > The phenomenon is as below described: NameNode does infinite recovery lease > of some ec files(~80K+) and those files could never be closed. > > After digging into logs and releated code, we found the root cause is below > codes in method `BlockRecoveryWorker$RecoveryTaskStriped#recover`: > {code:java} > // we met info.getNumBytes==0 here! > if (info != null && > info.getGenerationStamp() >= block.getGenerationStamp() && > info.getNumBytes() > 0) { > final BlockRecord existing = syncBlocks.get(blockId); > if (existing == null || > info.getNumBytes() > existing.rInfo.getNumBytes()) { > // if we have >1 replicas for the same internal block, we > // simply choose the one with larger length. > // TODO: better usage of redundant replicas > syncBlocks.put(blockId, new BlockRecord(id, proxyDN, info)); > } > } > // throw exception here! > checkLocations(syncBlocks.size()); > {code} > The related logs are as below: > {code:java} > java.io.IOException: > BP-1157541496-10.104.10.198-1702548776421:blk_-9223372036808032688_2938828 > has no enough internal blocks, unable to start recovery. Locations=[...] > {code} > {code:java} > 2024-01-23 12:48:16,171 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > initReplicaRecovery: blk_-9223372036808032686_2938828, recoveryId=27615365, > replica=ReplicaUnderRecovery, blk_-9223372036808032686_2938828, RUR > getNumBytes() = 0 getBytesOnDisk() = 0 getVisibleLength()= -1 getVolume() = > /data25/hadoop/hdfs/datanode getBlockURI() = > file:/data25/hadoop/hdfs/datanode/current/BP-1157541496-x.x.x.x-1702548776421/current/rbw/blk_-9223372036808032686 > recoveryId=27529675 original=ReplicaWaitingToBeRecovered, > blk_-9223372036808032686_2938828, RWR getNumBytes() = 0 getBytesOnDisk() = 0 > getVisibleLength()= -1 getVolume() = /data25/hadoop/hdfs/datanode > getBlockURI() = > file:/data25/hadoop/hdfs/datanode/current/BP-1157541496-10.104.10.198-1702548776421/current/rbw/blk_-9223372036808032686 > {code} > because the length of RWR is zero, the length of the returned object in > below codes is zero. We can't put it into syncBlocks. > So throw exception in checkLocations method. > {code:java} > ReplicaRecoveryInfo info = callInitReplicaRecovery(proxyDN, > new RecoveringBlock(internalBlk, null, recoveryId)); {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org