[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811683#comment-17811683
 ] 

ASF GitHub Bot commented on HDFS-17358:
---------------------------------------

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1913701342

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   7m 55s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 23s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 49s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 45s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 39s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 50s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 53s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 22s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 45s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 42s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 34s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 54s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 13s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  | 195m 42s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 27s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 293m 39s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.hdfs.TestLeaseRecovery |
   |   | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
   |   | hadoop.hdfs.TestLeaseRecoveryStriped |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux d1e5494161f3 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 30e3bdaa8262a0ee2ed2c00773d3420a0b096ac6 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/1/testReport/ |
   | Max. process+thread count | 4210 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> EC: infinite lease recovery caused by the length of RWR equals to zero.
> -----------------------------------------------------------------------
>
>                 Key: HDFS-17358
>                 URL: https://issues.apache.org/jira/browse/HDFS-17358
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ec
>            Reporter: farmmamba
>            Assignee: farmmamba
>            Priority: Major
>              Labels: pull-request-available
>
> Recently, there is a strange case happened on our ec production cluster.
> The phenomenon is as below described: NameNode does infinite recovery lease 
> of some ec files(~80K+) and those files could never be closed.
>  
> After digging into logs and releated code, we found the root cause is below 
> codes in method `BlockRecoveryWorker$RecoveryTaskStriped#recover`:
> {code:java}
>           // we met info.getNumBytes==0 here! 
>           if (info != null &&
>               info.getGenerationStamp() >= block.getGenerationStamp() &&
>               info.getNumBytes() > 0) {
>             final BlockRecord existing = syncBlocks.get(blockId);
>             if (existing == null ||
>                 info.getNumBytes() > existing.rInfo.getNumBytes()) {
>               // if we have >1 replicas for the same internal block, we
>               // simply choose the one with larger length.
>               // TODO: better usage of redundant replicas
>               syncBlocks.put(blockId, new BlockRecord(id, proxyDN, info));
>             }
>           }
>           // throw exception here!
>           checkLocations(syncBlocks.size());
> {code}
> The related logs are as below:
> {code:java}
> java.io.IOException: 
> BP-1157541496-10.104.10.198-1702548776421:blk_-9223372036808032688_2938828 
> has no enough internal blocks, unable to start recovery. Locations=[...] 
> {code}
> {code:java}
> 2024-01-23 12:48:16,171 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
> initReplicaRecovery: blk_-9223372036808032686_2938828, recoveryId=27615365, 
> replica=ReplicaUnderRecovery, blk_-9223372036808032686_2938828, RUR 
> getNumBytes() = 0 getBytesOnDisk() = 0 getVisibleLength()= -1 getVolume() = 
> /data25/hadoop/hdfs/datanode getBlockURI() = 
> file:/data25/hadoop/hdfs/datanode/current/BP-1157541496-x.x.x.x-1702548776421/current/rbw/blk_-9223372036808032686
>  recoveryId=27529675 original=ReplicaWaitingToBeRecovered, 
> blk_-9223372036808032686_2938828, RWR getNumBytes() = 0 getBytesOnDisk() = 0 
> getVisibleLength()= -1 getVolume() = /data25/hadoop/hdfs/datanode 
> getBlockURI() = 
> file:/data25/hadoop/hdfs/datanode/current/BP-1157541496-10.104.10.198-1702548776421/current/rbw/blk_-9223372036808032686
> {code}
> because the length of RWR is zero,  the length of the returned object in 
> below codes is zero. We can't put it into syncBlocks.
> So throw exception in checkLocations method.
> {code:java}
>           ReplicaRecoveryInfo info = callInitReplicaRecovery(proxyDN,
>               new RecoveringBlock(internalBlk, null, recoveryId)); {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to