Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-04-22 Thread via GitHub


hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2068640885

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:-------:|:-------:|
   | +0 :ok: |  reexec  |   6m 49s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m  4s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  19m 54s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   3m  0s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   2m 48s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 47s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 18s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m  2s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 53s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  21m  7s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m  4s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 50s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   2m 50s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 51s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   2m 51s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 38s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/11/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 1 new + 113 unchanged - 0 fixed = 
114 total (was 113)  |
   | +1 :green_heart: |  mvnsite  |   1m  8s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 55s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m  8s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m  5s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 49s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | +1 :green_heart: |  unit  | 214m 41s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 26s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 329m 18s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/11/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5829 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux f0536dd0a688 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / d698a9445b4a06fd8978ee4c5005964270d236d9 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 

Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-04-21 Thread via GitHub


Neilxzn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2068331088

   ```
   
./hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/DfsClientConf.java:171:
  public DfsClientConf(Configuration conf) {:3: Method length is 156 lines (max 
allowed is 150). [MethodLength]
   ```
   Should we suppress this checkstyle warning? Or is there a better 
suggestion?
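
   For reference, Checkstyle supports per-file suppressions via a `SuppressionFilter`. A sketch of what such an entry could look like, assuming the build already wires in a suppressions file (the file name, location, and DTD wiring here are hypothetical and depend on how the project configures Checkstyle):

   ```xml
   <?xml version="1.0"?>
   <!DOCTYPE suppressions PUBLIC
       "-//Checkstyle//DTD SuppressionFilter Configuration 1.2//EN"
       "https://checkstyle.org/dtds/suppressions_1_2.dtd">
   <suppressions>
     <!-- Hypothetical entry: suppress only the MethodLength check,
          and only for the file containing the long constructor. -->
     <suppress checks="MethodLength"
               files="DfsClientConf\.java"/>
   </suppressions>
   ```

   The alternative is to split the constructor into smaller helper methods, which keeps the check enabled for the rest of the file.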
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-04-19 Thread via GitHub


haiyang1987 commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2067551100

   Please fix the checkstyle issue, thanks~





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-04-19 Thread via GitHub


hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2067076767

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:-------:|:-------:|
   | +0 :ok: |  reexec  |   0m 22s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 14s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  20m  8s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   2m 58s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   2m 49s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 46s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 22s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 10s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 22s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  21m 36s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m  7s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 52s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   2m 52s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 43s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   2m 43s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/10/artifact/out/blanks-eol.txt)
 |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   0m 38s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/10/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 2 new + 113 unchanged - 0 fixed = 
115 total (was 113)  |
   | +1 :green_heart: |  mvnsite  |   1m  9s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 55s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m  8s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 50s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 51s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 212m  5s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/10/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 29s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 325m 11s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/10/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5829 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux 91e3e1ef2a9e 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 

Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-04-19 Thread via GitHub


Neilxzn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2066542689

   > The UT `hadoop.hdfs.TestDFSStripedInputStreamWithTimeout` failed.
   
   Added the Datanode$closeDataXceiverServer method to close connections for testing.





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-04-19 Thread via GitHub


hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2066527804

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:-------:|:-------:|
   | +0 :ok: |  reexec  |   0m 21s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m  0s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  19m 50s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   2m 58s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   2m 51s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 48s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 21s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 12s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 10s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 16s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  21m 30s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 21s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m  7s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 52s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   2m 52s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 49s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   2m 49s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/9/artifact/out/blanks-eol.txt)
 |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   0m 37s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/9/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 1 new + 45 unchanged - 0 fixed = 
46 total (was 45)  |
   | +1 :green_heart: |  mvnsite  |   1m  6s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 54s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 26s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 12s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 22s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 51s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 211m 45s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/9/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 31s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 321m  9s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestDFSStripedInputStreamWithTimeout |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/9/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5829 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux 911c52bf0959 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 

Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-04-17 Thread via GitHub


haiyang1987 commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2062844297

   The UT `hadoop.hdfs.TestDFSStripedInputStreamWithTimeout` failed.
   





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-04-17 Thread via GitHub


hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2061596554

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:-------:|:-------:|
   | +0 :ok: |  reexec  |   0m 21s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  13m 56s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  20m 11s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   2m 56s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   2m 53s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 47s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 24s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 31s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 17s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 19s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  21m 32s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 21s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m  3s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 51s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   2m 51s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 44s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   2m 44s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/8/artifact/out/blanks-eol.txt)
 |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   0m 34s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/8/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 1 new + 45 unchanged - 0 fixed = 
46 total (was 45)  |
   | +1 :green_heart: |  mvnsite  |   1m  8s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 54s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 28s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 10s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 13s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 54s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 209m 14s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/8/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 30s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 318m 25s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestDFSStripedInputStreamWithTimeout |
   |   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/8/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5829 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux 574d2feade6c 5.15.0-94-generic 

Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-04-17 Thread via GitHub


haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1568745996


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -233,41 +235,63 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) {
 
   private int readToBuffer(BlockReader blockReader,
       DatanodeInfo currentNode, ByteBufferStrategy strategy,
-      ExtendedBlock currentBlock) throws IOException {
+      LocatedBlock currentBlock, int chunkIndex, long offsetInBlock)
+      throws IOException {
     final int targetLength = strategy.getTargetLength();
-    int length = 0;
-    try {
-      while (length < targetLength) {
-        int ret = strategy.readFromBlock(blockReader);
-        if (ret < 0) {
-          throw new IOException("Unexpected EOS from the reader");
+    int curAttempts = 0;
+    while (curAttempts < readDNMaxAttempts) {
+      curAttempts++;
+      int length = 0;
+      try {
+        while (length < targetLength) {
+          int ret = strategy.readFromBlock(blockReader);
+          if (ret < 0) {
+            throw new IOException("Unexpected EOS from the reader");
+          }
+          length += ret;
+        }
+        return length;
+      } catch (ChecksumException ce) {
+        DFSClient.LOG.warn("Found Checksum error for "
+            + currentBlock + " from " + currentNode
+            + " at " + ce.getPos());
+        //Clear buffer to make next decode success
+        strategy.getReadBuffer().clear();
+        // we want to remember which block replicas we have tried
+        corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), currentNode);
+        throw ce;
+      } catch (IOException e) {
+        //Clear buffer to make next decode success
+        strategy.getReadBuffer().clear();
+        if (curAttempts < readDNMaxAttempts) {
+          if (readerInfos[chunkIndex].reader != null) {
+            readerInfos[chunkIndex].reader.close();
+          }
+          if (dfsStripedInputStream.createBlockReader(currentBlock,
+              offsetInBlock, targetBlocks,
+              readerInfos, chunkIndex, readTo)) {
+            blockReader = readerInfos[chunkIndex].reader;
+            String msg = "Reconnect to " + currentNode.getInfoAddr()
+                + " for block " + currentBlock.getBlock();
+            DFSClient.LOG.warn(msg);
Review Comment:
   Can this use the parameterized logging form instead?
   ```
   DFSClient.LOG.warn("Reconnect to {} for block {}", currentNode.getInfoAddr(),
   currentBlock.getBlock());
   ```






Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-04-17 Thread via GitHub


haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1568756254


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -233,41 +235,63 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) {
 
   private int readToBuffer(BlockReader blockReader,
       DatanodeInfo currentNode, ByteBufferStrategy strategy,
-      ExtendedBlock currentBlock) throws IOException {
+      LocatedBlock currentBlock, int chunkIndex, long offsetInBlock)
+      throws IOException {
     final int targetLength = strategy.getTargetLength();
-    int length = 0;
-    try {
-      while (length < targetLength) {
-        int ret = strategy.readFromBlock(blockReader);
-        if (ret < 0) {
-          throw new IOException("Unexpected EOS from the reader");
+    int curAttempts = 0;
+    while (curAttempts < readDNMaxAttempts) {

Review Comment:
   How about updating this to `while (true)`? Then line[286~288] can be removed.
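
   If I read the suggestion right, the idea is that with `while (true)` the final failure propagates from inside the `catch` block, so the unreachable fall-through after the loop can be dropped. A minimal stand-alone sketch of that control-flow shape (the names and the simulated read below are hypothetical illustrations, not the Hadoop APIs):

   ```java
   import java.io.IOException;

   public class RetryLoopSketch {

     // Hypothetical stand-in for the block-reader read call.
     interface Read {
       int read() throws IOException;
     }

     static int readWithRetry(Read read, int maxAttempts) throws IOException {
       int curAttempts = 0;
       while (true) {            // no exit condition on the loop itself...
         curAttempts++;
         try {
           return read.read();   // success exits here
         } catch (IOException e) {
           if (curAttempts >= maxAttempts) {
             throw e;            // ...the final failure exits from the catch
           }
           // otherwise fall through and retry (reconnect would happen here)
         }
       }
     }

     public static void main(String[] args) throws IOException {
       int[] failuresLeft = {2};
       int n = readWithRetry(() -> {
         if (failuresLeft[0]-- > 0) {
           throw new IOException("transient");
         }
         return 42;
       }, 3);
       System.out.println(n); // prints 42: succeeds on the third attempt
     }
   }
   ```

   Because every path either returns or throws, no code after the loop is reachable, which is what would let the trailing lines go away.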






Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-04-17 Thread via GitHub


haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1568748476


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -233,41 +235,63 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) {
 
   private int readToBuffer(BlockReader blockReader,
       DatanodeInfo currentNode, ByteBufferStrategy strategy,
-      ExtendedBlock currentBlock) throws IOException {
+      LocatedBlock currentBlock, int chunkIndex, long offsetInBlock)
+      throws IOException {
     final int targetLength = strategy.getTargetLength();
-    int length = 0;
-    try {
-      while (length < targetLength) {
-        int ret = strategy.readFromBlock(blockReader);
-        if (ret < 0) {
-          throw new IOException("Unexpected EOS from the reader");
+    int curAttempts = 0;
+    while (curAttempts < readDNMaxAttempts) {
+      curAttempts++;
+      int length = 0;
+      try {
+        while (length < targetLength) {
+          int ret = strategy.readFromBlock(blockReader);
+          if (ret < 0) {
+            throw new IOException("Unexpected EOS from the reader");
+          }
+          length += ret;
+        }
+        return length;
+      } catch (ChecksumException ce) {
+        DFSClient.LOG.warn("Found Checksum error for "

Review Comment:
   Same here.






Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-04-17 Thread via GitHub


haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1568748150


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -233,41 +235,63 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) {
 
   private int readToBuffer(BlockReader blockReader,
       DatanodeInfo currentNode, ByteBufferStrategy strategy,
-      ExtendedBlock currentBlock) throws IOException {
+      LocatedBlock currentBlock, int chunkIndex, long offsetInBlock)
+      throws IOException {
     final int targetLength = strategy.getTargetLength();
-    int length = 0;
-    try {
-      while (length < targetLength) {
-        int ret = strategy.readFromBlock(blockReader);
-        if (ret < 0) {
-          throw new IOException("Unexpected EOS from the reader");
+    int curAttempts = 0;
+    while (curAttempts < readDNMaxAttempts) {
+      curAttempts++;
+      int length = 0;
+      try {
+        while (length < targetLength) {
+          int ret = strategy.readFromBlock(blockReader);
+          if (ret < 0) {
+            throw new IOException("Unexpected EOS from the reader");
+          }
+          length += ret;
+        }
+        return length;
+      } catch (ChecksumException ce) {
+        DFSClient.LOG.warn("Found Checksum error for "
+            + currentBlock + " from " + currentNode
+            + " at " + ce.getPos());
+        //Clear buffer to make next decode success
+        strategy.getReadBuffer().clear();
+        // we want to remember which block replicas we have tried
+        corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), currentNode);
+        throw ce;
+      } catch (IOException e) {
+        //Clear buffer to make next decode success
+        strategy.getReadBuffer().clear();
+        if (curAttempts < readDNMaxAttempts) {
+          if (readerInfos[chunkIndex].reader != null) {
+            readerInfos[chunkIndex].reader.close();
+          }
+          if (dfsStripedInputStream.createBlockReader(currentBlock,
+              offsetInBlock, targetBlocks,
+              readerInfos, chunkIndex, readTo)) {
+            blockReader = readerInfos[chunkIndex].reader;
+            String msg = "Reconnect to " + currentNode.getInfoAddr()
+                + " for block " + currentBlock.getBlock();
+            DFSClient.LOG.warn(msg);
+            continue;
+          }
         }
-        length += ret;
+        DFSClient.LOG.warn("Exception while reading from "

Review Comment:
   Could we also use the `warn("{}", arg)` format here?
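   For context, SLF4J's parameterized form defers message assembly until the
log level is actually enabled. A standalone sketch of that behavior (the
`warn` helper below is a hypothetical stand-in, not the real SLF4J API):

```java
// Sketch of why parameterized logging ("{}" placeholders) is preferred over
// string concatenation: the message is only assembled if the level is enabled.
public class LogFormatSketch {
    static boolean warnEnabled = false;
    static int formatCalls = 0;

    // Hypothetical stand-in for LOG.warn(String, Object); not the SLF4J API.
    static void warn(String pattern, Object arg) {
        if (!warnEnabled) {
            return;  // pattern and arg are untouched; no string is built
        }
        formatCalls++;
        System.out.println(pattern.replace("{}", String.valueOf(arg)));
    }

    public static void main(String[] args) {
        // With concatenation, "Exception while reading from " + node would be
        // built eagerly even though WARN is disabled here.
        warn("Exception while reading from {}", "dn-1:9866");
        System.out.println(formatCalls);
    }
}
```

   With the level disabled, no formatting work happens, which matters on hot
read paths.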



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-04-17 Thread via GitHub


Neilxzn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2060941447

   > @Neilxzn Hi, this patch is very useful, would you mind further fixing this 
PR?
   
   Sorry for my late reply.  I have updated the patch based on the suggestions 
above. Please review it again. @haiyang1987 @zhangshuyan0 





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-04-15 Thread via GitHub


zhangshuyan0 commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2056526550

   @Neilxzn Hi, this patch is very useful, would you mind further fixing this 
PR?





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-04-08 Thread via GitHub


haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1556791082


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -233,41 +235,62 @@ private ByteBufferStrategy[] 
getReadStrategies(StripingChunk chunk) {
 
   private int readToBuffer(BlockReader blockReader,
   DatanodeInfo currentNode, ByteBufferStrategy strategy,
-  ExtendedBlock currentBlock) throws IOException {
+  LocatedBlock currentBlock, int chunkIndex) throws IOException {
 final int targetLength = strategy.getTargetLength();
-int length = 0;
-try {
-  while (length < targetLength) {
-int ret = strategy.readFromBlock(blockReader);
-if (ret < 0) {
-  throw new IOException("Unexpected EOS from the reader");
+int curAttempts = 0;
+while (curAttempts < readDNMaxAttempts) {
+  curAttempts++;
+  int length = 0;
+  try {
+while (length < targetLength) {
+  int ret = strategy.readFromBlock(blockReader);
+  if (ret < 0) {
+throw new IOException("Unexpected EOS from the reader");
+  }
+  length += ret;
+}
+return length;
+  } catch (ChecksumException ce) {
+DFSClient.LOG.warn("Found Checksum error for "
++ currentBlock + " from " + currentNode
++ " at " + ce.getPos());
+//Clear buffer to make next decode success
+strategy.getReadBuffer().clear();
+// we want to remember which block replicas we have tried
+corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), 
currentNode);
+throw ce;
+  } catch (IOException e) {
+//Clear buffer to make next decode success
+strategy.getReadBuffer().clear();
+if (curAttempts < readDNMaxAttempts) {
+  if (readerInfos[chunkIndex].reader != null) {
+readerInfos[chunkIndex].reader.close();
+  }
+  if (dfsStripedInputStream.createBlockReader(currentBlock,
+  alignedStripe.getOffsetInBlock(), targetBlocks,

Review Comment:
   Yeah, agree with @zhangshuyan0's comment.
   In `readToBuffer(reader, datanode, strategy, currentBlock, chunkIndex, ret)`,
   we should use `alignedStripe.getOffsetInBlock()+ret` instead of `ret`.






Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-04-08 Thread via GitHub


zhangshuyan0 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1555821906


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -284,7 +307,8 @@ private Callable readCells(final 
BlockReader reader,
 
   int ret = 0;
   for (ByteBufferStrategy strategy : strategies) {
-int bytesReead = readToBuffer(reader, datanode, strategy, 
currentBlock);
+int bytesReead = readToBuffer(reader, datanode, strategy, currentBlock,

Review Comment:
   Could you please correct this variable name while you are at it?
`bytesReead` -> `bytesRead`






Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-04-08 Thread via GitHub


zhangshuyan0 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1555819771


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -233,41 +235,62 @@ private ByteBufferStrategy[] 
getReadStrategies(StripingChunk chunk) {
 
   private int readToBuffer(BlockReader blockReader,
   DatanodeInfo currentNode, ByteBufferStrategy strategy,
-  ExtendedBlock currentBlock) throws IOException {
+  LocatedBlock currentBlock, int chunkIndex) throws IOException {
 final int targetLength = strategy.getTargetLength();
-int length = 0;
-try {
-  while (length < targetLength) {
-int ret = strategy.readFromBlock(blockReader);
-if (ret < 0) {
-  throw new IOException("Unexpected EOS from the reader");
+int curAttempts = 0;
+while (curAttempts < readDNMaxAttempts) {
+  curAttempts++;
+  int length = 0;
+  try {
+while (length < targetLength) {
+  int ret = strategy.readFromBlock(blockReader);
+  if (ret < 0) {
+throw new IOException("Unexpected EOS from the reader");
+  }
+  length += ret;
+}
+return length;
+  } catch (ChecksumException ce) {
+DFSClient.LOG.warn("Found Checksum error for "
++ currentBlock + " from " + currentNode
++ " at " + ce.getPos());
+//Clear buffer to make next decode success
+strategy.getReadBuffer().clear();
+// we want to remember which block replicas we have tried
+corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), 
currentNode);
+throw ce;
+  } catch (IOException e) {
+//Clear buffer to make next decode success
+strategy.getReadBuffer().clear();
+if (curAttempts < readDNMaxAttempts) {
+  if (readerInfos[chunkIndex].reader != null) {
+readerInfos[chunkIndex].reader.close();
+  }
+  if (dfsStripedInputStream.createBlockReader(currentBlock,
+  alignedStripe.getOffsetInBlock(), targetBlocks,

Review Comment:
   Great catch! Agree with @haiyang1987 's idea. But the solution seems flawed. 
Should we use `alignedStripe.getOffsetInBlock()+ret` instead of `ret` here? 
Looking forward to your reply! @haiyang1987 






Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-03-20 Thread via GitHub


haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1531798611


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -233,41 +235,62 @@ private ByteBufferStrategy[] 
getReadStrategies(StripingChunk chunk) {
 
   private int readToBuffer(BlockReader blockReader,
   DatanodeInfo currentNode, ByteBufferStrategy strategy,
-  ExtendedBlock currentBlock) throws IOException {
+  LocatedBlock currentBlock, int chunkIndex) throws IOException {

Review Comment:
   ```
   private int readToBuffer(BlockReader blockReader,
 DatanodeInfo currentNode, ByteBufferStrategy strategy,
 LocatedBlock currentBlock, int chunkIndex, long offsetInBlock)
   ```






Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-03-20 Thread via GitHub


haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1531799282


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -233,41 +235,62 @@ private ByteBufferStrategy[] 
getReadStrategies(StripingChunk chunk) {
 
   private int readToBuffer(BlockReader blockReader,
   DatanodeInfo currentNode, ByteBufferStrategy strategy,
-  ExtendedBlock currentBlock) throws IOException {
+  LocatedBlock currentBlock, int chunkIndex) throws IOException {
 final int targetLength = strategy.getTargetLength();
-int length = 0;
-try {
-  while (length < targetLength) {
-int ret = strategy.readFromBlock(blockReader);
-if (ret < 0) {
-  throw new IOException("Unexpected EOS from the reader");
+int curAttempts = 0;
+while (curAttempts < readDNMaxAttempts) {
+  curAttempts++;
+  int length = 0;
+  try {
+while (length < targetLength) {
+  int ret = strategy.readFromBlock(blockReader);
+  if (ret < 0) {
+throw new IOException("Unexpected EOS from the reader");
+  }
+  length += ret;
+}
+return length;
+  } catch (ChecksumException ce) {
+DFSClient.LOG.warn("Found Checksum error for "
++ currentBlock + " from " + currentNode
++ " at " + ce.getPos());
+//Clear buffer to make next decode success
+strategy.getReadBuffer().clear();
+// we want to remember which block replicas we have tried
+corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), 
currentNode);
+throw ce;
+  } catch (IOException e) {
+//Clear buffer to make next decode success
+strategy.getReadBuffer().clear();
+if (curAttempts < readDNMaxAttempts) {
+  if (readerInfos[chunkIndex].reader != null) {
+readerInfos[chunkIndex].reader.close();
+  }
+  if (dfsStripedInputStream.createBlockReader(currentBlock,
+  alignedStripe.getOffsetInBlock(), targetBlocks,

Review Comment:
   ```
if (dfsStripedInputStream.createBlockReader(currentBlock,
 offsetInBlock, targetBlocks,
   ```






Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-03-20 Thread via GitHub


haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1531806620


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -233,41 +235,62 @@ private ByteBufferStrategy[] 
getReadStrategies(StripingChunk chunk) {
 
   private int readToBuffer(BlockReader blockReader,
   DatanodeInfo currentNode, ByteBufferStrategy strategy,
-  ExtendedBlock currentBlock) throws IOException {
+  LocatedBlock currentBlock, int chunkIndex) throws IOException {
 final int targetLength = strategy.getTargetLength();
-int length = 0;
-try {
-  while (length < targetLength) {
-int ret = strategy.readFromBlock(blockReader);
-if (ret < 0) {
-  throw new IOException("Unexpected EOS from the reader");
+int curAttempts = 0;
+while (curAttempts < readDNMaxAttempts) {
+  curAttempts++;
+  int length = 0;
+  try {
+while (length < targetLength) {
+  int ret = strategy.readFromBlock(blockReader);
+  if (ret < 0) {
+throw new IOException("Unexpected EOS from the reader");
+  }
+  length += ret;
+}
+return length;
+  } catch (ChecksumException ce) {
+DFSClient.LOG.warn("Found Checksum error for "
++ currentBlock + " from " + currentNode
++ " at " + ce.getPos());
+//Clear buffer to make next decode success
+strategy.getReadBuffer().clear();
+// we want to remember which block replicas we have tried
+corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), 
currentNode);
+throw ce;
+  } catch (IOException e) {
+//Clear buffer to make next decode success
+strategy.getReadBuffer().clear();
+if (curAttempts < readDNMaxAttempts) {
+  if (readerInfos[chunkIndex].reader != null) {
+readerInfos[chunkIndex].reader.close();
+  }
+  if (dfsStripedInputStream.createBlockReader(currentBlock,
+  alignedStripe.getOffsetInBlock(), targetBlocks,

Review Comment:
   Hi @Neilxzn @Hexiaoqiao @ayushtkn @zhangshuyan0 @ZanderXu, what do you think?
   Please also help look into this issue when you have free time, thanks~






Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-03-20 Thread via GitHub


haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1531803838


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -233,41 +235,62 @@ private ByteBufferStrategy[] 
getReadStrategies(StripingChunk chunk) {
 
   private int readToBuffer(BlockReader blockReader,
   DatanodeInfo currentNode, ByteBufferStrategy strategy,
-  ExtendedBlock currentBlock) throws IOException {
+  LocatedBlock currentBlock, int chunkIndex) throws IOException {
 final int targetLength = strategy.getTargetLength();
-int length = 0;
-try {
-  while (length < targetLength) {
-int ret = strategy.readFromBlock(blockReader);
-if (ret < 0) {
-  throw new IOException("Unexpected EOS from the reader");
+int curAttempts = 0;
+while (curAttempts < readDNMaxAttempts) {
+  curAttempts++;
+  int length = 0;
+  try {
+while (length < targetLength) {
+  int ret = strategy.readFromBlock(blockReader);
+  if (ret < 0) {
+throw new IOException("Unexpected EOS from the reader");
+  }
+  length += ret;
+}
+return length;
+  } catch (ChecksumException ce) {
+DFSClient.LOG.warn("Found Checksum error for "
++ currentBlock + " from " + currentNode
++ " at " + ce.getPos());
+//Clear buffer to make next decode success
+strategy.getReadBuffer().clear();
+// we want to remember which block replicas we have tried
+corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), 
currentNode);
+throw ce;
+  } catch (IOException e) {
+//Clear buffer to make next decode success
+strategy.getReadBuffer().clear();
+if (curAttempts < readDNMaxAttempts) {
+  if (readerInfos[chunkIndex].reader != null) {
+readerInfos[chunkIndex].reader.close();
+  }
+  if (dfsStripedInputStream.createBlockReader(currentBlock,
+  alignedStripe.getOffsetInBlock(), targetBlocks,

Review Comment:
   If pread is used and the configured buffer size equals a block size, a
single block on a DataNode may cover multiple cell units, so the
ByteBufferStrategy array in the StripingChunk corresponding to the
AlignedStripe is sized to multiple entries (there are multiple List slices in
the ChunkByteBuffer):

   https://github.com/apache/hadoop/assets/3760130/40f7a944-ea57-4891-9719-86a1b009244d

   So when retrying createBlockReader in readToBuffer, we may need to take the
current actual offsetInBlock into account to avoid reading duplicate data from
the DataNode.
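
   A minimal standalone sketch (hypothetical names, a plain byte array instead
of the HDFS block-reader types) of why the resume offset must include the
bytes already read:

```java
import java.util.Arrays;

// Sketch: when a read fails partway through a chunk, the new block reader
// must be opened at offsetInBlock + bytesAlreadyRead, otherwise the retry
// re-reads bytes the strategy buffer has already consumed.
public class RetryOffsetSketch {
    public static void main(String[] args) {
        byte[] block = "0123456789".getBytes();
        int offsetInBlock = 2;  // where this chunk starts within the block
        int alreadyRead = 3;    // bytes read before the first attempt failed

        // Wrong: reopening at offsetInBlock alone would repeat bytes 2..4.
        // Right: resume after the bytes that were already consumed.
        int resume = offsetInBlock + alreadyRead;

        byte[] remaining = Arrays.copyOfRange(block, resume, block.length);
        System.out.println(new String(remaining));
    }
}
```

   Printing the remaining bytes shows the retry picks up exactly where the
failed attempt left off, with no duplicated data.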






Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-03-20 Thread via GitHub


haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1531794650


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -284,7 +307,8 @@ private Callable readCells(final 
BlockReader reader,
 
   int ret = 0;
   for (ByteBufferStrategy strategy : strategies) {
-int bytesReead = readToBuffer(reader, datanode, strategy, 
currentBlock);
+int bytesReead = readToBuffer(reader, datanode, strategy, currentBlock,
+chunkIndex);

Review Comment:
   For `readToBuffer`, we may need to consider the current actual offsetInBlock:
   
   `readToBuffer(reader, datanode, strategy, currentBlock, chunkIndex, ret);`






Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-03-11 Thread via GitHub


hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1987902095

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 19s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 40s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  21m 13s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   3m  2s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   3m  5s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 48s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 19s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 48s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | -1 :x: |  spotbugs  |   1m 39s | 
[/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/7/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html)
 |  hadoop-hdfs-project/hadoop-hdfs-client in trunk has 1 extant spotbugs 
warnings.  |
   | +1 :green_heart: |  shadedclient  |  22m 50s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  23m  3s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 24s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m  1s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 55s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   2m 55s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 56s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   2m 56s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/7/artifact/out/blanks-eol.txt)
 |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   0m 44s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/7/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 1 new + 45 unchanged - 0 fixed = 
46 total (was 45)  |
   | +1 :green_heart: |  mvnsite  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 53s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 23s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m  7s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 54s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 235m 57s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/7/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 32s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 351m 39s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestEncryptionZonesWithKMS |
   |   | hadoop.hdfs.TestDFSStripedInputStreamWithTimeout |
   |   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
   |   | hadoop.hdfs.server.namenode.TestFSEditLogLoader |
   |   | hadoop.hdfs.server.namenode.TestAuditLogs |
   |   | hadoop.hdfs.tools.TestDFSAdmin |
   |   | 

Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-03-11 Thread via GitHub


hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1987825036

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 20s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 55s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  20m 51s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   3m  1s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   2m 53s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 46s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 21s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 10s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | -1 :x: |  spotbugs  |   1m 30s | 
[/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/6/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html)
 |  hadoop-hdfs-project/hadoop-hdfs-client in trunk has 1 extant spotbugs 
warnings.  |
   | +1 :green_heart: |  shadedclient  |  22m  5s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  22m 18s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 24s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 10s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   3m  5s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   3m  5s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   3m  1s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   3m  1s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/6/artifact/out/blanks-eol.txt)
 |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   0m 40s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/6/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 1 new + 45 unchanged - 0 fixed = 
46 total (was 45)  |
   | +1 :green_heart: |  mvnsite  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 26s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m  4s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 21s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 49s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 217m  9s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 32s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 330m 42s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
   |   | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics |
   |   | hadoop.hdfs.TestDFSStripedInputStreamWithTimeout |
   |   | hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor |
   |   | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy |
   |   | 

Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-03-10 Thread via GitHub


haiyang1987 commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1987535597

   > > Hi @Neilxzn Any progress here? Thanks.
   > > this PR is still necessary, there are some similar problems in our 
environment~
   > 
   > @haiyang1987 Our online environment (70 PB EC Data cluster, spark + hive 
olap) has already applied this patch. So far, everything is running normally.
   
   Noted, thanks for your work~





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-03-10 Thread via GitHub


Neilxzn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1987533616

   Rebased it onto trunk. The new test
`org.apache.hadoop.hdfs.TestDFSStripedInputStreamWithTimeout` passed in my
local env.
   https://github.com/apache/hadoop/assets/10757009/6bedc732-cbe7-403b-8c6a-b5e78a33527f
   





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-03-10 Thread via GitHub


Neilxzn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1987532047

   > Hi @Neilxzn Any progress here? Thanks.
   > 
   > this PR is still necessary, there are some similar problems in our 
environment~
   
   @haiyang1987  Our online environment (70 PB EC Data cluster, spark + hive 
olap) has already applied this patch. So far, everything is running normally.





[PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-03-10 Thread via GitHub


Neilxzn opened a new pull request, #5829:
URL: https://github.com/apache/hadoop/pull/5829

   
   
   ### Description of PR
   https://issues.apache.org/jira/browse/HDFS-15413
   
   Offers an available patch to fix HDFS-15413. This patch adds the 
dfs.client.read.striped.datanode.max.attempts config to allow users to adjust 
the number of DataNode retries, to solve the problem of DataNode timeouts when 
reading EC files.
   
   ### How was this patch tested?
   No test added; just tested in our cluster.
   
   ### For code changes:
   Adds the dfs.client.read.striped.datanode.max.attempts config.
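   For reference, opting in on the client could look like the snippet below in 
hdfs-site.xml. The property name follows this PR's title; the value 3 is 
purely illustrative, not a recommended default.

   ```xml
   <!-- Illustrative only: raises the number of DataNode read attempts the
        striped (EC) reader makes before giving up. Key name taken from the
        PR title; the value 3 is an assumed example. -->
   <property>
     <name>dfs.client.read.striped.datanode.max.attempts</name>
     <value>3</value>
   </property>
   ```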
   
   





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-03-10 Thread via GitHub


hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1987499017

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m  0s |  |  Docker mode activated.  |
   | -1 :x: |  patch  |   0m 19s |  |  
https://github.com/apache/hadoop/pull/5829 does not apply to trunk. Rebase 
required? Wrong Branch? See 
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.  
|
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/5829 |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/5/console |
   | versions | git=2.34.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-03-10 Thread via GitHub


Neilxzn closed pull request #5829:  HDFS-15413. add 
dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout
URL: https://github.com/apache/hadoop/pull/5829





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-03-08 Thread via GitHub


haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1518480694


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -233,41 +235,62 @@ private ByteBufferStrategy[] 
getReadStrategies(StripingChunk chunk) {
 
   private int readToBuffer(BlockReader blockReader,
   DatanodeInfo currentNode, ByteBufferStrategy strategy,
-  ExtendedBlock currentBlock) throws IOException {
+  LocatedBlock currentBlock, int chunkIndex) throws IOException {
 final int targetLength = strategy.getTargetLength();
-int length = 0;
-try {
-  while (length < targetLength) {
-int ret = strategy.readFromBlock(blockReader);
-if (ret < 0) {
-  throw new IOException("Unexpected EOS from the reader");
+int curAttempts = 0;
+while (curAttempts < readDNMaxAttempts) {
+  curAttempts++;
+  int length = 0;
+  try {
+while (length < targetLength) {
+  int ret = strategy.readFromBlock(blockReader);
+  if (ret < 0) {
+throw new IOException("Unexpected EOS from the reader");
+  }
+  length += ret;
+}
+return length;
+  } catch (ChecksumException ce) {
+DFSClient.LOG.warn("Found Checksum error for "
++ currentBlock + " from " + currentNode
++ " at " + ce.getPos());
+//Clear buffer to make next decode success
+strategy.getReadBuffer().clear();
+// we want to remember which block replicas we have tried
+corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), 
currentNode);
+throw ce;
+  } catch (IOException e) {
+//Clear buffer to make next decode success
+strategy.getReadBuffer().clear();
+if (curAttempts < readDNMaxAttempts) {
+  if (readerInfos[chunkIndex].reader != null) {
+readerInfos[chunkIndex].reader.close();
+  }
+  if (dfsStripedInputStream.createBlockReader(currentBlock,
+  alignedStripe.getOffsetInBlock(), targetBlocks,
+  readerInfos, chunkIndex, readTo)) {
+blockReader = readerInfos[chunkIndex].reader;
+String msg = "Reconnect to " + currentNode.getInfoAddr()
++ " for block " + currentBlock.getBlock();
+DFSClient.LOG.warn(msg);
+continue;
+  }
+  DFSClient.LOG.warn("Exception while reading from "

Review Comment:
   Lines 278-281: move `if (curAttempts < readDNMaxAttempts) {` outside.
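   A standalone sketch of what that restructuring could look like (hypothetical 
names, not the patch's actual code): loop over attempts and rethrow only after 
the loop, instead of checking the attempt count inside the catch block.

   ```java
   // Sketch of the reviewer's suggestion: keep the retry decision in one place
   // by looping over attempts and rethrowing after the loop, rather than
   // checking curAttempts inside the IOException handler.
   // readOnce is a hypothetical stand-in for the per-attempt block read.
   import java.io.IOException;
   import java.util.concurrent.atomic.AtomicInteger;

   public class RetrySketch {

       // Simulates a read that fails transiently failuresLeft times, then succeeds.
       static int readOnce(AtomicInteger failuresLeft) throws IOException {
           if (failuresLeft.getAndDecrement() > 0) {
               throw new IOException("simulated transient read failure");
           }
           return 42; // pretend payload length
       }

       // Assumes maxAttempts >= 1, mirroring readDNMaxAttempts in the patch.
       static int readWithRetries(int maxAttempts, AtomicInteger failuresLeft)
               throws IOException {
           IOException last = null;
           for (int attempt = 1; attempt <= maxAttempts; attempt++) {
               try {
                   return readOnce(failuresLeft);
               } catch (IOException e) {
                   last = e; // remember the failure; the loop decides whether to retry
               }
           }
           throw last; // single rethrow point once attempts are exhausted
       }

       public static void main(String[] args) throws IOException {
           // two transient failures, three attempts: the third attempt succeeds
           System.out.println(readWithRetries(3, new AtomicInteger(2)));
       }
   }
   ```

   This keeps success, retry, and final failure on three distinct paths, which 
is easier to reason about than nesting the attempt check inside the handler.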






Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-03-08 Thread via GitHub


Neilxzn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1985490862

   [Automated reply: Your message has been received, thank you.]





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2024-03-08 Thread via GitHub


haiyang1987 commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1985490125

   Hi @Neilxzn Any progress here? Thanks.
   
   this PR is still necessary, there are some similar problems in our 
environment~





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2023-12-11 Thread via GitHub


Neilxzn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1849677715

   > @Neilxzn I tried & it fails locally
   > 
   > ```
   > [INFO] ---
   > [INFO]  T E S T S
   > [INFO] ---
   > [INFO] Running org.apache.hadoop.hdfs.TestDFSStripedInputStreamWithTimeout
   > [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
17.64 s <<< FAILURE! - in 
org.apache.hadoop.hdfs.TestDFSStripedInputStreamWithTimeout
   > [ERROR] 
testPreadTimeout(org.apache.hadoop.hdfs.TestDFSStripedInputStreamWithTimeout)  
Time elapsed: 17.509 s  <<< FAILURE!
   > java.lang.AssertionError: It Should fail to read striped time out with 1 
attempt . 
   > at org.junit.Assert.fail(Assert.java:89)
   > at 
org.apache.hadoop.hdfs.TestDFSStripedInputStreamWithTimeout.testPreadTimeout(TestDFSStripedInputStreamWithTimeout.java:145)
   > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   > at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   > at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   > at java.lang.reflect.Method.invoke(Method.java:498)
   > at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
   > at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   > at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
   > at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   > at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   > at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   > at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
   > at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
   > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   > at java.lang.Thread.run(Thread.java:750)
   > 
   > [INFO] 
   > [INFO] Results:
   > [INFO] 
   > [ERROR] Failures: 
   > [ERROR]   TestDFSStripedInputStreamWithTimeout.testPreadTimeout:145 It 
Should fail to read striped time out with 1 attempt . 
   > [INFO] 
   > [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
   > [INFO] 
   > [ERROR] There are test failures.
   > ```
   > 
   > To reproduce: in the hadoop root directory there is file named 
`start-build-env.sh`, run that `bash start-build-env.sh`, it will give you a 
docker env, run the test inside that & it will fail
   
   Thank you. I will check it again soon.





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2023-12-07 Thread via GitHub


ayushtkn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1845117963

   @Neilxzn I tried & it fails locally
   ```
   [INFO] ---
   [INFO]  T E S T S
   [INFO] ---
   [INFO] Running org.apache.hadoop.hdfs.TestDFSStripedInputStreamWithTimeout
   [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
17.64 s <<< FAILURE! - in 
org.apache.hadoop.hdfs.TestDFSStripedInputStreamWithTimeout
   [ERROR] 
testPreadTimeout(org.apache.hadoop.hdfs.TestDFSStripedInputStreamWithTimeout)  
Time elapsed: 17.509 s  <<< FAILURE!
   java.lang.AssertionError: It Should fail to read striped time out with 1 
attempt . 
   at org.junit.Assert.fail(Assert.java:89)
   at 
org.apache.hadoop.hdfs.TestDFSStripedInputStreamWithTimeout.testPreadTimeout(TestDFSStripedInputStreamWithTimeout.java:145)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:498)
   at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
   at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
   at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
   at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at java.lang.Thread.run(Thread.java:750)
   
   [INFO] 
   [INFO] Results:
   [INFO] 
   [ERROR] Failures: 
   [ERROR]   TestDFSStripedInputStreamWithTimeout.testPreadTimeout:145 It 
Should fail to read striped time out with 1 attempt . 
   [INFO] 
   [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
   [INFO] 
   [ERROR] There are test failures.
   
   ```
   
   To reproduce:
   in the hadoop root directory there is file named ``start-build-env.sh``, run 
that ``bash start-build-env.sh``, it will give you a docker env, run the test 
inside that & it will fail





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2023-12-06 Thread via GitHub


Neilxzn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1844334814

   I can pass the unit test hadoop.hdfs.TestDFSStripedInputStreamWithTimeout in 
my local development environment, but it fails on GitHub Jenkins.
   
![image](https://github.com/apache/hadoop/assets/10757009/a511b4e1-8413-44bb-9136-5e7cc1f3ff17)
   Checking the test log from the development environment against the 
assumption: when the client reads the file for the first time and then pauses 
for 10 seconds, the connection between the client and the DataNode server is 
automatically disconnected, causing the client's subsequent read to fail.  
@ayushtkn Any other suggestions?





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2023-12-06 Thread via GitHub


hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1842713669

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 19s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  13m 24s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  19m 14s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   2m 51s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   2m 47s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 44s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 16s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  3s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 29s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m  3s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m  9s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 21s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m  2s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 47s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   2m 47s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 42s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   2m 42s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/4/artifact/out/blanks-eol.txt)
 |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   0m 35s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/4/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 1 new + 45 unchanged - 0 fixed = 
46 total (was 45)  |
   | +1 :green_heart: |  mvnsite  |   1m  6s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 50s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 20s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m  5s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m  7s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 49s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 189m 58s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 27s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 293m 19s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestDFSStripedInputStreamWithTimeout |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5829 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux 807603bf2dcf 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / bb46dbd471d96878492fe660b2af03a8384f8123 |
   | Default Java | Private 

Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2023-12-05 Thread via GitHub


hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1840975664

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 49s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m  2s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  35m  6s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   6m  4s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   5m 51s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 27s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 22s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 52s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   2m 21s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   5m 56s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  40m 54s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 31s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 59s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 54s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   5m 54s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 42s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   5m 42s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/3/artifact/out/blanks-eol.txt)
 |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   1m 17s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/3/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 2 new + 45 unchanged - 0 fixed = 
47 total (was 45)  |
   | +1 :green_heart: |  mvnsite  |   2m  6s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 32s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   2m  6s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   5m 59s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  40m  5s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 22s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 251m 52s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 43s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 439m 20s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestDFSStripedInputStreamWithTimeout |
   |   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5829 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux 15e76d99d238 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / db52c34f95a1caeb7e07157c6289793acf91c514 |
   

Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2023-12-04 Thread via GitHub


Neilxzn commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1415029913


##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSStripedInputStreamWithTimeout.java:
##
@@ -0,0 +1,168 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs;
+
+import org.apache.hadoop.hdfs.client.HdfsClientConfigKeys;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.protocol.Block;
+import org.apache.hadoop.hdfs.protocol.ErasureCodingPolicy;
+import org.apache.hadoop.hdfs.protocol.LocatedBlock;
+import org.apache.hadoop.hdfs.protocol.LocatedBlocks;
+import org.apache.hadoop.hdfs.protocol.LocatedStripedBlock;
+import org.apache.hadoop.hdfs.server.datanode.DataNode;
+import org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils;
+import org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset;
+import org.apache.hadoop.io.erasurecode.CodecUtil;
+import org.apache.hadoop.io.erasurecode.ErasureCodeNative;
+import 
org.apache.hadoop.io.erasurecode.rawcoder.NativeRSRawErasureCoderFactory;
+import org.apache.hadoop.test.GenericTestUtils;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.Timeout;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+public class TestDFSStripedInputStreamWithTimeout {
+
+  public static final Logger LOG =
+  LoggerFactory.getLogger(TestDFSStripedInputStreamWithTimeout.class);
+
+  private MiniDFSCluster cluster;
+  private Configuration conf = new Configuration();
+  private DistributedFileSystem fs;
+  private final Path dirPath = new Path("/striped");
+  private Path filePath = new Path(dirPath, "file");
+  private ErasureCodingPolicy ecPolicy;
+  private short dataBlocks;
+  private short parityBlocks;
+  private int cellSize;
+  private final int stripesPerBlock = 2;
+  private int blockSize;
+  private int blockGroupSize;
+
+  @Rule
+  public Timeout globalTimeout = new Timeout(30);
+
+  public ErasureCodingPolicy getEcPolicy() {
+return StripedFileTestUtil.getDefaultECPolicy();
+  }
+
+  @Before
+  public void setup() throws IOException {
+/*
+ * Initialize erasure coding policy.
+ */
+ecPolicy = getEcPolicy();
+dataBlocks = (short) ecPolicy.getNumDataUnits();
+parityBlocks = (short) ecPolicy.getNumParityUnits();
+cellSize = ecPolicy.getCellSize();
+blockSize = stripesPerBlock * cellSize;
+blockGroupSize = dataBlocks * blockSize;
+System.out.println("EC policy = " + ecPolicy);
+
+conf.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, blockSize);
+conf.setInt(DFSConfigKeys.DFS_NAMENODE_REPLICATION_MAX_STREAMS_KEY, 0);
+
+conf.setInt(DFSConfigKeys.DFS_DATANODE_SOCKET_WRITE_TIMEOUT_KEY, 1000);
+// SET CONFIG FOR HDFS CLIENT
+conf.setInt(DFSConfigKeys.DFS_CLIENT_SOCKET_TIMEOUT_KEY, 1000);
+conf.setInt(HdfsClientConfigKeys.StripedRead.DATANODE_MAX_ATTEMPTS, 3);
+
+if (ErasureCodeNative.isNativeCodeLoaded()) {
+  conf.set(
+  CodecUtil.IO_ERASURECODE_CODEC_RS_RAWCODERS_KEY,
+  NativeRSRawErasureCoderFactory.CODER_NAME);
+}
+conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR,
+GenericTestUtils.getRandomizedTempPath());
+SimulatedFSDataset.setFactory(conf);
+startUp();
+  }
+
+  private void startUp() throws IOException {
+cluster = new MiniDFSCluster.Builder(conf).numDataNodes(
+dataBlocks + parityBlocks).build();
+cluster.waitActive();
+for (DataNode dn : cluster.getDataNodes()) {
+  DataNodeTestUtils.setHeartbeatsDisabledForTests(dn, true);
+}
+fs = cluster.getFileSystem();
+fs.enableErasureCodingPolicy(getEcPolicy().getName());
+fs.mkdirs(dirPath);
+fs.getClient()
+.setErasureCodingPolicy(dirPath.toString(), ecPolicy.getName());
+  }
+
+  @After
+  public void tearDown() {
+if (cluster != null) {
+  cluster.shutdown();
+  cluster = null;
+}
+  }
+
+  @Test
+  public void testPreadTimeout() throws Exception {
+final int numBlocks 

Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2023-12-04 Thread via GitHub


Neilxzn commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1415026140


##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSStripedInputStreamWithTimeout.java:
##
@@ -0,0 +1,168 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs;
+
+import org.apache.hadoop.hdfs.client.HdfsClientConfigKeys;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.protocol.Block;
+import org.apache.hadoop.hdfs.protocol.ErasureCodingPolicy;
+import org.apache.hadoop.hdfs.protocol.LocatedBlock;
+import org.apache.hadoop.hdfs.protocol.LocatedBlocks;
+import org.apache.hadoop.hdfs.protocol.LocatedStripedBlock;
+import org.apache.hadoop.hdfs.server.datanode.DataNode;
+import org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils;
+import org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset;
+import org.apache.hadoop.io.erasurecode.CodecUtil;
+import org.apache.hadoop.io.erasurecode.ErasureCodeNative;
+import org.apache.hadoop.io.erasurecode.rawcoder.NativeRSRawErasureCoderFactory;
+import org.apache.hadoop.test.GenericTestUtils;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.Timeout;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+public class TestDFSStripedInputStreamWithTimeout {
+
+  public static final Logger LOG =
+  LoggerFactory.getLogger(TestDFSStripedInputStreamWithTimeout.class);
+
+  private MiniDFSCluster cluster;
+  private Configuration conf = new Configuration();
+  private DistributedFileSystem fs;
+  private final Path dirPath = new Path("/striped");
+  private Path filePath = new Path(dirPath, "file");
+  private ErasureCodingPolicy ecPolicy;
+  private short dataBlocks;
+  private short parityBlocks;
+  private int cellSize;
+  private final int stripesPerBlock = 2;
+  private int blockSize;
+  private int blockGroupSize;
+
+  @Rule
+  public Timeout globalTimeout = new Timeout(30);
+
+  public ErasureCodingPolicy getEcPolicy() {
+return StripedFileTestUtil.getDefaultECPolicy();
+  }
+
+  @Before
+  public void setup() throws IOException {
+/*
+ * Initialize erasure coding policy.
+ */
+ecPolicy = getEcPolicy();
+dataBlocks = (short) ecPolicy.getNumDataUnits();
+parityBlocks = (short) ecPolicy.getNumParityUnits();
+cellSize = ecPolicy.getCellSize();
+blockSize = stripesPerBlock * cellSize;
+blockGroupSize = dataBlocks * blockSize;
+System.out.println("EC policy = " + ecPolicy);
+
+conf.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, blockSize);

Review Comment:
   done
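
   For context, the client-side knob this PR introduces could be set in
   hdfs-site.xml roughly as follows. The key name is taken from the PR title;
   the value 3 mirrors the unit test above and is not necessarily the shipped
   default, and the description is a paraphrase of the discussion here, not
   official documentation:

   ```xml
   <!-- hdfs-site.xml (client side); value mirrors the unit test, not a
        recommended default -->
   <property>
     <name>dfs.client.read.striped.datanode.max.attempts</name>
     <value>3</value>
     <description>Maximum number of attempts the striped-read client makes
       against a DataNode before giving up on that block reader.</description>
   </property>
   ```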



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2023-12-04 Thread via GitHub


Neilxzn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1838407963

   > Hi @Neilxzn , any chance you have time to finish this up?
   
   Sorry for the late reply. I have been busy with other things recently. I 
will try to submit a new unit test tomorrow.





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2023-12-03 Thread via GitHub


bbeaudreault commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1837536440

   Hi @Neilxzn , any chance you have time to finish this up?





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2023-11-16 Thread via GitHub


Hexiaoqiao commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1815704414

   Hi @Neilxzn Any progress here? Thanks.





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2023-11-09 Thread via GitHub


ayushtkn commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1388949742


##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSStripedInputStreamWithTimeout.java:
##
@@ -0,0 +1,168 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs;
+
+import org.apache.hadoop.hdfs.client.HdfsClientConfigKeys;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.protocol.Block;
+import org.apache.hadoop.hdfs.protocol.ErasureCodingPolicy;
+import org.apache.hadoop.hdfs.protocol.LocatedBlock;
+import org.apache.hadoop.hdfs.protocol.LocatedBlocks;
+import org.apache.hadoop.hdfs.protocol.LocatedStripedBlock;
+import org.apache.hadoop.hdfs.server.datanode.DataNode;
+import org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils;
+import org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset;
+import org.apache.hadoop.io.erasurecode.CodecUtil;
+import org.apache.hadoop.io.erasurecode.ErasureCodeNative;
+import org.apache.hadoop.io.erasurecode.rawcoder.NativeRSRawErasureCoderFactory;
+import org.apache.hadoop.test.GenericTestUtils;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.Timeout;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+public class TestDFSStripedInputStreamWithTimeout {
+
+  public static final Logger LOG =
+  LoggerFactory.getLogger(TestDFSStripedInputStreamWithTimeout.class);
+
+  private MiniDFSCluster cluster;
+  private Configuration conf = new Configuration();
+  private DistributedFileSystem fs;
+  private final Path dirPath = new Path("/striped");
+  private Path filePath = new Path(dirPath, "file");
+  private ErasureCodingPolicy ecPolicy;
+  private short dataBlocks;
+  private short parityBlocks;
+  private int cellSize;
+  private final int stripesPerBlock = 2;
+  private int blockSize;
+  private int blockGroupSize;
+
+  @Rule
+  public Timeout globalTimeout = new Timeout(30);
+
+  public ErasureCodingPolicy getEcPolicy() {
+return StripedFileTestUtil.getDefaultECPolicy();
+  }
+
+  @Before
+  public void setup() throws IOException {
+/*
+ * Initialize erasure coding policy.
+ */
+ecPolicy = getEcPolicy();
+dataBlocks = (short) ecPolicy.getNumDataUnits();
+parityBlocks = (short) ecPolicy.getNumParityUnits();
+cellSize = ecPolicy.getCellSize();
+blockSize = stripesPerBlock * cellSize;
+blockGroupSize = dataBlocks * blockSize;
+System.out.println("EC policy = " + ecPolicy);
+
+conf.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, blockSize);

Review Comment:
   This is deprecated config I believe, We should use 
``HdfsClientConfigKeys.DFS_CLIENT_SOCKET_TIMEOUT_KEY``



##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSStripedInputStreamWithTimeout.java:
##
@@ -0,0 +1,168 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs;
+
+import org.apache.hadoop.hdfs.client.HdfsClientConfigKeys;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.protocol.Block;
+import 

Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2023-11-08 Thread via GitHub


bbeaudreault commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1802929219

   @ayushtkn @zhangshuyan0 looks like the remaining failing checks are 
unrelated, and the feedback was addressed. Any chance for another look?





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2023-11-06 Thread via GitHub


hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1796080821

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   9m  0s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m  3s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  22m 34s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   3m  7s |  |  trunk passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   3m  4s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   0m 52s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 31s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 22s |  |  trunk passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   3m 23s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 16s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 26s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 59s |  |  the patch passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   2m 59s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 51s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   2m 51s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/2/artifact/out/blanks-eol.txt)
 |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   0m 41s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/2/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 1 new + 45 unchanged - 0 fixed = 
46 total (was 45)  |
   | +1 :green_heart: |  mvnsite  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  5s |  |  the patch passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   3m 25s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 20s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 56s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 192m 14s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 36s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 315m 36s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestDFSUtil |
   |   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5829 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux a7218a7ce8bd 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 
13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ff4b113ef084a1e1843518d19f0726ca2994b63a |
   | 

Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2023-11-06 Thread via GitHub


Neilxzn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1794890337

   > Please also check the checkstyle and blanks reported by Yetus. Thanks. 
@Neilxzn
   
   Fixed the checkstyle issues and added a unit test. Please review it again. 
Thanks.





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2023-11-05 Thread via GitHub


Hexiaoqiao commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1794081852

   Please also check the checkstyle and blanks reported by Yetus. Thanks. 
@Neilxzn 





Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2023-11-05 Thread via GitHub


zhangshuyan0 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1382549418


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -233,41 +236,60 @@ private ByteBufferStrategy[] 
getReadStrategies(StripingChunk chunk) {
 
   private int readToBuffer(BlockReader blockReader,
   DatanodeInfo currentNode, ByteBufferStrategy strategy,
-  ExtendedBlock currentBlock) throws IOException {
+  LocatedBlock currentBlock, int chunkIndex) throws IOException {
 final int targetLength = strategy.getTargetLength();
-int length = 0;
-try {
-  while (length < targetLength) {
-int ret = strategy.readFromBlock(blockReader);
-if (ret < 0) {
-  throw new IOException("Unexpected EOS from the reader");
+int curAttempts = 0;
+while (curAttempts < readDNMaxAttempts) {
+  int length = 0;
+  try {
+while (length < targetLength) {
+  int ret = strategy.readFromBlock(blockReader);
+  if (ret < 0) {
+throw new IOException("Unexpected EOS from the reader");
+  }
+  length += ret;
 }
-length += ret;
+return length;
+  } catch (ChecksumException ce) {
+DFSClient.LOG.warn("Found Checksum error for "
++ currentBlock + " from " + currentNode
++ " at " + ce.getPos());
+//Clear buffer to make next decode success
+strategy.getReadBuffer().clear();
+// we want to remember which block replicas we have tried
+corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), 
currentNode);
+throw ce;
+  } catch (IOException e) {
+//Clear buffer to make next decode success
+strategy.getReadBuffer().clear();
+if (curAttempts < readDNMaxAttempts - 1) {
+  curAttempts++;
+  if (readerInfos[chunkIndex].reader != null) {
+readerInfos[chunkIndex].reader.close();
+  }
+  if (dfsStripedInputStream.createBlockReader(currentBlock,
+  alignedStripe.getOffsetInBlock(), targetBlocks,
+  readerInfos, chunkIndex, readTo)) {
+blockReader = readerInfos[chunkIndex].reader;
+String msg = "Reconnect to " + currentNode.getInfoAddr()
++ " for block " + currentBlock.getBlock();
+DFSClient.LOG.warn(msg);
+continue;
+  }
+DFSClient.LOG.warn("Exception while reading from "
++ currentBlock + " of " + dfsStripedInputStream.getSrc() + " from "
++ currentNode, e);
+throw e;
   }
-  return length;
-} catch (ChecksumException ce) {
-  DFSClient.LOG.warn("Found Checksum error for "
-  + currentBlock + " from " + currentNode
-  + " at " + ce.getPos());
-  //Clear buffer to make next decode success
-  strategy.getReadBuffer().clear();
-  // we want to remember which block replicas we have tried
-  corruptedBlocks.addCorruptedBlock(currentBlock, currentNode);
-  throw ce;
-} catch (IOException e) {
-  DFSClient.LOG.warn("Exception while reading from "
-  + currentBlock + " of " + dfsStripedInputStream.getSrc() + " from "
-  + currentNode, e);
-  //Clear buffer to make next decode success
-  strategy.getReadBuffer().clear();
-  throw e;
 }
   }
+return  -1;

Review Comment:
   Agree with @ayushtkn. Lines 279-282 should be here.






Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2023-11-03 Thread via GitHub


ayushtkn commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1381508823


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -233,41 +236,60 @@ private ByteBufferStrategy[] 
getReadStrategies(StripingChunk chunk) {
 
   private int readToBuffer(BlockReader blockReader,
   DatanodeInfo currentNode, ByteBufferStrategy strategy,
-  ExtendedBlock currentBlock) throws IOException {
+  LocatedBlock currentBlock, int chunkIndex) throws IOException {
 final int targetLength = strategy.getTargetLength();
-int length = 0;
-try {
-  while (length < targetLength) {
-int ret = strategy.readFromBlock(blockReader);
-if (ret < 0) {
-  throw new IOException("Unexpected EOS from the reader");
+int curAttempts = 0;
+while (curAttempts < readDNMaxAttempts) {
+  int length = 0;
+  try {
+while (length < targetLength) {
+  int ret = strategy.readFromBlock(blockReader);
+  if (ret < 0) {
+throw new IOException("Unexpected EOS from the reader");
+  }
+  length += ret;
 }
-length += ret;
+return length;
+  } catch (ChecksumException ce) {
+DFSClient.LOG.warn("Found Checksum error for "
++ currentBlock + " from " + currentNode
++ " at " + ce.getPos());
+//Clear buffer to make next decode success
+strategy.getReadBuffer().clear();
+// we want to remember which block replicas we have tried
+corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), 
currentNode);
+throw ce;
+  } catch (IOException e) {
+//Clear buffer to make next decode success
+strategy.getReadBuffer().clear();
+if (curAttempts < readDNMaxAttempts - 1) {
+  curAttempts++;
+  if (readerInfos[chunkIndex].reader != null) {
+readerInfos[chunkIndex].reader.close();
+  }
+  if (dfsStripedInputStream.createBlockReader(currentBlock,
+  alignedStripe.getOffsetInBlock(), targetBlocks,
+  readerInfos, chunkIndex, readTo)) {
+blockReader = readerInfos[chunkIndex].reader;
+String msg = "Reconnect to " + currentNode.getInfoAddr()
++ " for block " + currentBlock.getBlock();
+DFSClient.LOG.warn(msg);
+continue;
+  }
+DFSClient.LOG.warn("Exception while reading from "
++ currentBlock + " of " + dfsStripedInputStream.getSrc() + " from "
++ currentNode, e);
+throw e;
   }
-  return length;
-} catch (ChecksumException ce) {
-  DFSClient.LOG.warn("Found Checksum error for "
-  + currentBlock + " from " + currentNode
-  + " at " + ce.getPos());
-  //Clear buffer to make next decode success
-  strategy.getReadBuffer().clear();
-  // we want to remember which block replicas we have tried
-  corruptedBlocks.addCorruptedBlock(currentBlock, currentNode);
-  throw ce;
-} catch (IOException e) {
-  DFSClient.LOG.warn("Exception while reading from "
-  + currentBlock + " of " + dfsStripedInputStream.getSrc() + " from "
-  + currentNode, e);
-  //Clear buffer to make next decode success
-  strategy.getReadBuffer().clear();
-  throw e;
 }
   }
+return  -1;

Review Comment:
   I don't think we should return -1; there is logic which uses the return 
value:
   ```
 for (ByteBufferStrategy strategy : strategies) {
   int bytesRead = readToBuffer(reader, datanode, strategy, currentBlock);
   ret += bytesRead;
 }
   ```
   
   We should throw an exception or return a valid value.
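
   The point raised above (never surface a sentinel like -1 to callers;
   rethrow once attempts are exhausted) can be sketched in plain Java. The
   `readOnce` helper and the failure counter below are hypothetical stand-ins
   for the BlockReader machinery, not HDFS APIs:

   ```java
   import java.io.IOException;
   import java.util.concurrent.atomic.AtomicInteger;

   public class RetrySketch {

       // Simulated single read attempt: throws until `remainingFailures`
       // is used up, then returns the number of bytes read.
       static int readOnce(AtomicInteger remainingFailures) throws IOException {
           if (remainingFailures.getAndDecrement() > 0) {
               throw new IOException("simulated DataNode timeout");
           }
           return 42; // bytes read on success
       }

       // Bounded retry: re-attempt the read up to maxAttempts times.
       // On the final attempt the IOException is rethrown instead of
       // returning a sentinel like -1, so callers that sum the return
       // value never see a bogus length.
       static int readWithRetries(AtomicInteger remainingFailures,
               int maxAttempts) throws IOException {
           for (int attempt = 1; ; attempt++) {
               try {
                   return readOnce(remainingFailures);
               } catch (IOException e) {
                   if (attempt >= maxAttempts) {
                       throw e; // attempts exhausted: propagate the failure
                   }
                   // here the real client would close and recreate the
                   // BlockReader before the next attempt
               }
           }
       }

       public static void main(String[] args) throws IOException {
           // two failures, three attempts: the third attempt succeeds
           System.out.println(readWithRetries(new AtomicInteger(2), 3));
       }
   }
   ```

   With three attempts and two injected failures the read succeeds; with more
   failures than attempts the IOException propagates to the caller.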



##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -233,41 +236,60 @@ private ByteBufferStrategy[] 
getReadStrategies(StripingChunk chunk) {
 
   private int readToBuffer(BlockReader blockReader,
   DatanodeInfo currentNode, ByteBufferStrategy strategy,
-  ExtendedBlock currentBlock) throws IOException {
+  LocatedBlock currentBlock, int chunkIndex) throws IOException {
 final int targetLength = strategy.getTargetLength();
-int length = 0;
-try {
-  while (length < targetLength) {
-int ret = strategy.readFromBlock(blockReader);
-if (ret < 0) {
-  throw new IOException("Unexpected EOS from the reader");
+int curAttempts = 0;
+while (curAttempts < readDNMaxAttempts) {
+  int length = 0;
+  try {
+while (length < targetLength) {
+  int ret = strategy.readFromBlock(blockReader);
+  if (ret < 0) {
+throw new IOException("Unexpected EOS from the reader");
+  }
+  length += ret;
 }
-length += ret;
+return length;
+  } catch (ChecksumException ce) {
+

Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]

2023-11-02 Thread via GitHub


Hexiaoqiao commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1791847743

   cc @zhangshuyan0 Would you mind to take a review?

