Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2068640885

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 6m 49s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 14m 4s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 19m 54s | | trunk passed |
| +1 :green_heart: | compile | 3m 0s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | compile | 2m 48s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | checkstyle | 0m 47s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 18s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 8s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 1m 37s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | spotbugs | 3m 2s | | trunk passed |
| +1 :green_heart: | shadedclient | 20m 53s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 21m 7s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 22s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 4s | | the patch passed |
| +1 :green_heart: | compile | 2m 50s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javac | 2m 50s | | the patch passed |
| +1 :green_heart: | compile | 2m 51s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | javac | 2m 51s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 38s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/11/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 1 new + 113 unchanged - 0 fixed = 114 total (was 113) |
| +1 :green_heart: | mvnsite | 1m 8s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 55s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 1m 27s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | spotbugs | 3m 8s | | the patch passed |
| +1 :green_heart: | shadedclient | 21m 5s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 1m 49s | | hadoop-hdfs-client in the patch passed. |
| +1 :green_heart: | unit | 214m 41s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 26s | | The patch does not generate ASF License warnings. |
| | | 329m 18s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/11/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5829 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
| uname | Linux f0536dd0a688 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / d698a9445b4a06fd8978ee4c5005964270d236d9 |
| Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| Test Results |
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
Neilxzn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2068331088

```
./hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/DfsClientConf.java:171: public DfsClientConf(Configuration conf) {:3: Method length is 156 lines (max allowed is 150). [MethodLength]
```

Should we suppress this checkstyle warning? Or are there any better suggestions?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
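One possible alternative to suppressing the MethodLength warning is to move a group of related settings out of the constructor into a small private helper. The sketch below only illustrates that idea. `ClientConfSketch`, `loadStripedReadMaxAttempts`, `getInt`, the `example.socket.timeout.ms` key, and the default of 1 are all hypothetical and are not taken from `DfsClientConf`; only the configuration key name comes from the PR title.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative stand-in for a long configuration constructor; not the real DfsClientConf. */
final class ClientConfSketch {
  // Key name taken from the PR title; the default is a placeholder for this sketch.
  static final String STRIPED_MAX_ATTEMPTS_KEY =
      "dfs.client.read.striped.datanode.max.attempts";
  static final int STRIPED_MAX_ATTEMPTS_DEFAULT = 1;

  private final int socketTimeoutMs;
  private final int stripedReadMaxAttempts;

  ClientConfSketch(Map<String, String> conf) {
    // The constructor stays short because each group of related settings
    // is loaded by its own helper instead of inline.
    socketTimeoutMs = getInt(conf, "example.socket.timeout.ms", 60_000);
    stripedReadMaxAttempts = loadStripedReadMaxAttempts(conf);
  }

  /** Hypothetical helper: reads and validates the striped-read attempt limit. */
  private static int loadStripedReadMaxAttempts(Map<String, String> conf) {
    int attempts =
        getInt(conf, STRIPED_MAX_ATTEMPTS_KEY, STRIPED_MAX_ATTEMPTS_DEFAULT);
    if (attempts < 1) {
      throw new IllegalArgumentException(
          STRIPED_MAX_ATTEMPTS_KEY + " must be >= 1, got " + attempts);
    }
    return attempts;
  }

  /** Returns the integer stored under key, or defaultValue when the key is absent. */
  private static int getInt(Map<String, String> conf, String key, int defaultValue) {
    String value = conf.get(key);
    return value == null ? defaultValue : Integer.parseInt(value.trim());
  }

  int getSocketTimeoutMs() {
    return socketTimeoutMs;
  }

  int getStripedReadMaxAttempts() {
    return stripedReadMaxAttempts;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(STRIPED_MAX_ATTEMPTS_KEY, "3");
    System.out.println(new ClientConfSketch(conf).getStripedReadMaxAttempts());
  }
}
```

Whether a helper or a suppression fits better depends on how the surrounding constructor is organized, which this thread does not show.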
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
haiyang1987 commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2067551100

Please fix the checkstyle warning, thanks.
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2067076767

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 22s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 1s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 14m 14s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 20m 8s | | trunk passed |
| +1 :green_heart: | compile | 2m 58s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | compile | 2m 49s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | checkstyle | 0m 46s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 22s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 8s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 1m 35s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | spotbugs | 3m 10s | | trunk passed |
| +1 :green_heart: | shadedclient | 21m 22s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 21m 36s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 22s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 7s | | the patch passed |
| +1 :green_heart: | compile | 2m 52s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javac | 2m 52s | | the patch passed |
| +1 :green_heart: | compile | 2m 43s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | javac | 2m 43s | | the patch passed |
| -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/10/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -0 :warning: | checkstyle | 0m 38s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/10/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 2 new + 113 unchanged - 0 fixed = 115 total (was 113) |
| +1 :green_heart: | mvnsite | 1m 9s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 55s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 1m 27s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | spotbugs | 3m 8s | | the patch passed |
| +1 :green_heart: | shadedclient | 23m 50s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 1m 51s | | hadoop-hdfs-client in the patch passed. |
| -1 :x: | unit | 212m 5s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/10/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 29s | | The patch does not generate ASF License warnings. |
| | | 325m 11s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/10/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5829 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
| uname | Linux 91e3e1ef2a9e 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
Neilxzn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2066542689

> The UT `hadoop.hdfs.TestDFSStripedInputStreamWithTimeout` run failed.

I added a Datanode$closeDataXceiverServer method that closes the connection, for use in testing.
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2066527804

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 21s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 14m 0s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 19m 50s | | trunk passed |
| +1 :green_heart: | compile | 2m 58s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | compile | 2m 51s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | checkstyle | 0m 48s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 21s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 12s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 1m 34s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | spotbugs | 3m 10s | | trunk passed |
| +1 :green_heart: | shadedclient | 21m 16s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 21m 30s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 21s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 7s | | the patch passed |
| +1 :green_heart: | compile | 2m 52s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javac | 2m 52s | | the patch passed |
| +1 :green_heart: | compile | 2m 49s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | javac | 2m 49s | | the patch passed |
| -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/9/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -0 :warning: | checkstyle | 0m 37s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/9/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 1 new + 45 unchanged - 0 fixed = 46 total (was 45) |
| +1 :green_heart: | mvnsite | 1m 6s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 54s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 1m 26s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | spotbugs | 3m 12s | | the patch passed |
| +1 :green_heart: | shadedclient | 21m 22s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 1m 51s | | hadoop-hdfs-client in the patch passed. |
| -1 :x: | unit | 211m 45s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/9/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 31s | | The patch does not generate ASF License warnings. |
| | | 321m 9s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.TestDFSStripedInputStreamWithTimeout |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/9/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5829 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
| uname | Linux 911c52bf0959 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
haiyang1987 commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2062844297

The UT `hadoop.hdfs.TestDFSStripedInputStreamWithTimeout` failed.
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2061596554

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 21s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 13m 56s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 20m 11s | | trunk passed |
| +1 :green_heart: | compile | 2m 56s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | compile | 2m 53s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | checkstyle | 0m 47s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 24s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 9s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 1m 31s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | spotbugs | 3m 17s | | trunk passed |
| +1 :green_heart: | shadedclient | 21m 19s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 21m 32s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 21s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 3s | | the patch passed |
| +1 :green_heart: | compile | 2m 51s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javac | 2m 51s | | the patch passed |
| +1 :green_heart: | compile | 2m 44s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | javac | 2m 44s | | the patch passed |
| -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/8/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -0 :warning: | checkstyle | 0m 34s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/8/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 1 new + 45 unchanged - 0 fixed = 46 total (was 45) |
| +1 :green_heart: | mvnsite | 1m 8s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 54s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 1m 28s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | spotbugs | 3m 10s | | the patch passed |
| +1 :green_heart: | shadedclient | 21m 13s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 1m 54s | | hadoop-hdfs-client in the patch passed. |
| -1 :x: | unit | 209m 14s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/8/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 30s | | The patch does not generate ASF License warnings. |
| | | 318m 25s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.TestDFSStripedInputStreamWithTimeout |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/8/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5829 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
| uname | Linux 574d2feade6c 5.15.0-94-generic
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1568745996

## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java: ##
@@ -233,41 +235,63 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) {
   private int readToBuffer(BlockReader blockReader,
       DatanodeInfo currentNode, ByteBufferStrategy strategy,
-      ExtendedBlock currentBlock) throws IOException {
+      LocatedBlock currentBlock, int chunkIndex, long offsetInBlock)
+      throws IOException {
     final int targetLength = strategy.getTargetLength();
-    int length = 0;
-    try {
-      while (length < targetLength) {
-        int ret = strategy.readFromBlock(blockReader);
-        if (ret < 0) {
-          throw new IOException("Unexpected EOS from the reader");
+    int curAttempts = 0;
+    while (curAttempts < readDNMaxAttempts) {
+      curAttempts++;
+      int length = 0;
+      try {
+        while (length < targetLength) {
+          int ret = strategy.readFromBlock(blockReader);
+          if (ret < 0) {
+            throw new IOException("Unexpected EOS from the reader");
+          }
+          length += ret;
+        }
+        return length;
+      } catch (ChecksumException ce) {
+        DFSClient.LOG.warn("Found Checksum error for "
+            + currentBlock + " from " + currentNode
+            + " at " + ce.getPos());
+        //Clear buffer to make next decode success
+        strategy.getReadBuffer().clear();
+        // we want to remember which block replicas we have tried
+        corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), currentNode);
+        throw ce;
+      } catch (IOException e) {
+        //Clear buffer to make next decode success
+        strategy.getReadBuffer().clear();
+        if (curAttempts < readDNMaxAttempts) {
+          if (readerInfos[chunkIndex].reader != null) {
+            readerInfos[chunkIndex].reader.close();
+          }
+          if (dfsStripedInputStream.createBlockReader(currentBlock,
+              offsetInBlock, targetBlocks,
+              readerInfos, chunkIndex, readTo)) {
+            blockReader = readerInfos[chunkIndex].reader;
+            String msg = "Reconnect to " + currentNode.getInfoAddr()
+                + " for block " + currentBlock.getBlock();
+            DFSClient.LOG.warn(msg);

Review Comment:
This can use the parameterized form:

```
DFSClient.LOG.warn("Reconnect to {} for block {}", currentNode.getInfoAddr(), currentBlock.getBlock());
```
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1568756254

## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java: ##
@@ -233,41 +235,63 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) {
   private int readToBuffer(BlockReader blockReader,
       DatanodeInfo currentNode, ByteBufferStrategy strategy,
-      ExtendedBlock currentBlock) throws IOException {
+      LocatedBlock currentBlock, int chunkIndex, long offsetInBlock)
+      throws IOException {
     final int targetLength = strategy.getTargetLength();
-    int length = 0;
-    try {
-      while (length < targetLength) {
-        int ret = strategy.readFromBlock(blockReader);
-        if (ret < 0) {
-          throw new IOException("Unexpected EOS from the reader");
+    int curAttempts = 0;
+    while (curAttempts < readDNMaxAttempts) {

Review Comment:
How about changing this to `while (true)`? Then lines 286~288 can be removed.
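The `while (true)` suggestion can be illustrated in isolation. In the sketch below every path through the loop body returns, rethrows, or loops again, so the compiler needs no statement after the loop. Lines 286~288 are not quoted in this thread, so the sketch assumes they are such a fall-through statement; `ChunkSource`, `BoundedRetry`, and `readWithRetry` are hypothetical names, not Hadoop classes.

```java
import java.io.IOException;

/** Hypothetical stand-in for a block reader that can be re-created; not a Hadoop type. */
interface ChunkSource {
  /** Returns the number of bytes read, or throws on a timeout or broken connection. */
  int read() throws IOException;

  /** Re-establishes the connection before the next attempt. */
  void reconnect() throws IOException;
}

final class BoundedRetry {
  private BoundedRetry() {
  }

  /**
   * Reads from source, reconnecting after each IOException until
   * maxAttempts reads have been tried. The last failure is rethrown.
   */
  static int readWithRetry(ChunkSource source, int maxAttempts) throws IOException {
    int attempts = 0;
    while (true) {
      attempts++;
      try {
        return source.read();          // success exits the loop
      } catch (IOException e) {
        if (attempts >= maxAttempts) {
          throw e;                     // attempts exhausted: propagate the failure
        }
        source.reconnect();            // otherwise reconnect and loop again
      }
    }
    // No statement is needed here: the loop can only exit via return or throw.
  }
}
```

With `while (attempts < maxAttempts)` the compiler cannot prove the loop never falls through, so a trailing `throw` or `return` would be required even though it is unreachable in practice.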
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1568748476

## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java: ##
@@ -233,41 +235,63 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) {
   private int readToBuffer(BlockReader blockReader,
       DatanodeInfo currentNode, ByteBufferStrategy strategy,
-      ExtendedBlock currentBlock) throws IOException {
+      LocatedBlock currentBlock, int chunkIndex, long offsetInBlock)
+      throws IOException {
     final int targetLength = strategy.getTargetLength();
-    int length = 0;
-    try {
-      while (length < targetLength) {
-        int ret = strategy.readFromBlock(blockReader);
-        if (ret < 0) {
-          throw new IOException("Unexpected EOS from the reader");
+    int curAttempts = 0;
+    while (curAttempts < readDNMaxAttempts) {
+      curAttempts++;
+      int length = 0;
+      try {
+        while (length < targetLength) {
+          int ret = strategy.readFromBlock(blockReader);
+          if (ret < 0) {
+            throw new IOException("Unexpected EOS from the reader");
+          }
+          length += ret;
+        }
+        return length;
+      } catch (ChecksumException ce) {
+        DFSClient.LOG.warn("Found Checksum error for "

Review Comment:
Same here.
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1568748150

## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java: ##
@@ -233,41 +235,63 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) {
   private int readToBuffer(BlockReader blockReader,
       DatanodeInfo currentNode, ByteBufferStrategy strategy,
-      ExtendedBlock currentBlock) throws IOException {
+      LocatedBlock currentBlock, int chunkIndex, long offsetInBlock)
+      throws IOException {
     final int targetLength = strategy.getTargetLength();
-    int length = 0;
-    try {
-      while (length < targetLength) {
-        int ret = strategy.readFromBlock(blockReader);
-        if (ret < 0) {
-          throw new IOException("Unexpected EOS from the reader");
+    int curAttempts = 0;
+    while (curAttempts < readDNMaxAttempts) {
+      curAttempts++;
+      int length = 0;
+      try {
+        while (length < targetLength) {
+          int ret = strategy.readFromBlock(blockReader);
+          if (ret < 0) {
+            throw new IOException("Unexpected EOS from the reader");
+          }
+          length += ret;
+        }
+        return length;
+      } catch (ChecksumException ce) {
+        DFSClient.LOG.warn("Found Checksum error for "
+            + currentBlock + " from " + currentNode
+            + " at " + ce.getPos());
+        //Clear buffer to make next decode success
+        strategy.getReadBuffer().clear();
+        // we want to remember which block replicas we have tried
+        corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), currentNode);
+        throw ce;
+      } catch (IOException e) {
+        //Clear buffer to make next decode success
+        strategy.getReadBuffer().clear();
+        if (curAttempts < readDNMaxAttempts) {
+          if (readerInfos[chunkIndex].reader != null) {
+            readerInfos[chunkIndex].reader.close();
+          }
+          if (dfsStripedInputStream.createBlockReader(currentBlock,
+              offsetInBlock, targetBlocks,
+              readerInfos, chunkIndex, readTo)) {
+            blockReader = readerInfos[chunkIndex].reader;
+            String msg = "Reconnect to " + currentNode.getInfoAddr()
+                + " for block " + currentBlock.getBlock();
+            DFSClient.LOG.warn(msg);
+            continue;
+          }
         }
-        length += ret;
+        DFSClient.LOG.warn("Exception while reading from "

Review Comment:
Could this also use the `warn("{}", arg)` format?
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
Neilxzn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2060941447

> @Neilxzn Hi, this patch is very useful, would you mind further fixing this PR?

Sorry for my late reply. I have updated the patch based on the suggestions above. Please review it again. @haiyang1987 @zhangshuyan0
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
zhangshuyan0 commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2056526550

@Neilxzn Hi, this patch is very useful, would you mind further fixing this PR?
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
haiyang1987 commented on code in PR #5829: URL: https://github.com/apache/hadoop/pull/5829#discussion_r1556791082 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java: ## @@ -233,41 +235,62 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) { private int readToBuffer(BlockReader blockReader, DatanodeInfo currentNode, ByteBufferStrategy strategy, - ExtendedBlock currentBlock) throws IOException { + LocatedBlock currentBlock, int chunkIndex) throws IOException { final int targetLength = strategy.getTargetLength(); -int length = 0; -try { - while (length < targetLength) { -int ret = strategy.readFromBlock(blockReader); -if (ret < 0) { - throw new IOException("Unexpected EOS from the reader"); +int curAttempts = 0; +while (curAttempts < readDNMaxAttempts) { + curAttempts++; + int length = 0; + try { +while (length < targetLength) { + int ret = strategy.readFromBlock(blockReader); + if (ret < 0) { +throw new IOException("Unexpected EOS from the reader"); + } + length += ret; +} +return length; + } catch (ChecksumException ce) { +DFSClient.LOG.warn("Found Checksum error for " ++ currentBlock + " from " + currentNode ++ " at " + ce.getPos()); +//Clear buffer to make next decode success +strategy.getReadBuffer().clear(); +// we want to remember which block replicas we have tried +corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), currentNode); +throw ce; + } catch (IOException e) { +//Clear buffer to make next decode success +strategy.getReadBuffer().clear(); +if (curAttempts < readDNMaxAttempts) { + if (readerInfos[chunkIndex].reader != null) { +readerInfos[chunkIndex].reader.close(); + } + if (dfsStripedInputStream.createBlockReader(currentBlock, + alignedStripe.getOffsetInBlock(), targetBlocks, Review Comment: yeah, Agree with @zhangshuyan0 comment. 
`readToBuffer(reader, datanode, strategy, currentBlock, chunkIndex, ret)`: here we should use `alignedStripe.getOffsetInBlock() + ret` instead of `ret`.
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
zhangshuyan0 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1555821906

## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
## @@ -284,7 +307,8 @@ private Callable readCells(final BlockReader reader,
       int ret = 0;
       for (ByteBufferStrategy strategy : strategies) {
-        int bytesReead = readToBuffer(reader, datanode, strategy, currentBlock);
+        int bytesReead = readToBuffer(reader, datanode, strategy, currentBlock,

Review Comment:
Could you please also correct this variable name while you are here? `bytesReead` -> `bytesRead`
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
zhangshuyan0 commented on code in PR #5829: URL: https://github.com/apache/hadoop/pull/5829#discussion_r1555819771 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java: ## @@ -233,41 +235,62 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) { private int readToBuffer(BlockReader blockReader, DatanodeInfo currentNode, ByteBufferStrategy strategy, - ExtendedBlock currentBlock) throws IOException { + LocatedBlock currentBlock, int chunkIndex) throws IOException { final int targetLength = strategy.getTargetLength(); -int length = 0; -try { - while (length < targetLength) { -int ret = strategy.readFromBlock(blockReader); -if (ret < 0) { - throw new IOException("Unexpected EOS from the reader"); +int curAttempts = 0; +while (curAttempts < readDNMaxAttempts) { + curAttempts++; + int length = 0; + try { +while (length < targetLength) { + int ret = strategy.readFromBlock(blockReader); + if (ret < 0) { +throw new IOException("Unexpected EOS from the reader"); + } + length += ret; +} +return length; + } catch (ChecksumException ce) { +DFSClient.LOG.warn("Found Checksum error for " ++ currentBlock + " from " + currentNode ++ " at " + ce.getPos()); +//Clear buffer to make next decode success +strategy.getReadBuffer().clear(); +// we want to remember which block replicas we have tried +corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), currentNode); +throw ce; + } catch (IOException e) { +//Clear buffer to make next decode success +strategy.getReadBuffer().clear(); +if (curAttempts < readDNMaxAttempts) { + if (readerInfos[chunkIndex].reader != null) { +readerInfos[chunkIndex].reader.close(); + } + if (dfsStripedInputStream.createBlockReader(currentBlock, + alignedStripe.getOffsetInBlock(), targetBlocks, Review Comment: Great catch! Agree with @haiyang1987 's idea. But the solution seems flawed. Should we use `alignedStripe.getOffsetInBlock()+ret` instead of `ret` here? Looking forward to your reply! 
@haiyang1987
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1531798611

## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
## @@ -233,41 +235,62 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) {
   private int readToBuffer(BlockReader blockReader,
       DatanodeInfo currentNode, ByteBufferStrategy strategy,
-      ExtendedBlock currentBlock) throws IOException {
+      LocatedBlock currentBlock, int chunkIndex) throws IOException {

Review Comment:
```
private int readToBuffer(BlockReader blockReader,
    DatanodeInfo currentNode, ByteBufferStrategy strategy,
    LocatedBlock currentBlock, int chunkIndex, long offsetInBlock)
```
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
haiyang1987 commented on code in PR #5829: URL: https://github.com/apache/hadoop/pull/5829#discussion_r1531799282 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java: ## @@ -233,41 +235,62 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) { private int readToBuffer(BlockReader blockReader, DatanodeInfo currentNode, ByteBufferStrategy strategy, - ExtendedBlock currentBlock) throws IOException { + LocatedBlock currentBlock, int chunkIndex) throws IOException { final int targetLength = strategy.getTargetLength(); -int length = 0; -try { - while (length < targetLength) { -int ret = strategy.readFromBlock(blockReader); -if (ret < 0) { - throw new IOException("Unexpected EOS from the reader"); +int curAttempts = 0; +while (curAttempts < readDNMaxAttempts) { + curAttempts++; + int length = 0; + try { +while (length < targetLength) { + int ret = strategy.readFromBlock(blockReader); + if (ret < 0) { +throw new IOException("Unexpected EOS from the reader"); + } + length += ret; +} +return length; + } catch (ChecksumException ce) { +DFSClient.LOG.warn("Found Checksum error for " ++ currentBlock + " from " + currentNode ++ " at " + ce.getPos()); +//Clear buffer to make next decode success +strategy.getReadBuffer().clear(); +// we want to remember which block replicas we have tried +corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), currentNode); +throw ce; + } catch (IOException e) { +//Clear buffer to make next decode success +strategy.getReadBuffer().clear(); +if (curAttempts < readDNMaxAttempts) { + if (readerInfos[chunkIndex].reader != null) { +readerInfos[chunkIndex].reader.close(); + } + if (dfsStripedInputStream.createBlockReader(currentBlock, + alignedStripe.getOffsetInBlock(), targetBlocks, Review Comment: ``` if (dfsStripedInputStream.createBlockReader(currentBlock, offsetInBlock, targetBlocks, ``` -- This is an automated message from the Apache Git Service. 
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1531806620

## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
## @@ -233,41 +235,62 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) {
   private int readToBuffer(BlockReader blockReader,
       DatanodeInfo currentNode, ByteBufferStrategy strategy,
-      ExtendedBlock currentBlock) throws IOException {
+      LocatedBlock currentBlock, int chunkIndex) throws IOException {
     final int targetLength = strategy.getTargetLength();
-    int length = 0;
-    try {
-      while (length < targetLength) {
-        int ret = strategy.readFromBlock(blockReader);
-        if (ret < 0) {
-          throw new IOException("Unexpected EOS from the reader");
+    int curAttempts = 0;
+    while (curAttempts < readDNMaxAttempts) {
+      curAttempts++;
+      int length = 0;
+      try {
+        while (length < targetLength) {
+          int ret = strategy.readFromBlock(blockReader);
+          if (ret < 0) {
+            throw new IOException("Unexpected EOS from the reader");
+          }
+          length += ret;
+        }
+        return length;
+      } catch (ChecksumException ce) {
+        DFSClient.LOG.warn("Found Checksum error for "
+            + currentBlock + " from " + currentNode
+            + " at " + ce.getPos());
+        //Clear buffer to make next decode success
+        strategy.getReadBuffer().clear();
+        // we want to remember which block replicas we have tried
+        corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), currentNode);
+        throw ce;
+      } catch (IOException e) {
+        //Clear buffer to make next decode success
+        strategy.getReadBuffer().clear();
+        if (curAttempts < readDNMaxAttempts) {
+          if (readerInfos[chunkIndex].reader != null) {
+            readerInfos[chunkIndex].reader.close();
+          }
+          if (dfsStripedInputStream.createBlockReader(currentBlock,
+              alignedStripe.getOffsetInBlock(), targetBlocks,

Review Comment:
Hi @Neilxzn @Hexiaoqiao @ayushtkn @zhangshuyan0 @ZanderXu, what do you think? Please also help look into this issue when you have time, thanks~
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
haiyang1987 commented on code in PR #5829: URL: https://github.com/apache/hadoop/pull/5829#discussion_r1531803838 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java: ## @@ -233,41 +235,62 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) { private int readToBuffer(BlockReader blockReader, DatanodeInfo currentNode, ByteBufferStrategy strategy, - ExtendedBlock currentBlock) throws IOException { + LocatedBlock currentBlock, int chunkIndex) throws IOException { final int targetLength = strategy.getTargetLength(); -int length = 0; -try { - while (length < targetLength) { -int ret = strategy.readFromBlock(blockReader); -if (ret < 0) { - throw new IOException("Unexpected EOS from the reader"); +int curAttempts = 0; +while (curAttempts < readDNMaxAttempts) { + curAttempts++; + int length = 0; + try { +while (length < targetLength) { + int ret = strategy.readFromBlock(blockReader); + if (ret < 0) { +throw new IOException("Unexpected EOS from the reader"); + } + length += ret; +} +return length; + } catch (ChecksumException ce) { +DFSClient.LOG.warn("Found Checksum error for " ++ currentBlock + " from " + currentNode ++ " at " + ce.getPos()); +//Clear buffer to make next decode success +strategy.getReadBuffer().clear(); +// we want to remember which block replicas we have tried +corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), currentNode); +throw ce; + } catch (IOException e) { +//Clear buffer to make next decode success +strategy.getReadBuffer().clear(); +if (curAttempts < readDNMaxAttempts) { + if (readerInfos[chunkIndex].reader != null) { +readerInfos[chunkIndex].reader.close(); + } + if (dfsStripedInputStream.createBlockReader(currentBlock, + alignedStripe.getOffsetInBlock(), targetBlocks, Review Comment: If use pread to read data, if the currently set buffer size is a block size, For a block in a dn, the data of multiple cell units may be read, so the size of the ByteBufferStrategy array in 
the StripingChunk corresponding to the AlignedStripe can be greater than one (the ChunkByteBuffer holds multiple List slices).

[Image: https://github.com/apache/hadoop/assets/3760130/40f7a944-ea57-4891-9719-86a1b009244d]

So when retrying createBlockReader in readToBuffer, we may need to take the current actual offsetInBlock into account, to avoid re-reading data from the datanode that has already been read.
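The concern above can be illustrated with a small, self-contained sketch. It is not the patch's code: `FlakyReader` is a hypothetical in-memory stand-in for a datanode reader, and `readSlices`, `stripeOffset`, `done` and `resumeAtActualOffset` are names invented for this illustration (`done` plays the role of `ret` in `readCells`). It only shows why a reader reopened after a failure should probably start at the stripe offset plus the bytes already consumed, rather than at the stripe offset alone.

```java
import java.io.IOException;
import java.util.Arrays;

/**
 * Sketch only: FlakyReader is a hypothetical in-memory stand-in for a
 * datanode block reader, not the HDFS BlockReader API.
 */
class ResumeOffsetSketch {

  /** Serves bytes of a block from a start offset; fails when a read would cover failAt. */
  static final class FlakyReader {
    private final byte[] block;
    private final int failAt; // absolute block offset that triggers a failure, -1 for never
    private int pos;

    FlakyReader(byte[] block, int startOffset, int failAt) {
      this.block = block;
      this.pos = startOffset;
      this.failAt = failAt;
    }

    /** Returns at most 4 bytes per call to mimic short reads; -1 at end of block. */
    int read(byte[] dst, int off, int len) throws IOException {
      int n = Math.min(len, Math.min(4, block.length - pos));
      if (n <= 0) {
        return -1;
      }
      if (failAt >= pos && failAt < pos + n) {
        throw new IOException("simulated datanode timeout at offset " + failAt);
      }
      System.arraycopy(block, pos, dst, off, n);
      pos += n;
      return n;
    }
  }

  /**
   * Reads several consecutive slices starting at stripeOffset, retrying each
   * slice up to maxAttempts times. When resumeAtActualOffset is false, the
   * reader is reopened at stripeOffset, which is the behaviour the review
   * comment warns about.
   */
  static byte[] readSlices(byte[] block, int stripeOffset, int[] sliceLengths,
      int maxAttempts, int failAt, boolean resumeAtActualOffset) throws IOException {
    byte[] out = new byte[Arrays.stream(sliceLengths).sum()];
    int done = 0; // bytes of fully read slices so far
    FlakyReader reader = new FlakyReader(block, stripeOffset, failAt);
    for (int sliceLen : sliceLengths) {
      int attempts = 0;
      while (true) {
        attempts++;
        try {
          int got = 0;
          while (got < sliceLen) {
            int r = reader.read(out, done + got, sliceLen - got);
            if (r < 0) {
              throw new IOException("Unexpected EOS from the reader");
            }
            got += r;
          }
          break; // slice complete
        } catch (IOException e) {
          if (attempts >= maxAttempts) {
            throw e;
          }
          // The partially read slice is discarded and re-read from its start.
          int reopenAt = resumeAtActualOffset ? stripeOffset + done : stripeOffset;
          reader = new FlakyReader(block, reopenAt, -1);
        }
      }
      done += sliceLen;
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    byte[] block = new byte[32];
    for (int i = 0; i < block.length; i++) {
      block[i] = (byte) i;
    }
    byte[] expected = Arrays.copyOfRange(block, 8, 24);
    byte[] fixed = readSlices(block, 8, new int[] {8, 8}, 2, 20, true);
    byte[] naive = readSlices(block, 8, new int[] {8, 8}, 2, 20, false);
    System.out.println("resume at stripeOffset + done: " + Arrays.equals(expected, fixed));
    System.out.println("resume at stripeOffset only:   " + Arrays.equals(expected, naive));
  }
}
```

In this toy setup the failure is injected inside the second slice. Reopening at `stripeOffset` alone would fill the second slice with a copy of the first slice's bytes, which matches the duplicate-read problem described in the comment.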
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1531794650

## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
## @@ -284,7 +307,8 @@ private Callable readCells(final BlockReader reader,
       int ret = 0;
       for (ByteBufferStrategy strategy : strategies) {
-        int bytesReead = readToBuffer(reader, datanode, strategy, currentBlock);
+        int bytesReead = readToBuffer(reader, datanode, strategy, currentBlock,
+            chunkIndex);

Review Comment:
`readToBuffer` may need to take the current actual offsetInBlock into account: `readToBuffer(reader, datanode, strategy, currentBlock, chunkIndex, ret);`
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
hadoop-yetus commented on PR #5829: URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1987902095 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 19s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 1s | | xmllint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 40s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 21m 13s | | trunk passed | | +1 :green_heart: | compile | 3m 2s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 3m 5s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 0m 48s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 19s | | trunk passed | | +1 :green_heart: | javadoc | 1m 8s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 48s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | -1 :x: | spotbugs | 1m 39s | [/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/7/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html) | hadoop-hdfs-project/hadoop-hdfs-client in trunk has 1 extant spotbugs warnings. | | +1 :green_heart: | shadedclient | 22m 50s | | branch has no errors when building and testing our client artifacts. | | -0 :warning: | patch | 23m 3s | | Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 24s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 1s | | the patch passed | | +1 :green_heart: | compile | 2m 55s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 2m 55s | | the patch passed | | +1 :green_heart: | compile | 2m 56s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 2m 56s | | the patch passed | | -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/7/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | -0 :warning: | checkstyle | 0m 44s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/7/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 1 new + 45 unchanged - 0 fixed = 46 total (was 45) | | +1 :green_heart: | mvnsite | 1m 13s | | the patch passed | | +1 :green_heart: | javadoc | 0m 53s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 34s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 23s | | the patch passed | | +1 :green_heart: | shadedclient | 23m 7s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 1m 54s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 235m 57s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/7/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. 
| | +1 :green_heart: | asflicense | 0m 32s | | The patch does not generate ASF License warnings. | | | | 351m 39s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.TestEncryptionZonesWithKMS | | | hadoop.hdfs.TestDFSStripedInputStreamWithTimeout | | | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots | | | hadoop.hdfs.server.namenode.TestFSEditLogLoader | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.tools.TestDFSAdmin | | |
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
hadoop-yetus commented on PR #5829: URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1987825036 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 20s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 0s | | xmllint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 55s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 20m 51s | | trunk passed | | +1 :green_heart: | compile | 3m 1s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 2m 53s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 0m 46s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 21s | | trunk passed | | +1 :green_heart: | javadoc | 1m 10s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 33s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | -1 :x: | spotbugs | 1m 30s | [/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/6/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html) | hadoop-hdfs-project/hadoop-hdfs-client in trunk has 1 extant spotbugs warnings. | | +1 :green_heart: | shadedclient | 22m 5s | | branch has no errors when building and testing our client artifacts. | | -0 :warning: | patch | 22m 18s | | Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 24s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 10s | | the patch passed | | +1 :green_heart: | compile | 3m 5s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 3m 5s | | the patch passed | | +1 :green_heart: | compile | 3m 1s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 3m 1s | | the patch passed | | -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/6/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | -0 :warning: | checkstyle | 0m 40s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/6/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 1 new + 45 unchanged - 0 fixed = 46 total (was 45) | | +1 :green_heart: | mvnsite | 1m 11s | | the patch passed | | +1 :green_heart: | javadoc | 0m 58s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 26s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 4s | | the patch passed | | +1 :green_heart: | shadedclient | 22m 21s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 1m 49s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 217m 9s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. 
| | +1 :green_heart: | asflicense | 0m 32s | | The patch does not generate ASF License warnings. | | | | 330m 42s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy | | | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics | | | hadoop.hdfs.TestDFSStripedInputStreamWithTimeout | | | hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor | | | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy | | |
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
haiyang1987 commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1987535597

> > Hi @Neilxzn Any progress here? Thanks.
> > this PR is still necessary, there are some similar problems in our environment~
>
> @haiyang1987 Our online environment (70 PB EC Data cluster, spark + hive olap) has already applied this patch. So far, everything is running normally.

Noted, thanks for your work ~
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
Neilxzn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1987533616

Rebased onto trunk. The new test `org.apache.hadoop.hdfs.TestDFSStripedInputStreamWithTimeout` passed in my local environment.

[Screenshot: https://github.com/apache/hadoop/assets/10757009/6bedc732-cbe7-403b-8c6a-b5e78a33527f]
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
Neilxzn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1987532047

> Hi @Neilxzn Any progress here? Thanks.
>
> this PR is still necessary, there are some similar problems in our environment~

@haiyang1987 Our online environment (70 PB EC Data cluster, spark + hive olap) has already applied this patch. So far, everything is running normally.
[PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
Neilxzn opened a new pull request, #5829:
URL: https://github.com/apache/hadoop/pull/5829

### Description of PR
https://issues.apache.org/jira/browse/HDFS-15413
Offers a working patch to fix HDFS-15413. This patch adds the `dfs.client.read.striped.datanode.max.attempts` config, which lets users adjust the number of datanode retries to work around datanode timeouts when reading EC files.

### How was this patch tested?
No tests were added; the patch was only tested in our cluster.

### For code changes:
Adds the `dfs.client.read.striped.datanode.max.attempts` config.
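As a rough illustration of how a client might resolve this setting, here is a minimal sketch. It uses `java.util.Properties` as a stand-in so the example stays self-contained; a real HDFS client would more likely read the key through Hadoop's `Configuration.getInt(name, defaultValue)`. The class name, the fallback of 1 (a single attempt, no retry) and the `>= 1` validation are assumptions made for this example, not values taken from the patch.

```java
import java.util.Properties;

/** Sketch only: Properties stands in for Hadoop's Configuration. */
class MaxAttemptsConfigSketch {

  static final String KEY = "dfs.client.read.striped.datanode.max.attempts";

  // Assumed fallback for this illustration; the patch's actual default is not stated here.
  static final int ASSUMED_DEFAULT = 1;

  /** Returns the configured attempt count, or the assumed default when the key is unset. */
  static int maxAttempts(Properties conf) {
    String raw = conf.getProperty(KEY);
    if (raw == null || raw.trim().isEmpty()) {
      return ASSUMED_DEFAULT;
    }
    int value = Integer.parseInt(raw.trim());
    if (value < 1) {
      // Zero attempts would mean the read loop never runs, so reject it early.
      throw new IllegalArgumentException(KEY + " must be >= 1, got " + value);
    }
    return value;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(KEY, "3");
    System.out.println(KEY + " = " + maxAttempts(conf));
  }
}
```

Rejecting values below 1 up front is a design choice of this sketch: a loop of the form `while (curAttempts < readDNMaxAttempts)` would otherwise be skipped entirely.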
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1987499017

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 0s | | Docker mode activated. |
| -1 :x: | patch | 0m 19s | | https://github.com/apache/hadoop/pull/5829 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help. |

| Subsystem | Report/Notes |
|----------:|:-------------|
| GITHUB PR | https://github.com/apache/hadoop/pull/5829 |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/5/console |
| versions | git=2.34.1 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
Neilxzn closed pull request #5829: HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout
URL: https://github.com/apache/hadoop/pull/5829
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1518480694

## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java: ##
@@ -233,41 +235,62 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) {
   private int readToBuffer(BlockReader blockReader,
       DatanodeInfo currentNode, ByteBufferStrategy strategy,
-      ExtendedBlock currentBlock) throws IOException {
+      LocatedBlock currentBlock, int chunkIndex) throws IOException {
     final int targetLength = strategy.getTargetLength();
-    int length = 0;
-    try {
-      while (length < targetLength) {
-        int ret = strategy.readFromBlock(blockReader);
-        if (ret < 0) {
-          throw new IOException("Unexpected EOS from the reader");
+    int curAttempts = 0;
+    while (curAttempts < readDNMaxAttempts) {
+      curAttempts++;
+      int length = 0;
+      try {
+        while (length < targetLength) {
+          int ret = strategy.readFromBlock(blockReader);
+          if (ret < 0) {
+            throw new IOException("Unexpected EOS from the reader");
+          }
+          length += ret;
+        }
+        return length;
+      } catch (ChecksumException ce) {
+        DFSClient.LOG.warn("Found Checksum error for "
+            + currentBlock + " from " + currentNode
+            + " at " + ce.getPos());
+        //Clear buffer to make next decode success
+        strategy.getReadBuffer().clear();
+        // we want to remember which block replicas we have tried
+        corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), currentNode);
+        throw ce;
+      } catch (IOException e) {
+        //Clear buffer to make next decode success
+        strategy.getReadBuffer().clear();
+        if (curAttempts < readDNMaxAttempts) {
+          if (readerInfos[chunkIndex].reader != null) {
+            readerInfos[chunkIndex].reader.close();
+          }
+          if (dfsStripedInputStream.createBlockReader(currentBlock,
+              alignedStripe.getOffsetInBlock(), targetBlocks,
+              readerInfos, chunkIndex, readTo)) {
+            blockReader = readerInfos[chunkIndex].reader;
+            String msg = "Reconnect to " + currentNode.getInfoAddr()
+                + " for block " + currentBlock.getBlock();
+            DFSClient.LOG.warn(msg);
+            continue;
+          }
+          DFSClient.LOG.warn("Exception while reading from "

Review Comment:
Lines 278-281: suggest moving `if (curAttempts < readDNMaxAttempts) {` outside.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
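For readers following the thread, the retry shape in the `readToBuffer` diff above can be sketched in isolation. This is only a minimal illustration under stated assumptions, not the patch itself: `BoundedRetryRead`, `ChunkSource`, and `Reconnector` are hypothetical names standing in for `StripeReader`, `BlockReader`, and `createBlockReader`, and the checksum branch is left out. It shows a read that is retried up to `maxAttempts` times, reconnecting between attempts and rethrowing the last failure once the attempts are used up.

```java
import java.io.IOException;

/** Hypothetical stand-in that illustrates a bounded read retry with reconnect. */
public class BoundedRetryRead {

  /** Hypothetical source: returns the number of bytes read, or -1 at end of stream. */
  public interface ChunkSource {
    int read() throws IOException;
  }

  /** Hypothetical factory that opens a fresh connection to the same node. */
  public interface Reconnector {
    ChunkSource reconnect() throws IOException;
  }

  public static int readWithRetry(ChunkSource source, Reconnector reconnector,
      int targetLength, int maxAttempts) throws IOException {
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      // Each attempt starts from zero, mirroring the buffer clear in the diff.
      int length = 0;
      try {
        while (length < targetLength) {
          int ret = source.read();
          if (ret < 0) {
            throw new IOException("Unexpected EOS from the reader");
          }
          length += ret;
        }
        return length;
      } catch (IOException e) {
        last = e;
        // Reconnect only if another attempt is still allowed.
        if (attempt < maxAttempts) {
          source = reconnector.reconnect();
        }
      }
    }
    throw last != null ? last : new IOException("maxAttempts must be at least 1");
  }

  public static void main(String[] args) throws IOException {
    int[] reconnects = {0};
    ChunkSource stale = () -> { throw new IOException("connection reset"); };
    ChunkSource fresh = () -> 4;
    Reconnector reconnector = () -> { reconnects[0]++; return fresh; };
    int n = readWithRetry(stale, reconnector, 8, 3);
    System.out.println("read " + n + " bytes after " + reconnects[0] + " reconnect(s)");
  }
}
```

With `maxAttempts` set to 1 the sketch never reconnects and the first `IOException` propagates, which is the pre-patch behaviour the new configuration key is meant to relax.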
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
Neilxzn commented on PR #5829: URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1985490862 This is an automatic reply. Your message has been received, thank you.
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
haiyang1987 commented on PR #5829: URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1985490125 Hi @Neilxzn, any progress here? Thanks. This PR is still needed; we are seeing similar problems in our environment.
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
Neilxzn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1849677715

> @Neilxzn I tried & it fails locally
>
> ```
> [INFO] ---
> [INFO] T E S T S
> [INFO] ---
> [INFO] Running org.apache.hadoop.hdfs.TestDFSStripedInputStreamWithTimeout
> [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 17.64 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDFSStripedInputStreamWithTimeout
> [ERROR] testPreadTimeout(org.apache.hadoop.hdfs.TestDFSStripedInputStreamWithTimeout) Time elapsed: 17.509 s <<< FAILURE!
> java.lang.AssertionError: It Should fail to read striped time out with 1 attempt .
>     at org.junit.Assert.fail(Assert.java:89)
>     at org.apache.hadoop.hdfs.TestDFSStripedInputStreamWithTimeout.testPreadTimeout(TestDFSStripedInputStreamWithTimeout.java:145)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.lang.Thread.run(Thread.java:750)
>
> [INFO]
> [INFO] Results:
> [INFO]
> [ERROR] Failures:
> [ERROR] TestDFSStripedInputStreamWithTimeout.testPreadTimeout:145 It Should fail to read striped time out with 1 attempt .
> [INFO]
> [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
> [INFO]
> [ERROR] There are test failures.
> ```
>
> To reproduce: in the hadoop root directory there is file named `start-build-env.sh`, run that `bash start-build-env.sh`, it will give you a docker env, run the test inside that & it will fail

Thank you. I will check it again soon.
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
ayushtkn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1845117963

@Neilxzn I tried & it fails locally

```
[INFO] ---
[INFO] T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hdfs.TestDFSStripedInputStreamWithTimeout
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 17.64 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDFSStripedInputStreamWithTimeout
[ERROR] testPreadTimeout(org.apache.hadoop.hdfs.TestDFSStripedInputStreamWithTimeout) Time elapsed: 17.509 s <<< FAILURE!
java.lang.AssertionError: It Should fail to read striped time out with 1 attempt .
    at org.junit.Assert.fail(Assert.java:89)
    at org.apache.hadoop.hdfs.TestDFSStripedInputStreamWithTimeout.testPreadTimeout(TestDFSStripedInputStreamWithTimeout.java:145)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:750)

[INFO]
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestDFSStripedInputStreamWithTimeout.testPreadTimeout:145 It Should fail to read striped time out with 1 attempt .
[INFO]
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
[INFO]
[ERROR] There are test failures.
```

To reproduce: the Hadoop root directory contains a file named `start-build-env.sh`. Run `bash start-build-env.sh` to get a Docker environment, then run the test inside it; it will fail.
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
Neilxzn commented on PR #5829: URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1844334814 The unit test hadoop.hdfs.TestDFSStripedInputStreamWithTimeout passes in my local development environment, but it fails on GitHub Jenkins. ![image](https://github.com/apache/hadoop/assets/10757009/a511b4e1-8413-44bb-9136-5e7cc1f3ff17) I checked that the test log from my development environment is consistent with the assumption behind the test: after the client reads the file for the first time and then pauses for 10 seconds, the connection between the client and the DataNode is dropped automatically, so the client's next read fails. @ayushtkn Any other suggestions?
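The assumption described in the message above can be modelled without a cluster. The following is a hedged, self-contained simulation, not code from the PR: `IdleTimeoutDemo` and its `Connection` class are hypothetical, and time is advanced by hand rather than by sleeping. It uses the 10 second pause from the message and the 1000 ms timeout set in the test configuration, and contrasts one attempt with three.

```java
import java.io.IOException;

/** Hypothetical simulation of a server that drops connections left idle too long. */
public class IdleTimeoutDemo {

  /** Simulated connection; the caller passes the current time, so nothing sleeps. */
  static final class Connection {
    private final long timeoutMs;
    private long lastUsedMs;

    Connection(long timeoutMs, long nowMs) {
      this.timeoutMs = timeoutMs;
      this.lastUsedMs = nowMs;
    }

    int read(long nowMs) throws IOException {
      if (nowMs - lastUsedMs > timeoutMs) {
        throw new IOException("connection closed after idle timeout");
      }
      lastUsedMs = nowMs;
      return 1;
    }
  }

  /** Returns true if the read after the pause succeeds within maxAttempts tries. */
  public static boolean readSucceeds(long idleMs, long timeoutMs, int maxAttempts) {
    long now = 0;
    Connection conn = new Connection(timeoutMs, now);
    try {
      conn.read(now); // first read on a fresh connection
    } catch (IOException e) {
      return false;
    }
    now += idleMs; // the client pauses between reads
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        conn.read(now);
        return true;
      } catch (IOException e) {
        conn = new Connection(timeoutMs, now); // reconnect for the next attempt
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println("1 attempt:  " + readSucceeds(10_000, 1_000, 1));
    System.out.println("3 attempts: " + readSucceeds(10_000, 1_000, 3));
  }
}
```

In this model a single attempt fails after the pause and a second attempt on a new connection succeeds. It says nothing about why the real test behaves differently on Jenkins.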
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1842713669

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 19s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 13m 24s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 19m 14s | | trunk passed |
| +1 :green_heart: | compile | 2m 51s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 2m 47s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 0m 44s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 16s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 3s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 29s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 3m 3s | | trunk passed |
| +1 :green_heart: | shadedclient | 20m 9s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 21s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 2s | | the patch passed |
| +1 :green_heart: | compile | 2m 47s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 2m 47s | | the patch passed |
| +1 :green_heart: | compile | 2m 42s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 2m 42s | | the patch passed |
| -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/4/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -0 :warning: | checkstyle | 0m 35s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/4/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 1 new + 45 unchanged - 0 fixed = 46 total (was 45) |
| +1 :green_heart: | mvnsite | 1m 6s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 50s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 20s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 3m 5s | | the patch passed |
| +1 :green_heart: | shadedclient | 20m 7s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 1m 49s | | hadoop-hdfs-client in the patch passed. |
| -1 :x: | unit | 189m 58s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 27s | | The patch does not generate ASF License warnings. |
| | | 293m 19s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.TestDFSStripedInputStreamWithTimeout |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/4/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5829 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
| uname | Linux 807603bf2dcf 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / bb46dbd471d96878492fe660b2af03a8384f8123 |
| Default Java | Private
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1840975664

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 49s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 1s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 14m 2s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 35m 6s | | trunk passed |
| +1 :green_heart: | compile | 6m 4s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 5m 51s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 27s | | trunk passed |
| +1 :green_heart: | mvnsite | 2m 22s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 52s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 2m 21s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 5m 56s | | trunk passed |
| +1 :green_heart: | shadedclient | 40m 54s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 31s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 59s | | the patch passed |
| +1 :green_heart: | compile | 5m 54s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 5m 54s | | the patch passed |
| +1 :green_heart: | compile | 5m 42s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 5m 42s | | the patch passed |
| -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/3/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -0 :warning: | checkstyle | 1m 17s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/3/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 2 new + 45 unchanged - 0 fixed = 47 total (was 45) |
| +1 :green_heart: | mvnsite | 2m 6s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 32s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 2m 6s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 5m 59s | | the patch passed |
| +1 :green_heart: | shadedclient | 40m 5s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 2m 22s | | hadoop-hdfs-client in the patch passed. |
| -1 :x: | unit | 251m 52s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 43s | | The patch does not generate ASF License warnings. |
| | | 439m 20s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.TestDFSStripedInputStreamWithTimeout |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5829 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
| uname | Linux 15e76d99d238 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / db52c34f95a1caeb7e07157c6289793acf91c514 |
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
Neilxzn commented on code in PR #5829: URL: https://github.com/apache/hadoop/pull/5829#discussion_r1415029913 ## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSStripedInputStreamWithTimeout.java: ## @@ -0,0 +1,168 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hdfs; + +import org.apache.hadoop.hdfs.client.HdfsClientConfigKeys; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hdfs.protocol.Block; +import org.apache.hadoop.hdfs.protocol.ErasureCodingPolicy; +import org.apache.hadoop.hdfs.protocol.LocatedBlock; +import org.apache.hadoop.hdfs.protocol.LocatedBlocks; +import org.apache.hadoop.hdfs.protocol.LocatedStripedBlock; +import org.apache.hadoop.hdfs.server.datanode.DataNode; +import org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils; +import org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset; +import org.apache.hadoop.io.erasurecode.CodecUtil; +import org.apache.hadoop.io.erasurecode.ErasureCodeNative; +import org.apache.hadoop.io.erasurecode.rawcoder.NativeRSRawErasureCoderFactory; +import org.apache.hadoop.test.GenericTestUtils; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.Timeout; + +import java.io.IOException; +import java.util.Arrays; + +public class TestDFSStripedInputStreamWithTimeout { + + public static final Logger LOG = + LoggerFactory.getLogger(TestDFSStripedInputStreamWithTimeout.class); + + private MiniDFSCluster cluster; + private Configuration conf = new Configuration(); + private DistributedFileSystem fs; + private final Path dirPath = new Path("/striped"); + private Path filePath = new Path(dirPath, "file"); + private ErasureCodingPolicy ecPolicy; + private short dataBlocks; + private short parityBlocks; + private int cellSize; + private final int stripesPerBlock = 2; + private int blockSize; + private int blockGroupSize; + + @Rule + public Timeout globalTimeout = new Timeout(30); + + public ErasureCodingPolicy getEcPolicy() { +return StripedFileTestUtil.getDefaultECPolicy(); + } + + @Before + public void setup() throws IOException { +/* + 
* Initialize erasure coding policy. + */ +ecPolicy = getEcPolicy(); +dataBlocks = (short) ecPolicy.getNumDataUnits(); +parityBlocks = (short) ecPolicy.getNumParityUnits(); +cellSize = ecPolicy.getCellSize(); +blockSize = stripesPerBlock * cellSize; +blockGroupSize = dataBlocks * blockSize; +System.out.println("EC policy = " + ecPolicy); + +conf.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, blockSize); +conf.setInt(DFSConfigKeys.DFS_NAMENODE_REPLICATION_MAX_STREAMS_KEY, 0); + +conf.setInt(DFSConfigKeys.DFS_DATANODE_SOCKET_WRITE_TIMEOUT_KEY, 1000); +// SET CONFIG FOR HDFS CLIENT +conf.setInt(DFSConfigKeys.DFS_CLIENT_SOCKET_TIMEOUT_KEY, 1000); +conf.setInt(HdfsClientConfigKeys.StripedRead.DATANODE_MAX_ATTEMPTS, 3); + +if (ErasureCodeNative.isNativeCodeLoaded()) { + conf.set( + CodecUtil.IO_ERASURECODE_CODEC_RS_RAWCODERS_KEY, + NativeRSRawErasureCoderFactory.CODER_NAME); +} +conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, +GenericTestUtils.getRandomizedTempPath()); +SimulatedFSDataset.setFactory(conf); +startUp(); + } + + private void startUp() throws IOException { +cluster = new MiniDFSCluster.Builder(conf).numDataNodes( +dataBlocks + parityBlocks).build(); +cluster.waitActive(); +for (DataNode dn : cluster.getDataNodes()) { + DataNodeTestUtils.setHeartbeatsDisabledForTests(dn, true); +} +fs = cluster.getFileSystem(); +fs.enableErasureCodingPolicy(getEcPolicy().getName()); +fs.mkdirs(dirPath); +fs.getClient() +.setErasureCodingPolicy(dirPath.toString(), ecPolicy.getName()); + } + + @After + public void tearDown() { +if (cluster != null) { + cluster.shutdown(); + cluster = null; +} + } + + @Test + public void testPreadTimeout() throws Exception { +final int numBlocks
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
Neilxzn commented on code in PR #5829: URL: https://github.com/apache/hadoop/pull/5829#discussion_r1415026140 ## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSStripedInputStreamWithTimeout.java: ## @@ -0,0 +1,168 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hdfs; + +import org.apache.hadoop.hdfs.client.HdfsClientConfigKeys; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hdfs.protocol.Block; +import org.apache.hadoop.hdfs.protocol.ErasureCodingPolicy; +import org.apache.hadoop.hdfs.protocol.LocatedBlock; +import org.apache.hadoop.hdfs.protocol.LocatedBlocks; +import org.apache.hadoop.hdfs.protocol.LocatedStripedBlock; +import org.apache.hadoop.hdfs.server.datanode.DataNode; +import org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils; +import org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset; +import org.apache.hadoop.io.erasurecode.CodecUtil; +import org.apache.hadoop.io.erasurecode.ErasureCodeNative; +import org.apache.hadoop.io.erasurecode.rawcoder.NativeRSRawErasureCoderFactory; +import org.apache.hadoop.test.GenericTestUtils; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.Timeout; + +import java.io.IOException; +import java.util.Arrays; + +public class TestDFSStripedInputStreamWithTimeout { + + public static final Logger LOG = + LoggerFactory.getLogger(TestDFSStripedInputStreamWithTimeout.class); + + private MiniDFSCluster cluster; + private Configuration conf = new Configuration(); + private DistributedFileSystem fs; + private final Path dirPath = new Path("/striped"); + private Path filePath = new Path(dirPath, "file"); + private ErasureCodingPolicy ecPolicy; + private short dataBlocks; + private short parityBlocks; + private int cellSize; + private final int stripesPerBlock = 2; + private int blockSize; + private int blockGroupSize; + + @Rule + public Timeout globalTimeout = new Timeout(30); + + public ErasureCodingPolicy getEcPolicy() { +return StripedFileTestUtil.getDefaultECPolicy(); + } + + @Before + public void setup() throws IOException { +/* + 
* Initialize erasure coding policy. + */ +ecPolicy = getEcPolicy(); +dataBlocks = (short) ecPolicy.getNumDataUnits(); +parityBlocks = (short) ecPolicy.getNumParityUnits(); +cellSize = ecPolicy.getCellSize(); +blockSize = stripesPerBlock * cellSize; +blockGroupSize = dataBlocks * blockSize; +System.out.println("EC policy = " + ecPolicy); + +conf.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, blockSize); Review Comment: done
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
Neilxzn commented on PR #5829: URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1838407963

> Hi @Neilxzn , any chance you have time to finish this up?

Sorry for the late reply. I have been busy with other things recently. I will try to submit a new unit test tomorrow.
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
bbeaudreault commented on PR #5829: URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1837536440 Hi @Neilxzn , any chance you have time to finish this up?
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
Hexiaoqiao commented on PR #5829: URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1815704414 Hi @Neilxzn Any progress here? Thanks.
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
ayushtkn commented on code in PR #5829: URL: https://github.com/apache/hadoop/pull/5829#discussion_r1388949742 ## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSStripedInputStreamWithTimeout.java: ## @@ -0,0 +1,168 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.hadoop.hdfs;
+
+import org.apache.hadoop.hdfs.client.HdfsClientConfigKeys;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.protocol.Block;
+import org.apache.hadoop.hdfs.protocol.ErasureCodingPolicy;
+import org.apache.hadoop.hdfs.protocol.LocatedBlock;
+import org.apache.hadoop.hdfs.protocol.LocatedBlocks;
+import org.apache.hadoop.hdfs.protocol.LocatedStripedBlock;
+import org.apache.hadoop.hdfs.server.datanode.DataNode;
+import org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils;
+import org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset;
+import org.apache.hadoop.io.erasurecode.CodecUtil;
+import org.apache.hadoop.io.erasurecode.ErasureCodeNative;
+import org.apache.hadoop.io.erasurecode.rawcoder.NativeRSRawErasureCoderFactory;
+import org.apache.hadoop.test.GenericTestUtils;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.Timeout;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+public class TestDFSStripedInputStreamWithTimeout {
+
+  public static final Logger LOG =
+      LoggerFactory.getLogger(TestDFSStripedInputStreamWithTimeout.class);
+
+  private MiniDFSCluster cluster;
+  private Configuration conf = new Configuration();
+  private DistributedFileSystem fs;
+  private final Path dirPath = new Path("/striped");
+  private Path filePath = new Path(dirPath, "file");
+  private ErasureCodingPolicy ecPolicy;
+  private short dataBlocks;
+  private short parityBlocks;
+  private int cellSize;
+  private final int stripesPerBlock = 2;
+  private int blockSize;
+  private int blockGroupSize;
+
+  @Rule
+  public Timeout globalTimeout = new Timeout(30);
+
+  public ErasureCodingPolicy getEcPolicy() {
+    return StripedFileTestUtil.getDefaultECPolicy();
+  }
+
+  @Before
+  public void setup() throws IOException {
+    /*
+     * Initialize erasure coding policy.
+     */
+    ecPolicy = getEcPolicy();
+    dataBlocks = (short) ecPolicy.getNumDataUnits();
+    parityBlocks = (short) ecPolicy.getNumParityUnits();
+    cellSize = ecPolicy.getCellSize();
+    blockSize = stripesPerBlock * cellSize;
+    blockGroupSize = dataBlocks * blockSize;
+    System.out.println("EC policy = " + ecPolicy);
+
+    conf.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, blockSize);

Review Comment:
I believe this config key is deprecated. We should use ``HdfsClientConfigKeys.DFS_CLIENT_SOCKET_TIMEOUT_KEY`` instead.

## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSStripedInputStreamWithTimeout.java: ##
@@ -0,0 +1,168 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs;
+
+import org.apache.hadoop.hdfs.client.HdfsClientConfigKeys;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.protocol.Block;
+import
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
bbeaudreault commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1802929219

@ayushtkn @zhangshuyan0 looks like the remaining failing checks are unrelated, and the feedback was addressed. Any chance for another look?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1796080821

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 9m 0s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 14m 3s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 22m 34s | | trunk passed |
| +1 :green_heart: | compile | 3m 7s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 3m 4s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 52s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 31s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 22s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 40s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 3m 23s | | trunk passed |
| +1 :green_heart: | shadedclient | 21m 16s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 26s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 13s | | the patch passed |
| +1 :green_heart: | compile | 2m 59s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 2m 59s | | the patch passed |
| +1 :green_heart: | compile | 2m 51s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 2m 51s | | the patch passed |
| -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/2/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -0 :warning: | checkstyle | 0m 41s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/2/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 1 new + 45 unchanged - 0 fixed = 46 total (was 45) |
| +1 :green_heart: | mvnsite | 1m 17s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 5s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 30s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 3m 25s | | the patch passed |
| +1 :green_heart: | shadedclient | 21m 20s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 1m 56s | | hadoop-hdfs-client in the patch passed. |
| -1 :x: | unit | 192m 14s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 36s | | The patch does not generate ASF License warnings. |
| | | 315m 36s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.TestDFSUtil |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5829 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
| uname | Linux a7218a7ce8bd 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / ff4b113ef084a1e1843518d19f0726ca2994b63a |
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
Neilxzn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1794890337

> Please also check the checkstyle and blanks issues reported by Yetus. Thanks. @Neilxzn

Fixed the checkstyle issues and added a unit test. Please review it again. Thanks.
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
Hexiaoqiao commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1794081852

Please also check the checkstyle and blanks issues reported by Yetus. Thanks. @Neilxzn
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
zhangshuyan0 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1382549418

## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java: ##
@@ -233,41 +236,60 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) {
   private int readToBuffer(BlockReader blockReader,
       DatanodeInfo currentNode, ByteBufferStrategy strategy,
-      ExtendedBlock currentBlock) throws IOException {
+      LocatedBlock currentBlock, int chunkIndex) throws IOException {
     final int targetLength = strategy.getTargetLength();
-    int length = 0;
-    try {
-      while (length < targetLength) {
-        int ret = strategy.readFromBlock(blockReader);
-        if (ret < 0) {
-          throw new IOException("Unexpected EOS from the reader");
+    int curAttempts = 0;
+    while (curAttempts < readDNMaxAttempts) {
+      int length = 0;
+      try {
+        while (length < targetLength) {
+          int ret = strategy.readFromBlock(blockReader);
+          if (ret < 0) {
+            throw new IOException("Unexpected EOS from the reader");
+          }
+          length += ret;
         }
-        length += ret;
+        return length;
+      } catch (ChecksumException ce) {
+        DFSClient.LOG.warn("Found Checksum error for "
+            + currentBlock + " from " + currentNode
+            + " at " + ce.getPos());
+        //Clear buffer to make next decode success
+        strategy.getReadBuffer().clear();
+        // we want to remember which block replicas we have tried
+        corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), currentNode);
+        throw ce;
+      } catch (IOException e) {
+        //Clear buffer to make next decode success
+        strategy.getReadBuffer().clear();
+        if (curAttempts < readDNMaxAttempts - 1) {
+          curAttempts++;
+          if (readerInfos[chunkIndex].reader != null) {
+            readerInfos[chunkIndex].reader.close();
+          }
+          if (dfsStripedInputStream.createBlockReader(currentBlock,
+              alignedStripe.getOffsetInBlock(), targetBlocks,
+              readerInfos, chunkIndex, readTo)) {
+            blockReader = readerInfos[chunkIndex].reader;
+            String msg = "Reconnect to " + currentNode.getInfoAddr()
+                + " for block " + currentBlock.getBlock();
+            DFSClient.LOG.warn(msg);
+            continue;
+          }
+        DFSClient.LOG.warn("Exception while reading from "
+            + currentBlock + " of " + dfsStripedInputStream.getSrc() + " from "
+            + currentNode, e);
+        throw e;
       }
-      return length;
-    } catch (ChecksumException ce) {
-      DFSClient.LOG.warn("Found Checksum error for "
-          + currentBlock + " from " + currentNode
-          + " at " + ce.getPos());
-      //Clear buffer to make next decode success
-      strategy.getReadBuffer().clear();
-      // we want to remember which block replicas we have tried
-      corruptedBlocks.addCorruptedBlock(currentBlock, currentNode);
-      throw ce;
-    } catch (IOException e) {
-      DFSClient.LOG.warn("Exception while reading from "
-          + currentBlock + " of " + dfsStripedInputStream.getSrc() + " from "
-          + currentNode, e);
-      //Clear buffer to make next decode success
-      strategy.getReadBuffer().clear();
-      throw e;
     }
   }
+    return -1;

Review Comment:
Agree with @ayushtkn. Lines 279-282 should be here.
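To illustrate the shape the reviewers seem to be asking for, here is a minimal sketch of a bounded retry loop that rethrows the last `IOException` once its attempts are exhausted, so no sentinel such as `-1` ever reaches the caller. It is not the actual `StripeReader` code: `RetryingReadSketch`, `ReadAttempt`, and `readWithRetries` are hypothetical names, and the reader re-creation step from the patch is left out.

```java
import java.io.IOException;

/** Hypothetical sketch; not the actual StripeReader implementation. */
final class RetryingReadSketch {

  /** Stand-in for one read attempt against a datanode (assumed interface). */
  interface ReadAttempt {
    int read() throws IOException;
  }

  /**
   * Runs the attempt up to maxAttempts times and returns the byte count of
   * the first success. If every attempt fails, the last IOException is
   * rethrown instead of returning a sentinel value.
   */
  static int readWithRetries(ReadAttempt attempt, int maxAttempts)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int i = 0; i < maxAttempts; i++) {
      try {
        return attempt.read();
      } catch (IOException e) {
        last = e; // remember the failure, then try again if attempts remain
      }
    }
    throw last; // all attempts failed; surface the real error
  }

  public static void main(String[] args) throws IOException {
    int[] calls = {0};
    // Simulated reader: fails twice, then succeeds on the third attempt.
    int n = readWithRetries(() -> {
      if (++calls[0] < 3) {
        throw new IOException("simulated timeout");
      }
      return 4096;
    }, 3);
    System.out.println(n + " bytes after " + calls[0] + " attempts");
  }
}
```

With this shape the final log-and-throw lives after the loop, which is roughly where the review suggests moving it.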
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
ayushtkn commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1381508823

## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java: ##
@@ -233,41 +236,60 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) {
   private int readToBuffer(BlockReader blockReader,
       DatanodeInfo currentNode, ByteBufferStrategy strategy,
-      ExtendedBlock currentBlock) throws IOException {
+      LocatedBlock currentBlock, int chunkIndex) throws IOException {
     final int targetLength = strategy.getTargetLength();
-    int length = 0;
-    try {
-      while (length < targetLength) {
-        int ret = strategy.readFromBlock(blockReader);
-        if (ret < 0) {
-          throw new IOException("Unexpected EOS from the reader");
+    int curAttempts = 0;
+    while (curAttempts < readDNMaxAttempts) {
+      int length = 0;
+      try {
+        while (length < targetLength) {
+          int ret = strategy.readFromBlock(blockReader);
+          if (ret < 0) {
+            throw new IOException("Unexpected EOS from the reader");
+          }
+          length += ret;
         }
-        length += ret;
+        return length;
+      } catch (ChecksumException ce) {
+        DFSClient.LOG.warn("Found Checksum error for "
+            + currentBlock + " from " + currentNode
+            + " at " + ce.getPos());
+        //Clear buffer to make next decode success
+        strategy.getReadBuffer().clear();
+        // we want to remember which block replicas we have tried
+        corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), currentNode);
+        throw ce;
+      } catch (IOException e) {
+        //Clear buffer to make next decode success
+        strategy.getReadBuffer().clear();
+        if (curAttempts < readDNMaxAttempts - 1) {
+          curAttempts++;
+          if (readerInfos[chunkIndex].reader != null) {
+            readerInfos[chunkIndex].reader.close();
+          }
+          if (dfsStripedInputStream.createBlockReader(currentBlock,
+              alignedStripe.getOffsetInBlock(), targetBlocks,
+              readerInfos, chunkIndex, readTo)) {
+            blockReader = readerInfos[chunkIndex].reader;
+            String msg = "Reconnect to " + currentNode.getInfoAddr()
+                + " for block " + currentBlock.getBlock();
+            DFSClient.LOG.warn(msg);
+            continue;
+          }
+        DFSClient.LOG.warn("Exception while reading from "
+            + currentBlock + " of " + dfsStripedInputStream.getSrc() + " from "
+            + currentNode, e);
+        throw e;
       }
-      return length;
-    } catch (ChecksumException ce) {
-      DFSClient.LOG.warn("Found Checksum error for "
-          + currentBlock + " from " + currentNode
-          + " at " + ce.getPos());
-      //Clear buffer to make next decode success
-      strategy.getReadBuffer().clear();
-      // we want to remember which block replicas we have tried
-      corruptedBlocks.addCorruptedBlock(currentBlock, currentNode);
-      throw ce;
-    } catch (IOException e) {
-      DFSClient.LOG.warn("Exception while reading from "
-          + currentBlock + " of " + dfsStripedInputStream.getSrc() + " from "
-          + currentNode, e);
-      //Clear buffer to make next decode success
-      strategy.getReadBuffer().clear();
-      throw e;
     }
   }
+    return -1;

Review Comment:
I don't think we should return -1 here; there is logic that uses the return value:
```
for (ByteBufferStrategy strategy : strategies) {
  int bytesReead = readToBuffer(reader, datanode, strategy, currentBlock);
  ret += bytesReead;
}
```
We should either throw an exception or return a valid value.

## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java: ##
@@ -233,41 +236,60 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) {
   private int readToBuffer(BlockReader blockReader,
       DatanodeInfo currentNode, ByteBufferStrategy strategy,
-      ExtendedBlock currentBlock) throws IOException {
+      LocatedBlock currentBlock, int chunkIndex) throws IOException {
     final int targetLength = strategy.getTargetLength();
-    int length = 0;
-    try {
-      while (length < targetLength) {
-        int ret = strategy.readFromBlock(blockReader);
-        if (ret < 0) {
-          throw new IOException("Unexpected EOS from the reader");
+    int curAttempts = 0;
+    while (curAttempts < readDNMaxAttempts) {
+      int length = 0;
+      try {
+        while (length < targetLength) {
+          int ret = strategy.readFromBlock(blockReader);
+          if (ret < 0) {
+            throw new IOException("Unexpected EOS from the reader");
+          }
+          length += ret;
         }
-        length += ret;
+        return length;
+      } catch (ChecksumException ce) {
+
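The concern about `-1` in the first review comment of this message can be shown with a small standalone sketch. It assumes only what the quoted caller loop shows, namely that per-strategy return values are added into a running total; `SentinelSumSketch` and `total` are hypothetical names, not Hadoop APIs.

```java
/** Hypothetical sketch of how a -1 sentinel distorts an accumulated byte count. */
final class SentinelSumSketch {

  /** Adds up per-strategy byte counts, mirroring the quoted "ret += ..." loop. */
  static int total(int[] perStrategyBytes) {
    int ret = 0;
    for (int bytesRead : perStrategyBytes) {
      ret += bytesRead;
    }
    return ret;
  }

  public static void main(String[] args) {
    // Three strategies of 1024 bytes each; the second one "fails" and
    // reports -1 instead of throwing.
    int withSentinel = total(new int[] {1024, -1, 1024});
    // The failure is silently folded into the total as 2047 bytes.
    System.out.println(withSentinel);
  }
}
```

Because the sentinel is indistinguishable from a byte count once summed, throwing keeps the failure visible to the caller.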
Re: [PR] HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout [hadoop]
Hexiaoqiao commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-1791847743

cc @zhangshuyan0 Would you mind taking a look?