[jira] [Commented] (HDFS-16938) Utility to trigger heartbeat and wait until BP thread queue is fully processed
[ https://issues.apache.org/jira/browse/HDFS-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695471#comment-17695471 ]

ASF GitHub Bot commented on HDFS-16938:
---------------------------------------

hadoop-yetus commented on PR #5445:
URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451337470

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 52s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 41m 41s | | trunk passed |
| +1 :green_heart: | compile | 1m 31s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | compile | 1m 22s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 7s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 28s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 8s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javadoc | 1m 33s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 3m 40s | | trunk passed |
| +1 :green_heart: | shadedclient | 26m 3s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 22s | | the patch passed |
| +1 :green_heart: | compile | 1m 26s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javac | 1m 26s | | the patch passed |
| +1 :green_heart: | compile | 1m 17s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 1m 17s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 54s | | hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 86 unchanged - 2 fixed = 86 total (was 88) |
| +1 :green_heart: | mvnsite | 1m 29s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 54s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javadoc | 1m 42s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 4m 35s | | the patch passed |
| +1 :green_heart: | shadedclient | 36m 17s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 260m 44s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5445/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch failed. |
| +1 :green_heart: | asflicense | 0m 57s | | The patch does not generate ASF License warnings. |
| | | 388m 53s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeReconfiguration |
| | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
| | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithShortCircuitRead |
| | hadoop.hdfs.server.mover.TestMover |
| | hadoop.hdfs.server.mover.TestStorageMover |
| | hadoop.hdfs.server.blockmanagement.TestBlockManager |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5445/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5445 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux c7f13b8c3285 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / e7c82dbbd47d725de6e0cbd16f6ba90cb10bca7e |
| Default Java | Private
[jira] [Commented] (HDFS-16938) Utility to trigger heartbeat and wait until BP thread queue is fully processed
[ https://issues.apache.org/jira/browse/HDFS-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695441#comment-17695441 ]

ASF GitHub Bot commented on HDFS-16938:
---------------------------------------

hadoop-yetus commented on PR #5445:
URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451264241

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 43s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 39m 3s | | trunk passed |
| +1 :green_heart: | compile | 1m 28s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | compile | 1m 23s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 5s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 32s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 10s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javadoc | 1m 36s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 3m 25s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 50s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 15s | | the patch passed |
| +1 :green_heart: | compile | 1m 19s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javac | 1m 19s | | the patch passed |
| +1 :green_heart: | compile | 1m 20s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 1m 20s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 53s | | hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 85 unchanged - 2 fixed = 85 total (was 87) |
| +1 :green_heart: | mvnsite | 1m 18s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 50s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javadoc | 1m 29s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 3m 14s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 18s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 208m 13s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 45s | | The patch does not generate ASF License warnings. |
| | | 315m 23s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5445/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5445 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux d5bbeba1b09f 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 56459b337c0160106aaef5a8fb50fb51b963cb39 |
| Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5445/2/testReport/ |
| Max. process+thread count | 2836 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5445/2/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-16938) Utility to trigger heartbeat and wait until BP thread queue is fully processed
[ https://issues.apache.org/jira/browse/HDFS-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695422#comment-17695422 ]

ASF GitHub Bot commented on HDFS-16938:
---------------------------------------

hadoop-yetus commented on PR #5445:
URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451224598

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 41s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 41m 11s | | trunk passed |
| +1 :green_heart: | compile | 1m 33s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | compile | 1m 29s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 8s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 36s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 12s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javadoc | 1m 37s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 3m 44s | | trunk passed |
| +1 :green_heart: | shadedclient | 24m 3s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 16s | | the patch passed |
| +1 :green_heart: | compile | 1m 18s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javac | 1m 18s | | the patch passed |
| +1 :green_heart: | compile | 1m 13s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 1m 13s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 49s | | hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 85 unchanged - 2 fixed = 85 total (was 87) |
| +1 :green_heart: | mvnsite | 1m 20s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 49s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javadoc | 1m 29s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 3m 21s | | the patch passed |
| +1 :green_heart: | shadedclient | 24m 5s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 205m 36s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 49s | | The patch does not generate ASF License warnings. |
| | | 318m 21s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5445/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5445 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 52247ef2efc5 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 64dc8ea3920f2a5b1ca01c41cc036d118e95f9f6 |
| Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5445/1/testReport/ |
| Max. process+thread count | 3845 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5445/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-16934) org.apache.hadoop.hdfs.tools.TestDFSAdmin#testAllDatanodesReconfig regression
[ https://issues.apache.org/jira/browse/HDFS-16934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695419#comment-17695419 ]

ASF GitHub Bot commented on HDFS-16934:
---------------------------------------

hadoop-yetus commented on PR #5434:
URL: https://github.com/apache/hadoop/pull/5434#issuecomment-1451187716

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 46s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 38m 36s | | trunk passed |
| +1 :green_heart: | compile | 1m 26s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | compile | 1m 18s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 7s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 33s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 9s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javadoc | 1m 33s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 3m 28s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 40s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 18s | | the patch passed |
| +1 :green_heart: | compile | 1m 20s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javac | 1m 20s | | the patch passed |
| +1 :green_heart: | compile | 1m 16s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 1m 16s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 51s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 21s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 50s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javadoc | 1m 27s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 3m 12s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 19s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 206m 57s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5434/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch failed. |
| +1 :green_heart: | asflicense | 0m 51s | | The patch does not generate ASF License warnings. |
| | | 313m 40s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5434/6/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5434 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 45aaa91fd3c9 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / b525d2854d098d4ad3a7877fa13ebf6343c5538c |
| Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5434/6/testReport/ |
| Max. process+thread count | 2973 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output |
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695400#comment-17695400 ]

ASF GitHub Bot commented on HDFS-16896:
---------------------------------------

hadoop-yetus commented on PR #5322:
URL: https://github.com/apache/hadoop/pull/5322#issuecomment-1451124197

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 43s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 15m 25s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 25m 38s | | trunk passed |
| +1 :green_heart: | compile | 6m 1s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | compile | 5m 39s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 18s | | trunk passed |
| +1 :green_heart: | mvnsite | 2m 30s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 52s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javadoc | 2m 17s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 5m 57s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 26s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 28s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 5s | | the patch passed |
| +1 :green_heart: | compile | 5m 55s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javac | 5m 55s | | the patch passed |
| +1 :green_heart: | compile | 5m 36s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 5m 36s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 6s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/10/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 1 new + 41 unchanged - 1 fixed = 42 total (was 42) |
| +1 :green_heart: | mvnsite | 2m 9s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 28s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javadoc | 1m 56s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 5m 49s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 33s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 2m 27s | | hadoop-hdfs-client in the patch passed. |
| -1 :x: | unit | 203m 50s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/10/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch failed. |
| +1 :green_heart: | asflicense | 0m 50s | | The patch does not generate ASF License warnings. |
| | | 344m 52s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/10/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5322 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux a927011dee89 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 40b77b2a52ef6faabcd3190738ec01418a6ca550 |
| Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Multi-JDK versions
[jira] [Resolved] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom McCormick resolved HDFS-16896.
----------------------------------
    Resolution: Fixed

> HDFS Client hedged read has increased failure rate than without hedged read
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-16896
>                 URL: https://issues.apache.org/jira/browse/HDFS-16896
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>            Reporter: Tom McCormick
>            Assignee: Tom McCormick
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.5
>
> When hedged read is enabled by the HDFS client, we see an increased failure rate on reads.
>
> *stacktrace*
>
> {code:java}
> Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc
>     at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077)
>     at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060)
>     at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039)
>     at org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365)
>     at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572)
>     at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535)
>     at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121)
>     at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112)
>     at org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172)
>     at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137)
>     at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36)
>     at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136)
>     at org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168)
>     at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112)
>     at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112)
>     at io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76)
>     ... 46 more
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16938) Utility to trigger heartbeat and wait until BP thread queue is fully processed
[ https://issues.apache.org/jira/browse/HDFS-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695395#comment-17695395 ]

ASF GitHub Bot commented on HDFS-16938:
---------------------------------------

virajjasani commented on PR #5445:
URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451105473

> If someone removes the processQueueMessages itself from the sendHeartbeat, then also this test should fail, or at least some should

+1

> Utility to trigger heartbeat and wait until BP thread queue is fully processed
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-16938
>                 URL: https://issues.apache.org/jira/browse/HDFS-16938
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>
> As a follow-up to HDFS-16935, we should provide a utility to trigger a heartbeat and wait until the BP thread queue is fully processed. This would ensure 100% consistency w.r.t. the active namenode being able to receive bad block reports from the given datanode. This utility would resolve flakes for tests that rely on the namenode's awareness of the bad blocks reported by datanodes.
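The pattern the HDFS-16938 utility implements, trigger a heartbeat and then block until the BPServiceActor's message queue has been drained, can be modeled in isolation. The sketch below is a self-contained illustration only: the class and method names (`QueueDrainSketch`, `triggerAndWait`) are hypothetical, and the real utility works against DataNode/BPServiceActor internals rather than a plain `BlockingQueue`.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of "trigger and wait until the BP thread queue is processed". */
public class QueueDrainSketch {
  private final BlockingQueue<Runnable> bpQueue = new LinkedBlockingQueue<>();
  private final ExecutorService bpThread = Executors.newSingleThreadExecutor();

  /** Enqueue a message, e.g. a bad-block report destined for the namenode. */
  public void enqueue(Runnable msg) {
    bpQueue.add(msg);
  }

  /** Drain everything currently queued on the BP thread, then wait for it. */
  public void triggerAndWait(long timeoutMillis) throws Exception {
    CountDownLatch drained = new CountDownLatch(1);
    bpThread.submit(() -> {
      Runnable msg;
      while ((msg = bpQueue.poll()) != null) {
        msg.run();                       // process each queued message
      }
      drained.countDown();               // signal: queue fully processed
    });
    if (!drained.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
      throw new TimeoutException("BP queue not drained in " + timeoutMillis + " ms");
    }
  }

  public void shutdown() {
    bpThread.shutdown();
  }

  public static void main(String[] args) throws Exception {
    QueueDrainSketch util = new QueueDrainSketch();
    AtomicInteger processed = new AtomicInteger();
    for (int i = 0; i < 5; i++) {
      util.enqueue(processed::incrementAndGet);
    }
    util.triggerAndWait(5000);
    // After triggerAndWait returns, every queued message has been processed.
    System.out.println("processed=" + processed.get());
    util.shutdown();
  }
}
```

The point the reviewers debate above is visible here: a test that waits on the drain signal is deterministic, whereas a test that merely sleeps and hopes the worker thread got scheduled is racy.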
[jira] [Commented] (HDFS-16938) Utility to trigger heartbeat and wait until BP thread queue is fully processed
[ https://issues.apache.org/jira/browse/HDFS-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695394#comment-17695394 ]

ASF GitHub Bot commented on HDFS-16938:
---------------------------------------

virajjasani commented on PR #5445:
URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451104757

> It is inducing a race by nextHeartbeatTime

Absolutely, that's what I thought too. But yes, you are right: other than adding sleeps, it's a bit tricky to reproduce. That's how our Jenkins builds are, though; some of this weirdness can only be reproduced by injecting sleeps. Anyway, as long as the daily builds stay happy, it's fine. Thanks again :)
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695393#comment-17695393 ]

ASF GitHub Bot commented on HDFS-16896:
---------------------------------------

omalley merged PR #5444:
URL: https://github.com/apache/hadoop/pull/5444
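The failure mode in HDFS-16896 is easier to reason about with a minimal model of what `hedgedFetchBlockByteRange` does: race the first replica against a second request started after a threshold, and take whichever answers first. The sketch below is an illustration, not the DFSInputStream implementation; the names and the `CompletableFuture`-based racing are my own, and in real clients the feature is controlled by the `dfs.client.hedged.read.threadpool.size` and `dfs.client.hedged.read.threshold.millis` settings.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Toy model of a hedged read: primary replica vs. delayed backup. */
public class HedgedReadSketch {
  private static final ExecutorService POOL = Executors.newCachedThreadPool();

  /**
   * Start the primary read; if it has not completed within thresholdMillis,
   * also start a backup read against another replica and return whichever
   * request finishes first.
   */
  static String hedgedRead(Supplier<String> primary, Supplier<String> backup,
                           long thresholdMillis) throws Exception {
    CompletableFuture<String> first = CompletableFuture.supplyAsync(primary, POOL);
    try {
      return first.get(thresholdMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Threshold elapsed: hedge with a second replica and race the two.
      CompletableFuture<String> second = CompletableFuture.supplyAsync(backup, POOL);
      return (String) CompletableFuture.anyOf(first, second).get();
    }
  }

  public static void main(String[] args) throws Exception {
    String result = hedgedRead(
        () -> { sleep(500); return "slow-replica"; },  // primary stalls
        () -> "fast-replica",                          // backup answers at once
        50);
    System.out.println(result);
    POOL.shutdown();
  }

  private static void sleep(long ms) {
    try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
  }
}
```

The model also hints at why hedging can raise the failure rate: each hedged attempt consumes another entry from the replica list, so a slow cluster can exhaust all locations faster and surface a `BlockMissingException` that a patient single read would have avoided.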
[jira] [Commented] (HDFS-16938) Utility to trigger heartbeat and wait until BP thread queue is fully processed
[ https://issues.apache.org/jira/browse/HDFS-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695392#comment-17695392 ] ASF GitHub Bot commented on HDFS-16938: --- ayushtkn commented on PR #5445: URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451100991 Playing a bit more with it: Single sleep doesn't repro itself. My wild guess is, it isn't testing from where we started. It is inducing a race by **nextHeartbeatTime**, that too because we induced sleeps, else like very tough with the kind of code we have now. > we might rather want to wait for source code to do that so that if something changes in source code sequence or so, our test would be able to catch it not the intention of the original test, it is just waiting for the message in the Queue to be sent and processed and checking the response of namenode to that, or if namenode acknowledges that or not. (I still feel we could have invoked **processQueueMessages** directly and saved some time, but lets see, the present code is also working) If someone removes the processQueueMessages itself from the **sendHeartbeat**, then also this test should fail or atleast some should, rest we can't guarantee everything... >but anyways nothing wrong with waiting for sometime and circling back to this when/if required. yeps, lets wait and see if any failures and circle back in future and see what are the potential solutions then. > Utility to trigger heartbeat and wait until BP thread queue is fully processed > -- > > Key: HDFS-16938 > URL: https://issues.apache.org/jira/browse/HDFS-16938 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > As a follow-up to HDFS-16935, we should provide utility to trigger heartbeat > and wait until BP thread queue is fully processed. 
This would ensure 100% > consistency w.r.t active namenode being able to receive bad block reports > from the given datanode. This utility would resolve flakes for the tests that > rely on namenode's awareness of the reported bad blocks by datanodes.
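The trigger-and-wait pattern discussed in this thread can be sketched without any Hadoop dependencies. The following is an illustrative, self-contained polling helper in the spirit of Hadoop's `GenericTestUtils.waitFor`; the class name, method name, and interval/timeout parameters are assumptions for the sketch, not the actual utility added by the PR.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitUtil {
    // Poll `condition` every `intervalMs` until it holds, or fail after
    // `timeoutMs`. A test triggers an action (e.g. a heartbeat), then
    // waits for its observable effect instead of sleeping a fixed amount.
    public static void waitFor(BooleanSupplier condition, long intervalMs,
            long timeoutMs) throws InterruptedException, TimeoutException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                throw new TimeoutException(
                    "Condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(intervalMs);
        }
    }
}
```

In a test this replaces a bare `Thread.sleep` with a bounded wait on the condition the test actually cares about, such as the BP thread queue having been fully drained.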
[jira] [Updated] (HDFS-16923) The getListing RPC will throw NPE if the path does not exist
[ https://issues.apache.org/jira/browse/HDFS-16923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HDFS-16923: --- Fix Version/s: 3.4.0 3.3.6 > The getListing RPC will throw NPE if the path does not exist > > > Key: HDFS-16923 > URL: https://issues.apache.org/jira/browse/HDFS-16923 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.3.6 > > > The getListing RPC will throw NPE if the path does not exist. And the stack > as below: > {code:java} > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RemoteException): > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:4195) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:1421) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:783) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:622) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:590) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:574) > {code}
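The usual shape of a fix for an NPE like the one in the stack above is a null check that turns the missing path into a meaningful exception. The sketch below is a generic, self-contained illustration of that pattern; the `lookup` helper is hypothetical, and this is not the actual FSNamesystem code or necessarily the exact fix applied in HDFS-16923.

```java
import java.io.FileNotFoundException;

public class ListingGuard {
    // Hypothetical stand-in for the directory lookup; returns null when
    // the path does not exist, like an internal lookup behind getListing.
    static String[] lookup(String path) {
        return "/exists".equals(path) ? new String[] {"a", "b"} : null;
    }

    // Guarding the null result surfaces a FileNotFoundException to the
    // client instead of letting a NullPointerException escape the RPC.
    static String[] getListing(String path) throws FileNotFoundException {
        String[] entries = lookup(path);
        if (entries == null) {
            throw new FileNotFoundException("Path does not exist: " + path);
        }
        return entries;
    }
}
```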
[jira] [Resolved] (HDFS-16923) The getListing RPC will throw NPE if the path does not exist
[ https://issues.apache.org/jira/browse/HDFS-16923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen resolved HDFS-16923. Resolution: Fixed
[jira] [Commented] (HDFS-16923) The getListing RPC will throw NPE if the path does not exist
[ https://issues.apache.org/jira/browse/HDFS-16923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695379#comment-17695379 ] ASF GitHub Bot commented on HDFS-16923: --- xkrogen merged PR #5400: URL: https://github.com/apache/hadoop/pull/5400
[jira] [Commented] (HDFS-16923) The getListing RPC will throw NPE if the path does not exist
[ https://issues.apache.org/jira/browse/HDFS-16923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695375#comment-17695375 ] ASF GitHub Bot commented on HDFS-16923: --- xkrogen commented on PR #5400: URL: https://github.com/apache/hadoop/pull/5400#issuecomment-1451080156 The only test failure is `TestDirectoryScanner.testThrottling`: ``` [ERROR] Tests run: 13, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 575.031 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner [ERROR] testThrottling(org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner) Time elapsed: 162.513 s <<< FAILURE! java.lang.AssertionError: Throttle is too permissive ``` This one is a bit nondeterministic since it actually runs things and checks how long they take; it doesn't look related. Merging to `trunk` and `branch-3.3`. Thanks @ZanderXu ! I'm also going to see about getting this into 3.3.5 since we have [HDFS-16732](https://issues.apache.org/jira/browse/HDFS-16732) there and this is a pretty bad bug.
[jira] [Commented] (HDFS-16938) Utility to trigger heartbeat and wait until BP thread queue is fully processed
[ https://issues.apache.org/jira/browse/HDFS-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695372#comment-17695372 ] ASF GitHub Bot commented on HDFS-16938: --- virajjasani commented on PR #5445: URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451077691 > If my intent is just for processQueueMessages, I will expose and just shoot that directly, rather than doing the whole loop. That would also work, but as part of the test we might rather want to wait for the source code to do that, so that if something changes in the source code sequence, our test would be able to catch it (if the intention of the test is to wait for processQueueMessages to be successfully called and completed by the source code). > Changing the existing uses with this would be slowing down tests, which are already above tolerable limits. For this PR, only the `testReportBadBlocks` test is updated to use it, but anyway nothing wrong with waiting for some time and circling back to this when/if required. Thanks Ayush :)
[jira] [Updated] (HDFS-16923) The getListing RPC will throw NPE if the path does not exist
[ https://issues.apache.org/jira/browse/HDFS-16923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HDFS-16923: --- Priority: Critical (was: Major)
[jira] [Commented] (HDFS-16938) Utility to trigger heartbeat and wait until BP thread queue is fully processed
[ https://issues.apache.org/jira/browse/HDFS-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695368#comment-17695368 ] ASF GitHub Bot commented on HDFS-16938: --- ayushtkn commented on PR #5445: URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451064227 The first patch isn't something to consider in itself. The second one is very unrealistic in general: there aren't any things above that we can blame for consuming time, they are some naive if-checks or so, and can't take this much time... I don't think we have a use case for this util right now. If my intent is just processQueueMessages, I will expose and just shoot that directly, rather than doing the whole loop. Changing the existing uses to this would slow down tests, which are already above tolerable limits. Let's hold it and observe; if we get something around this in the future, we can circle back.
[jira] [Commented] (HDFS-16938) Utility to trigger heartbeat and wait until BP thread queue is fully processed
[ https://issues.apache.org/jira/browse/HDFS-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695363#comment-17695363 ] ASF GitHub Bot commented on HDFS-16938: --- virajjasani commented on PR #5445: URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451021126 With the above patch applied together with this PR's changes, the test passes consistently, whereas without the PR changes the test fails consistently (failed 7 times locally without the PR changes, passed 5 times with them).
[jira] [Commented] (HDFS-16938) Utility to trigger heartbeat and wait until BP thread queue is fully processed
[ https://issues.apache.org/jira/browse/HDFS-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695362#comment-17695362 ] ASF GitHub Bot commented on HDFS-16938: --- virajjasani commented on PR #5445: URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451015537 Another way I am able to repro consistently: ``` diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java index e9f424604b4..9b17a126da1 100755 -
[jira] [Commented] (HDFS-16938) Utility to trigger heartbeat and wait until BP thread queue is fully processed
[ https://issues.apache.org/jira/browse/HDFS-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695358#comment-17695358 ] ASF GitHub Bot commented on HDFS-16938: --- virajjasani commented on PR #5445: URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451008842 Though it's difficult to reproduce, I thought this utility would help the test ensure 100% that the namenode has definitely received the report as part of `ReportBadBlockAction#reportTo`.
[jira] [Commented] (HDFS-16938) Utility to trigger heartbeat and wait until BP thread queue is fully processed
[ https://issues.apache.org/jira/browse/HDFS-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695356#comment-17695356 ] ASF GitHub Bot commented on HDFS-16938: --- virajjasani commented on PR #5445: URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1450990755 I tried multiple cases, and for some sleeps I am able to repro, but only sometimes. The only way I am able to consistently repro the failure is by applying this patch: ``` diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java index e9f424604b4..c39eca73f38 100755 -
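The kind of race being induced with sleeps here can be modeled without Hadoop: a worker thread drains a queue only after an artificial delay, so a check made immediately after enqueueing observes stale state unless the test waits for the drain. Everything below (class and field names) is an illustrative sketch, not the actual BPServiceActor code.

```java
import java.util.concurrent.LinkedBlockingQueue;

public class QueueRace {
    final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
    volatile int processed = 0;

    // The worker drains the queue after `delayMs`, standing in for the
    // induced sleep before the BP thread processes its queued messages.
    Thread startWorker(long delayMs) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(delayMs);
                while (queue.poll() != null) {
                    processed++;
                }
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }
}
```

Asserting on `processed` right after enqueueing is flaky; joining the worker (or polling until the queue is empty, as the proposed utility does) makes the check deterministic.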
[jira] [Commented] (HDFS-16938) Utility to trigger heartbeat and wait until BP thread queue is fully processed
[ https://issues.apache.org/jira/browse/HDFS-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695343#comment-17695343 ] ASF GitHub Bot commented on HDFS-16938: --- ayushtkn commented on PR #5445: URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1450903655 Viraj, can you help me repro the scenario? I added a 10K sleep before the processEnqueed method last time and the test didn't fail for me; triggerHeartbeat was working there.
[jira] [Commented] (HDFS-16938) Utility to trigger heartbeat and wait until BP thread queue is fully processed
[ https://issues.apache.org/jira/browse/HDFS-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695337#comment-17695337 ] ASF GitHub Bot commented on HDFS-16938: --- virajjasani commented on PR #5445: URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1450896182 @ayushtkn @tomscut could you please review this PR?
[jira] [Commented] (HDFS-16938) Utility to trigger heartbeat and wait until BP thread queue is fully processed
[ https://issues.apache.org/jira/browse/HDFS-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695335#comment-17695335 ] ASF GitHub Bot commented on HDFS-16938: --- virajjasani opened a new pull request, #5445: URL: https://github.com/apache/hadoop/pull/5445 As a follow-up to HDFS-16935, we should provide utility to trigger heartbeat and wait until BP thread queue is fully processed. This would ensure 100% consistency w.r.t active namenode being able to receive bad block reports from the given datanode. This utility would resolve flakes for the tests that rely on namenode's awareness of the reported bad blocks by datanodes.
[jira] [Updated] (HDFS-16938) Utility to trigger heartbeat and wait until BP thread queue is fully processed
[ https://issues.apache.org/jira/browse/HDFS-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16938: -- Labels: pull-request-available (was: )
[jira] [Created] (HDFS-16938) Utility to trigger heartbeat and wait until BP thread queue is fully processed
Viraj Jasani created HDFS-16938: --- Summary: Utility to trigger heartbeat and wait until BP thread queue is fully processed Key: HDFS-16938 URL: https://issues.apache.org/jira/browse/HDFS-16938 Project: Hadoop HDFS Issue Type: Improvement Reporter: Viraj Jasani Assignee: Viraj Jasani As a follow-up to HDFS-16935, we should provide utility to trigger heartbeat and wait until BP thread queue is fully processed. This would ensure 100% consistency w.r.t active namenode being able to receive bad block reports from the given datanode. This utility would resolve flakes for the tests that rely on namenode's awareness of the reported bad blocks by datanodes.
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695305#comment-17695305 ] ASF GitHub Bot commented on HDFS-16896: --- hadoop-yetus commented on PR #5444: URL: https://github.com/apache/hadoop/pull/5444#issuecomment-1450786773 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 0s | | Docker mode activated. | | -1 :x: | docker | 5m 55s | | Docker failed to build run-specific yetus/hadoop:tp-9052}. | | Subsystem | Report/Notes | |--:|:-| | GITHUB PR | https://github.com/apache/hadoop/pull/5444 | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5444/1/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.5 > > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. 
> *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} >
[jira] [Updated] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom McCormick updated HDFS-16896: - Fix Version/s: 3.4.0 3.3.5
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
     [ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695301#comment-17695301 ]

ASF GitHub Bot commented on HDFS-16896:
---------------------------------------

mccormickt12 opened a new pull request, #5444:
URL: https://github.com/apache/hadoop/pull/5444

   …… (#5322)

   HDFS-16896. Clear the ignoredNodes list when we clear the deadNodes list on
   refetchLocations. The ignoredNodes list is only used on the hedged read code path.

   ### Description of PR
   Backporting the hedged read fixes to branch-3.3.

   ### How was this patch tested?
   Added tests; also tested with LinkedIn's Trino deployment to verify the performance improvements.

   ### For code changes:
   - [ ] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
     [ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695290#comment-17695290 ]

ASF GitHub Bot commented on HDFS-16896:
---------------------------------------

omalley merged PR #5322:
URL: https://github.com/apache/hadoop/pull/5322
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
     [ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695277#comment-17695277 ]

ASF GitHub Bot commented on HDFS-16896:
---------------------------------------

mccormickt12 commented on code in PR #5322:
URL: https://github.com/apache/hadoop/pull/5322#discussion_r1122202223

##########
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java:
##########

@@ -955,6 +965,10 @@ private DNAddrPair chooseDataNode(LocatedBlock block,
       }
     }
 
+  /**
+   * RefetchLocations should only be called when there are no active requests
+   * to datanodes. In the hedged read case this means futures should be empty
+   */

Review Comment:
   Added
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
     [ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695276#comment-17695276 ]

ASF GitHub Bot commented on HDFS-16896:
---------------------------------------

mccormickt12 commented on code in PR #5322:
URL: https://github.com/apache/hadoop/pull/5322#discussion_r1122201971

##########
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java:
##########

@@ -197,6 +197,15 @@ private void clearLocalDeadNodes() {
     deadNodes.clear();
   }
 
+  /**
+   * Clear list of ignored nodes used for hedged reads.
+   */
+  private void clearIgnoredNodes(Collection ignoredNodes) {

Review Comment:
   fixed
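The fix discussed in these review threads can be summarized with a small model: when refetchLocations clears the local dead-node list, the hedged-read ignored-node list must be cleared too, otherwise a long-running hedged read can exhaust its candidate datanodes and fail with BlockMissingException. The sketch below is illustrative only (the class and helper names mirror the patch, but this is not the actual DFSInputStream code):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of the HDFS-16896 fix.
public class HedgedReadStateSketch {
  private final Set<String> deadNodes = new HashSet<>();

  void addToLocalDeadNodes(String dn) { deadNodes.add(dn); }

  private void clearLocalDeadNodes() { deadNodes.clear(); }

  // Helper added by the patch: ignoredNodes is only used on the hedged read
  // code path, so it is passed in rather than stored on the stream.
  private void clearIgnoredNodes(Collection<String> ignoredNodes) {
    if (ignoredNodes != null) {
      ignoredNodes.clear();
    }
  }

  // Per the review thread, this should only run when there are no active
  // requests to datanodes (hedged read case: the futures list is empty).
  void refetchLocations(Collection<String> ignoredNodes) {
    clearLocalDeadNodes();
    clearIgnoredNodes(ignoredNodes); // the step the fix adds
  }

  int excludedCount(Collection<String> ignoredNodes) {
    return deadNodes.size() + (ignoredNodes == null ? 0 : ignoredNodes.size());
  }

  public static void main(String[] args) {
    HedgedReadStateSketch s = new HedgedReadStateSketch();
    List<String> ignored = new ArrayList<>();
    s.addToLocalDeadNodes("dn1");  // read against dn1 failed
    ignored.add("dn2");            // dn2 had a hedged request in flight once
    s.refetchLocations(ignored);
    // Both exclusion lists are now empty, so every replica is a candidate again.
    System.out.println("excluded after refetch: " + s.excludedCount(ignored));
    // → excluded after refetch: 0
  }
}
```

Without the clearIgnoredNodes call, `ignored` would keep growing across retries even after fresh locations were fetched, which is the failure mode the bug report describes.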
[jira] [Commented] (HDFS-16935) TestFsDatasetImpl.testReportBadBlocks brittle
     [ https://issues.apache.org/jira/browse/HDFS-16935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695273#comment-17695273 ]

ASF GitHub Bot commented on HDFS-16935:
---------------------------------------

virajjasani commented on code in PR #5432:
URL: https://github.com/apache/hadoop/pull/5432#discussion_r1122197968

##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java:
##########

@@ -1101,15 +1099,12 @@ public void testReportBadBlocks() throws Exception {
       block = DFSTestUtil.getFirstBlock(fs, filePath);
 
       // Test for the overloaded method reportBadBlocks
-      dataNode.reportBadBlocks(block, dataNode.getFSDataset()
-          .getFsVolumeReferences().get(0));
-      Thread.sleep(3000);
-      BlockManagerTestUtil.updateState(cluster.getNamesystem()
-          .getBlockManager());
-      // Verify the bad block has been reported to namenode
-      Assert.assertEquals(1, cluster.getNamesystem().getCorruptReplicaBlocks());
-    } finally {
-      cluster.shutdown();
+      dataNode.reportBadBlocks(block, dataNode.getFSDataset().getFsVolumeReferences().get(0));
+      GenericTestUtils.waitFor(() -> {
+        BlockManagerTestUtil.updateState(cluster.getNamesystem().getBlockManager());
+        // Verify the bad block has been reported to namenode
+        return 1 == cluster.getNamesystem().getCorruptReplicaBlocks();
+      }, 100, 1, "Corrupted replica blocks could not be found");

Review Comment:
   Let me create follow-up work to make this strongly consistent with reporting of the bad block.

> TestFsDatasetImpl.testReportBadBlocks brittle
> ---------------------------------------------
>
>                 Key: HDFS-16935
>                 URL: https://issues.apache.org/jira/browse/HDFS-16935
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.4.0, 3.3.5, 3.3.9
>            Reporter: Steve Loughran
>            Assignee: Viraj Jasani
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.9
>
> Jenkins failure because the sleep() time is not long enough:
> {code}
> Failing for the past 1 build (Since #4 )
> Took 7.4 sec.
> Error Message
> expected:<1> but was:<0>
> Stacktrace
> java.lang.AssertionError: expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:647)
>   at org.junit.Assert.assertEquals(Assert.java:633)
> {code}
> The assert runs after a 3s sleep waiting for the reports to come in:
> {code}
> dataNode.reportBadBlocks(block, dataNode.getFSDataset()
>     .getFsVolumeReferences().get(0));
> Thread.sleep(3000);                                       // 3s sleep
> BlockManagerTestUtil.updateState(cluster.getNamesystem()
>     .getBlockManager());
> // Verify the bad block has been reported to namenode
> Assert.assertEquals(1,
>     cluster.getNamesystem().getCorruptReplicaBlocks());   // here
> {code}
> LambdaTestUtils.eventually() should be used around this assert, maybe with an
> even shorter initial delay so that on faster systems the test is faster.
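The pattern the patch applies — repeatedly polling the condition instead of a single fixed sleep — can be sketched independently of the Hadoop test utilities. The names below are illustrative and are not the actual GenericTestUtils/LambdaTestUtils API:

```java
import java.util.function.BooleanSupplier;

// Minimal sketch of the poll-until-true pattern that replaces Thread.sleep(3000):
// re-check the condition at a short interval and return as soon as it holds,
// instead of always paying a fixed worst-case delay.
public class PollingWait {
  public static boolean waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (check.getAsBoolean()) {
        return true; // condition met early; fast machines finish fast
      }
      Thread.sleep(intervalMs);
    }
    return check.getAsBoolean(); // one final check at the deadline
  }

  public static void main(String[] args) throws InterruptedException {
    long start = System.currentTimeMillis();
    // A condition that becomes true after ~200 ms, polled every 50 ms
    // with a generous 5 s timeout for slow machines.
    boolean met = waitFor(() -> System.currentTimeMillis() - start >= 200, 50, 5000);
    System.out.println("condition met: " + met);
  }
}
```

This is why the polling form is both faster on fast systems (it stops at the first successful check) and more reliable on slow ones (the timeout can be far larger than any fixed sleep would reasonably be).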
[jira] [Resolved] (HDFS-16935) TestFsDatasetImpl.testReportBadBlocks brittle
     [ https://issues.apache.org/jira/browse/HDFS-16935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HDFS-16935.
-----------------------------------
    Resolution: Fixed
[jira] [Updated] (HDFS-16935) TestFsDatasetImpl.testReportBadBlocks brittle
     [ https://issues.apache.org/jira/browse/HDFS-16935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HDFS-16935:
----------------------------------
    Fix Version/s: 3.4.0
                   3.3.9
[jira] [Commented] (HDFS-16935) TestFsDatasetImpl.testReportBadBlocks brittle
     [ https://issues.apache.org/jira/browse/HDFS-16935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695266#comment-17695266 ]

ASF GitHub Bot commented on HDFS-16935:
---------------------------------------

virajjasani commented on code in PR #5432 (on the same TestFsDatasetImpl.java change):
URL: https://github.com/apache/hadoop/pull/5432#discussion_r1122177909

Review Comment:
   > I gave it a try and was able to repro this. With triggerHeartbeat, it worked for me. I think that is a standard practice, running since legacy times, for such cases (at least in my time).
   
   Great, sounds good.
   
   > Do you intend to say that if we put a sleep just before processQueueMessages, things should screw up?
   
   Yes, that's what I was suspecting so far, but it looks like even with a sleep, reproducing the failure is difficult. I have now tried it several times but am unable to reproduce the failure with a sleep. Also, now that I understand what could still go wrong with the race condition, let me update the patch.
[jira] [Commented] (HDFS-16935) TestFsDatasetImpl.testReportBadBlocks brittle
     [ https://issues.apache.org/jira/browse/HDFS-16935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695264#comment-17695264 ]

ASF GitHub Bot commented on HDFS-16935:
---------------------------------------

steveloughran merged PR #5432:
URL: https://github.com/apache/hadoop/pull/5432
[jira] [Commented] (HDFS-16934) org.apache.hadoop.hdfs.tools.TestDFSAdmin#testAllDatanodesReconfig regression
     [ https://issues.apache.org/jira/browse/HDFS-16934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695133#comment-17695133 ]

ASF GitHub Bot commented on HDFS-16934:
---------------------------------------

hadoop-yetus commented on PR #5434:
URL: https://github.com/apache/hadoop/pull/5434#issuecomment-1450289619

   :broken_heart: **-1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: | reexec | 0m 45s | | Docker mode activated. |
   |||| _ Prechecks _ | |
   | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 0s | | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
   | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
   | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
   |||| _ trunk Compile Tests _ | |
   | +1 :green_heart: | mvninstall | 38m 7s | | trunk passed |
   | +1 :green_heart: | compile | 1m 27s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
   | +1 :green_heart: | compile | 1m 22s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   | +1 :green_heart: | checkstyle | 1m 7s | | trunk passed |
   | +1 :green_heart: | mvnsite | 1m 34s | | trunk passed |
   | +1 :green_heart: | javadoc | 1m 7s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
   | +1 :green_heart: | javadoc | 1m 34s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   | +1 :green_heart: | spotbugs | 3m 29s | | trunk passed |
   | +1 :green_heart: | shadedclient | 22m 53s | | branch has no errors when building and testing our client artifacts. |
   |||| _ Patch Compile Tests _ | |
   | +1 :green_heart: | mvninstall | 1m 21s | | the patch passed |
   | +1 :green_heart: | compile | 1m 17s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
   | +1 :green_heart: | javac | 1m 17s | | the patch passed |
   | +1 :green_heart: | compile | 1m 13s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   | +1 :green_heart: | javac | 1m 13s | | the patch passed |
   | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
   | -0 :warning: | checkstyle | 0m 51s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5434/5/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
   | +1 :green_heart: | mvnsite | 1m 18s | | the patch passed |
   | +1 :green_heart: | javadoc | 0m 50s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
   | +1 :green_heart: | javadoc | 1m 26s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   | +1 :green_heart: | spotbugs | 3m 17s | | the patch passed |
   | +1 :green_heart: | shadedclient | 22m 33s | | patch has no errors when building and testing our client artifacts. |
   |||| _ Other Tests _ | |
   | -1 :x: | unit | 212m 14s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5434/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 49s | | The patch does not generate ASF License warnings. |
   | | | 318m 19s | | |

   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   | | hadoop.hdfs.server.namenode.ha.TestObserverNode |

   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5434/5/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5434 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 17457753e6f3 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 45c2a08193920c45906836f965b1f37491ed7fdb |
   | Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
[jira] [Commented] (HDFS-16937) Delete RPC should also record number of delete blocks in audit log
     [ https://issues.apache.org/jira/browse/HDFS-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694920#comment-17694920 ]

ASF GitHub Bot commented on HDFS-16937:
---------------------------------------

hfutatzhanghb commented on PR #5442:
URL: https://github.com/apache/hadoop/pull/5442#issuecomment-1449538280

   > Cannot change the operation name like this; changing the audit log output is an incompatible change.
   
   @ayushtkn, thanks for the reminder. BTW, what can I do about this? Could you please give me some advice?

> Delete RPC should also record number of delete blocks in audit log
> ------------------------------------------------------------------
>
>                 Key: HDFS-16937
>                 URL: https://issues.apache.org/jira/browse/HDFS-16937
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.3.4
>            Reporter: ZhangHB
>            Priority: Minor
>              Labels: pull-request-available
>
> To better trace the jitter caused by the delete RPC, we should also record the
> number of deleted blocks in the audit log. With this information, we can know
> which user caused the jitter.
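One way to surface the block count while addressing the reviewer's compatibility concern is to leave the operation name untouched and append the count as an extra key=value field. The sketch below is purely illustrative — the field layout loosely imitates an HDFS-style audit line, but this is not the NameNode's actual audit logger or the patch under review:

```java
// Hypothetical sketch: a delete audit line that keeps cmd=delete unchanged
// and adds a blocks=<n> field, so existing audit-log parsers keep working.
public class DeleteAuditSketch {
  static String auditLine(String user, String src, long deletedBlocks) {
    // Tab-separated key=value fields, in the spirit of HDFS audit logging.
    return String.format("allowed=true\tugi=%s\tcmd=delete\tsrc=%s\tblocks=%d",
        user, src, deletedBlocks);
  }

  public static void main(String[] args) {
    // e.g. a delete that released 128 blocks, attributable to user "alice"
    System.out.println(auditLine("alice", "/data/tmp/bigfile", 128));
  }
}
```

With the count carried in a trailing field rather than in the operation name, operators can attribute large deletes (and the resulting jitter) to a user without breaking consumers that key on `cmd=delete`.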