[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695400#comment-17695400 ] ASF GitHub Bot commented on HDFS-16896: --- hadoop-yetus commented on PR #5322: URL: https://github.com/apache/hadoop/pull/5322#issuecomment-1451124197 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 43s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 25s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 25m 38s | | trunk passed | | +1 :green_heart: | compile | 6m 1s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 5m 39s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 18s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 30s | | trunk passed | | +1 :green_heart: | javadoc | 1m 52s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 17s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 5m 57s | | trunk passed | | +1 :green_heart: | shadedclient | 22m 26s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 28s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 5s | | the patch passed | | +1 :green_heart: | compile | 5m 55s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 5m 55s | | the patch passed | | +1 :green_heart: | compile | 5m 36s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 5m 36s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 6s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/10/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 1 new + 41 unchanged - 1 fixed = 42 total (was 42) | | +1 :green_heart: | mvnsite | 2m 9s | | the patch passed | | +1 :green_heart: | javadoc | 1m 28s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 1m 56s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 5m 49s | | the patch passed | | +1 :green_heart: | shadedclient | 22m 33s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 27s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 203m 50s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/10/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 50s | | The patch does not generate ASF License warnings. | | | | 344m 52s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/10/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5322 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux a927011dee89 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 40b77b2a52ef6faabcd3190738ec01418a6ca550 | | Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | Multi-JDK versions
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695393#comment-17695393 ] ASF GitHub Bot commented on HDFS-16896: --- omalley merged PR #5444: URL: https://github.com/apache/hadoop/pull/5444 > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.5 > > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695305#comment-17695305 ] ASF GitHub Bot commented on HDFS-16896: --- hadoop-yetus commented on PR #5444: URL: https://github.com/apache/hadoop/pull/5444#issuecomment-1450786773 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 0s | | Docker mode activated. | | -1 :x: | docker | 5m 55s | | Docker failed to build run-specific yetus/hadoop:tp-9052}. | | Subsystem | Report/Notes | |--:|:-| | GITHUB PR | https://github.com/apache/hadoop/pull/5444 | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5444/1/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.5 > > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695301#comment-17695301 ] ASF GitHub Bot commented on HDFS-16896: --- mccormickt12 opened a new pull request, #5444: URL: https://github.com/apache/hadoop/pull/5444 …… (#5322) HDFS-16896 clear ignoredNodes list when we clear deadnode list on refetchLocations. ignoredNodes list is only used on hedged read codepath ### Description of PR Backporting hedged read fixes to branch 3.3 ### How was this patch tested? Added tests and tested by LinkedIn Trino to verify performance improvements ### For code changes: - [ ] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')? - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files? > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695290#comment-17695290 ] ASF GitHub Bot commented on HDFS-16896: --- omalley merged PR #5322: URL: https://github.com/apache/hadoop/pull/5322 > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695277#comment-17695277 ] ASF GitHub Bot commented on HDFS-16896: --- mccormickt12 commented on code in PR #5322: URL: https://github.com/apache/hadoop/pull/5322#discussion_r1122202223 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java: ## @@ -955,6 +965,10 @@ private DNAddrPair chooseDataNode(LocatedBlock block, } } + /** + * RefetchLocations should only be called when there are no active requests + * to datanodes. In the hedged read case this means futures should be empty + */ Review Comment: Added > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695276#comment-17695276 ] ASF GitHub Bot commented on HDFS-16896: --- mccormickt12 commented on code in PR #5322: URL: https://github.com/apache/hadoop/pull/5322#discussion_r1122201971 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java: ## @@ -197,6 +197,15 @@ private void clearLocalDeadNodes() { deadNodes.clear(); } + /** + * Clear list of ignored nodes used for hedged reads. + */ + private void clearIgnoredNodes(Collection ignoredNodes) { Review Comment: fixed > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694765#comment-17694765 ] ASF GitHub Bot commented on HDFS-16896: --- omalley commented on code in PR #5322: URL: https://github.com/apache/hadoop/pull/5322#discussion_r1120765825 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java: ## @@ -955,6 +965,10 @@ private DNAddrPair chooseDataNode(LocatedBlock block, } } + /** + * RefetchLocations should only be called when there are no active requests + * to datanodes. In the hedged read case this means futures should be empty + */ Review Comment: Thanks for adding the javadoc, but you should document the parameters too. (Especially the modification of ignoredNodes.) ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java: ## @@ -197,6 +197,15 @@ private void clearLocalDeadNodes() { deadNodes.clear(); } + /** + * Clear list of ignored nodes used for hedged reads. + */ + private void clearIgnoredNodes(Collection ignoredNodes) { Review Comment: If you are going to do that, I'd just expand the clearIgnoredNodes into it. Do make sure to call out in the javadoc that you are clearing the passed in list. > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694224#comment-17694224 ] ASF GitHub Bot commented on HDFS-16896: --- mccormickt12 commented on code in PR #5322: URL: https://github.com/apache/hadoop/pull/5322#discussion_r1119427557 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java: ## @@ -197,6 +197,15 @@ private void clearLocalDeadNodes() { deadNodes.clear(); } + /** + * Clear list of ignored nodes used for hedged reads. + */ + private void clearIgnoredNodes(Collection ignoredNodes) { Review Comment: sounds good, to be clear this is what im planning ``` private void clearCachedNodeState(Collection ignoredNodes) { clearLocalDeadNodes(); //2nd option is to remove only nodes[blockId] clearIgnoredNodes(ignoredNodes); } ``` > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694222#comment-17694222 ] ASF GitHub Bot commented on HDFS-16896: --- mkuchenbecker commented on code in PR #5322: URL: https://github.com/apache/hadoop/pull/5322#discussion_r1119426257 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java: ## @@ -197,6 +197,15 @@ private void clearLocalDeadNodes() { deadNodes.clear(); } + /** + * Clear list of ignored nodes used for hedged reads. + */ + private void clearIgnoredNodes(Collection ignoredNodes) { Review Comment: I'd personally err on the side of "slightly confusing name with documentation but ensure it always happens." `clearCachedNodeState`? > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694215#comment-17694215 ] ASF GitHub Bot commented on HDFS-16896: --- mccormickt12 commented on code in PR #5322: URL: https://github.com/apache/hadoop/pull/5322#discussion_r1119413903 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java: ## @@ -1337,8 +1352,12 @@ private void hedgedFetchBlockByteRange(LocatedBlock block, long start, } catch (InterruptedException ie) { // Ignore and retry } -if (refetch) { - refetchLocations(block, ignored); +// If refetch is true, then all nodes are in deadNodes or ignoredNodes. +// We should loop through all futures and remove them, so we do not +// have concurrent requests to the same node. +// Once all futures are cleared, we can clear the ignoredNodes and retry. Review Comment: yes, the thing i am trying to emphasize is the `&& futures.isEmpty()` check which is specific to how ignored nodes is cleared > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694216#comment-17694216 ] ASF GitHub Bot commented on HDFS-16896: --- mccormickt12 commented on code in PR #5322: URL: https://github.com/apache/hadoop/pull/5322#discussion_r1119414105 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java: ## @@ -224,7 +233,7 @@ boolean deadNodesContain(DatanodeInfo nodeInfo) { } /** - * Grab the open-file info from namenode + * Grab the open-file info from namenode. Review Comment: it came up in checkstyle, it siad i added one new checkstyle, so i just fixed it > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694214#comment-17694214 ] ASF GitHub Bot commented on HDFS-16896: --- mccormickt12 commented on code in PR #5322: URL: https://github.com/apache/hadoop/pull/5322#discussion_r1119413264 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java: ## @@ -197,6 +197,15 @@ private void clearLocalDeadNodes() { deadNodes.clear(); } + /** + * Clear list of ignored nodes used for hedged reads. + */ + private void clearIgnoredNodes(Collection ignoredNodes) { Review Comment: I could add a `clearSkippedNodes` and then clear both dead and ignored in there, but that might be confusing as it sounds like theres another node type/list. I don't think `clearLocalDeadNodes` should also clear ignoredNodes because thats a bit miselading. The deadnodes and ignore nodes are handled differently, so i don't think its crazy to keep them separate and clear. (like i said before) i would add a method that clears both dead and ignore, but concerned it may be confusing. lmk what you think > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694212#comment-17694212 ] ASF GitHub Bot commented on HDFS-16896: --- mccormickt12 commented on code in PR #5322: URL: https://github.com/apache/hadoop/pull/5322#discussion_r1119409826 ## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java: ## @@ -603,7 +603,9 @@ public Void answer(InvocationOnMock invocation) throws Throwable { input.read(0, buffer, 0, 1024); Assert.fail("Reading the block should have thrown BlockMissingException"); } catch (BlockMissingException e) { - assertEquals(3, input.getHedgedReadOpsLoopNumForTesting()); + // The result of 9 is due to 2 blocks by 4 iterations plus one because + // hedgedReadOpsLoopNumForTesting is incremented at start of the loop. + assertEquals(9, input.getHedgedReadOpsLoopNumForTesting()); Review Comment: we are actually 4x'ing, the comment was meant to help clarify. If you recall the issue was we only previously tried each block once, the change is to make hedged reads follow the same number of retires as non hedged reads which has 3 retry loops. This example has 2 blocks, and the last loop is when it exists. Previously 2+1 and now 8+1 > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692337#comment-17692337 ] ASF GitHub Bot commented on HDFS-16896: --- mkuchenbecker commented on code in PR #5322: URL: https://github.com/apache/hadoop/pull/5322#discussion_r1114855356 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java: ## @@ -197,6 +197,15 @@ private void clearLocalDeadNodes() { deadNodes.clear(); } + /** + * Clear list of ignored nodes used for hedged reads. + */ + private void clearIgnoredNodes(Collection ignoredNodes) { Review Comment: Nit. Param documentation. ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java: ## @@ -1337,8 +1352,12 @@ private void hedgedFetchBlockByteRange(LocatedBlock block, long start, } catch (InterruptedException ie) { // Ignore and retry } -if (refetch) { - refetchLocations(block, ignored); +// If refetch is true, then all nodes are in deadNodes or ignoredNodes. +// We should loop through all futures and remove them, so we do not +// have concurrent requests to the same node. +// Once all futures are cleared, we can clear the ignoredNodes and retry. Review Comment: Ignored and dead nodes are cleared, correct? ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java: ## @@ -224,7 +233,7 @@ boolean deadNodesContain(DatanodeInfo nodeInfo) { } /** - * Grab the open-file info from namenode + * Grab the open-file info from namenode. Review Comment: Is this change needed? ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java: ## @@ -197,6 +197,15 @@ private void clearLocalDeadNodes() { deadNodes.clear(); } + /** + * Clear list of ignored nodes used for hedged reads. + */ + private void clearIgnoredNodes(Collection ignoredNodes) { Review Comment: Does it make more sense to clean up ignored / dead in the same function so we don't need to worry about calling both? ## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java: ## @@ -603,7 +603,9 @@ public Void answer(InvocationOnMock invocation) throws Throwable { input.read(0, buffer, 0, 1024); Assert.fail("Reading the block should have thrown BlockMissingException"); } catch (BlockMissingException e) { - assertEquals(3, input.getHedgedReadOpsLoopNumForTesting()); + // The result of 9 is due to 2 blocks by 4 iterations plus one because + // hedgedReadOpsLoopNumForTesting is incremented at start of the loop. + assertEquals(9, input.getHedgedReadOpsLoopNumForTesting()); Review Comment: We are tripling the IO per hedged request? > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at >
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17690278#comment-17690278 ] ASF GitHub Bot commented on HDFS-16896: --- hadoop-yetus commented on PR #5322: URL: https://github.com/apache/hadoop/pull/5322#issuecomment-1434353220 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 54s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 24s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 34m 13s | | trunk passed | | +1 :green_heart: | compile | 6m 42s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 6m 17s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 18s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 24s | | trunk passed | | +1 :green_heart: | javadoc | 1m 46s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 9s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 6m 7s | | trunk passed | | +1 :green_heart: | shadedclient | 29m 2s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 24s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 23s | | the patch passed | | +1 :green_heart: | compile | 6m 34s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 6m 34s | | the patch passed | | +1 :green_heart: | compile | 6m 12s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 6m 12s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 8s | | hadoop-hdfs-project: The patch generated 0 new + 41 unchanged - 1 fixed = 41 total (was 42) | | +1 :green_heart: | mvnsite | 2m 13s | | the patch passed | | +1 :green_heart: | javadoc | 1m 26s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 1m 56s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 6m 11s | | the patch passed | | +1 :green_heart: | shadedclient | 29m 4s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 20s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 228m 53s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/9/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 43s | | The patch does not generate ASF License warnings. | | | | 393m 47s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestAuditLogger | | | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport | | | hadoop.hdfs.server.namenode.TestFsck | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.namenode.ha.TestObserverNode | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/9/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5322 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux cac80c5a2522 4.15.0-197-generic #208-Ubuntu SMP Tue Nov 1 17:23:37 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / c073c9f1dc85b4398f36647d4d64219db2c4f66f | | Default Java | Private
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17690152#comment-17690152 ] ASF GitHub Bot commented on HDFS-16896: --- hadoop-yetus commented on PR #5322: URL: https://github.com/apache/hadoop/pull/5322#issuecomment-1434122848 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 50s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 18s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 33m 57s | | trunk passed | | +1 :green_heart: | compile | 7m 0s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 6m 31s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 26s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 35s | | trunk passed | | +1 :green_heart: | javadoc | 1m 58s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 17s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 6m 34s | | trunk passed | | +1 :green_heart: | shadedclient | 28m 32s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 43s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 25s | | the patch passed | | +1 :green_heart: | compile | 5m 53s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 5m 53s | | the patch passed | | +1 :green_heart: | compile | 5m 41s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 5m 41s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 6s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/8/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 1 new + 42 unchanged - 0 fixed = 43 total (was 42) | | +1 :green_heart: | mvnsite | 2m 15s | | the patch passed | | +1 :green_heart: | javadoc | 1m 28s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 0s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 5m 50s | | the patch passed | | +1 :green_heart: | shadedclient | 26m 20s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 26s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 204m 16s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/8/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 51s | | The patch does not generate ASF License warnings. | | | | 366m 39s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.namenode.TestFsck | | | hadoop.hdfs.server.namenode.TestAuditLogger | | | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/8/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5322 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux b5a1be0c0e37 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh |
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17690148#comment-17690148 ] ASF GitHub Bot commented on HDFS-16896: --- hadoop-yetus commented on PR #5322: URL: https://github.com/apache/hadoop/pull/5322#issuecomment-1434115847 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 43s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 31s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 31m 12s | | trunk passed | | +1 :green_heart: | compile | 6m 14s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 5m 41s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 21s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 30s | | trunk passed | | +1 :green_heart: | javadoc | 1m 49s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 18s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 5m 57s | | trunk passed | | +1 :green_heart: | shadedclient | 25m 50s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 47s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 20s | | the patch passed | | +1 :green_heart: | compile | 5m 54s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 5m 54s | | the patch passed | | +1 :green_heart: | compile | 5m 37s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 5m 37s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 4s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/7/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 2 new + 42 unchanged - 0 fixed = 44 total (was 42) | | +1 :green_heart: | mvnsite | 2m 14s | | the patch passed | | +1 :green_heart: | javadoc | 1m 26s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 1s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 5m 49s | | the patch passed | | +1 :green_heart: | shadedclient | 25m 46s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 29s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 207m 40s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/7/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 51s | | The patch does not generate ASF License warnings. | | | | 361m 50s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.namenode.TestAuditLogger | | | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport | | | hadoop.hdfs.server.namenode.TestFsck | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/7/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5322 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 41ce24a7c7bf 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh |
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17690075#comment-17690075 ] ASF GitHub Bot commented on HDFS-16896: --- hadoop-yetus commented on PR #5322: URL: https://github.com/apache/hadoop/pull/5322#issuecomment-1433945193 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 46s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 23m 40s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 33m 28s | | trunk passed | | +1 :green_heart: | compile | 6m 6s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 5m 55s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 19s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 30s | | trunk passed | | +1 :green_heart: | javadoc | 1m 50s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 16s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 5m 53s | | trunk passed | | +1 :green_heart: | shadedclient | 25m 51s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 28s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 20s | | the patch passed | | +1 :green_heart: | compile | 6m 4s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 6m 4s | | the patch passed | | +1 :green_heart: | compile | 5m 38s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 5m 38s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 6s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/6/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 2 new + 42 unchanged - 0 fixed = 44 total (was 42) | | +1 :green_heart: | mvnsite | 2m 14s | | the patch passed | | +1 :green_heart: | javadoc | 1m 27s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 1m 57s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 5m 49s | | the patch passed | | +1 :green_heart: | shadedclient | 25m 37s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 28s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 205m 29s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 51s | | The patch does not generate ASF License warnings. | | | | 369m 46s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.TestPread | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.namenode.TestFsck | | | hadoop.hdfs.server.namenode.TestAuditLogger | | | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/6/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5322 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux d9582a898494 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689611#comment-17689611 ] ASF GitHub Bot commented on HDFS-16896: --- hadoop-yetus commented on PR #5322: URL: https://github.com/apache/hadoop/pull/5322#issuecomment-1432703908 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 52s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 19s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 38m 29s | | trunk passed | | +1 :green_heart: | compile | 6m 43s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 6m 12s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 18s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 26s | | trunk passed | | +1 :green_heart: | javadoc | 1m 47s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 4s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 6m 11s | | trunk passed | | +1 :green_heart: | shadedclient | 28m 55s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 22s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 22s | | the patch passed | | +1 :green_heart: | compile | 6m 32s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 6m 32s | | the patch passed | | +1 :green_heart: | compile | 6m 26s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 6m 26s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 12s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/5/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 1 new + 31 unchanged - 0 fixed = 32 total (was 31) | | +1 :green_heart: | mvnsite | 2m 17s | | the patch passed | | +1 :green_heart: | javadoc | 1m 30s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 0s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 6m 52s | | the patch passed | | +1 :green_heart: | shadedclient | 30m 40s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 20s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 241m 59s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 59s | | The patch does not generate ASF License warnings. | | | | 413m 31s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestAuditLogger | | | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport | | | hadoop.hdfs.server.namenode.TestFsck | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.TestPread | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/5/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5322 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 62ba32aa466f 4.15.0-197-generic #208-Ubuntu SMP Tue Nov 1 17:23:37 UTC 2022 x86_64 x86_64
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689437#comment-17689437 ] ASF GitHub Bot commented on HDFS-16896: --- mccormickt12 commented on code in PR #5322: URL: https://github.com/apache/hadoop/pull/5322#discussion_r1107919545 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java: ## @@ -1337,7 +1352,11 @@ private void hedgedFetchBlockByteRange(LocatedBlock block, long start, } catch (InterruptedException ie) { // Ignore and retry } -if (refetch) { +// if refetch is true then all nodes are in deadlist or ignorelist +// we should loop through all futures and remove them so we do not Review Comment: fixed comments. deadlist is actually deadNodes (I fixed that comment as well.) When connections fail (in both hedged and non hedged code path) nodes are added to the deadNodes collection to try other nodes. Once `chooseDataNode` returns `null` (or more accurately `getBestNodeDNAddrPair`) it calls `refetchLocations` which clears the deadNodes `clearLocalDeadNodes()` and now with my change, also clears the ignore list. Note we have added an assumption to this method `refetchLocations`. The comment I added to `refetchLocations` ``` /** * RefetchLocations should only be called when there are no active requests * to datanodes. In the hedged read case this means futures should be empty */ ``` > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689348#comment-17689348 ] ASF GitHub Bot commented on HDFS-16896: --- simbadzina commented on code in PR #5322: URL: https://github.com/apache/hadoop/pull/5322#discussion_r1107745069 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java: ## @@ -1337,7 +1352,11 @@ private void hedgedFetchBlockByteRange(LocatedBlock block, long start, } catch (InterruptedException ie) { // Ignore and retry } -if (refetch) { +// if refetch is true then all nodes are in deadlist or ignorelist +// we should loop through all futures and remove them so we do not Review Comment: Could you add punctuation here and start new sentences with caps. That will make the comment easier to follow. What is the deadlist? > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689347#comment-17689347 ] ASF GitHub Bot commented on HDFS-16896: --- simbadzina commented on code in PR #5322: URL: https://github.com/apache/hadoop/pull/5322#discussion_r1107745069 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java: ## @@ -1337,7 +1352,11 @@ private void hedgedFetchBlockByteRange(LocatedBlock block, long start, } catch (InterruptedException ie) { // Ignore and retry } -if (refetch) { +// if refetch is true then all nodes are in deadlist or ignorelist +// we should loop through all futures and remove them so we do not Review Comment: Could you add punctuation here and start new sentences with caps. It was hard to read the flow of the comment. What is the deadlist? > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17688828#comment-17688828 ] ASF GitHub Bot commented on HDFS-16896: --- hadoop-yetus commented on PR #5322: URL: https://github.com/apache/hadoop/pull/5322#issuecomment-1430775324 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 46s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 25s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 31m 2s | | trunk passed | | +1 :green_heart: | compile | 6m 12s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 5m 46s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 21s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 31s | | trunk passed | | +1 :green_heart: | javadoc | 1m 51s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 18s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 5m 55s | | trunk passed | | +1 :green_heart: | shadedclient | 25m 40s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 41s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 21s | | the patch passed | | +1 :green_heart: | compile | 5m 52s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 5m 52s | | the patch passed | | +1 :green_heart: | compile | 5m 37s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 5m 37s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 4s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/4/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 1 new + 31 unchanged - 0 fixed = 32 total (was 31) | | +1 :green_heart: | mvnsite | 2m 13s | | the patch passed | | +1 :green_heart: | javadoc | 1m 27s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 3s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 5m 54s | | the patch passed | | +1 :green_heart: | shadedclient | 25m 56s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 26s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 206m 23s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 51s | | The patch does not generate ASF License warnings. | | | | 360m 1s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.TestPread | | | hadoop.hdfs.server.namenode.TestAuditLogger | | | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport | | | hadoop.hdfs.server.namenode.TestFsck | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/4/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5322 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux c1560d884f2a 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17688746#comment-17688746 ] ASF GitHub Bot commented on HDFS-16896: --- mccormickt12 commented on PR #5322: URL: https://github.com/apache/hadoop/pull/5322#issuecomment-1430524690 > The change generally looks good to me. > > One concern is that when `refetchLocation` is called inside `hedgedFetchBlockByteRange` we could remove a node from the ignoredList that is already part of the futures array. This would lead to multiple reads to the same node. > > What I'm thinking of is > > 1. Node A is added to futures array > 2. getFirstToComplete throws an InterruptedException when doing hedgedService.take() > 3. We call retchLocation, which remove node A from ignored list. > 4. The while loop re-adds Node A to the futures list. > > I don't know if this actually can happen. Even if it is technically possible, it may not be an issue. Thoughts? yes @simbadzina I agree this was an issue. I've resolved this now. As you pointed out the future is removed in `getFirstToComplete`, so now we let it spin in the while loop, each time a future will be removed, and then once futures is empty and refetch is needed we will clear the ignore list > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683985#comment-17683985 ] ASF GitHub Bot commented on HDFS-16896: --- simbadzina commented on PR #5322: URL: https://github.com/apache/hadoop/pull/5322#issuecomment-1416241192 I ran each of the failing unit test classes in Intellij individually. > [2023-01-26T21:52:13.365Z] Reason | Tests [2023-01-26T21:52:13.365Z] Failed junit tests | hadoop.hdfs.server.datanode.TestDiskError [2023-01-26T21:52:13.365Z] | hadoop.hdfs.server.namenode.TestNameNodeReconfigure [2023-01-26T21:52:13.365Z] | hadoop.hdfs.server.namenode.TestReencryption [2023-01-26T21:52:13.365Z] | hadoop.hdfs.server.datanode.TestDataNodeMetricsLogger [2023-01-26T21:52:13.365Z] | hadoop.hdfs.server.datanode.TestDataNodeReconfiguration [2023-01-26T21:52:13.365Z] | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier [2023-01-26T21:52:13.365Z] | hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand [2023-01-26T21:52:13.365Z] | hadoop.hdfs.server.namenode.TestNameNodeMXBean [2023-01-26T21:52:13.365Z] | hadoop.hdfs.server.namenode.TestFsckWithMultipleNameNodes They all passed with your PR. So the failures are just test flakiness. > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > Labels: pull-request-available > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17681135#comment-17681135 ] ASF GitHub Bot commented on HDFS-16896: --- hadoop-yetus commented on PR #5322: URL: https://github.com/apache/hadoop/pull/5322#issuecomment-1405715222 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 48s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 19s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 31m 1s | | trunk passed | | +1 :green_heart: | compile | 6m 2s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 5m 47s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 19s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 30s | | trunk passed | | +1 :green_heart: | javadoc | 1m 50s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 18s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 5m 49s | | trunk passed | | +1 :green_heart: | shadedclient | 25m 27s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 28s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 20s | | the patch passed | | +1 :green_heart: | compile | 5m 57s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 5m 57s | | the patch passed | | +1 :green_heart: | compile | 5m 36s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 5m 36s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 6s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/3/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 1 new + 31 unchanged - 0 fixed = 32 total (was 31) | | +1 :green_heart: | mvnsite | 2m 15s | | the patch passed | | +1 :green_heart: | javadoc | 1m 27s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 0s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 5m 45s | | the patch passed | | +1 :green_heart: | shadedclient | 25m 37s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 25s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 125m 27s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 1m 4s | | The patch does not generate ASF License warnings. | | | | 278m 17s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.datanode.TestDiskError | | | hadoop.hdfs.server.namenode.TestNameNodeReconfigure | | | hadoop.hdfs.server.namenode.TestReencryption | | | hadoop.hdfs.server.datanode.TestDataNodeMetricsLogger | | | hadoop.hdfs.server.datanode.TestDataNodeReconfiguration | | | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier | | | hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand | | | hadoop.hdfs.server.namenode.TestNameNodeMXBean | | | hadoop.hdfs.server.namenode.TestFsckWithMultipleNameNodes | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/3/artifact/out/Dockerfile | | GITHUB PR |
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680922#comment-17680922 ] ASF GitHub Bot commented on HDFS-16896: --- hadoop-yetus commented on PR #5322: URL: https://github.com/apache/hadoop/pull/5322#issuecomment-1404685864 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 44s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 48s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 31m 6s | | trunk passed | | +1 :green_heart: | compile | 6m 1s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 5m 40s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 21s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 30s | | trunk passed | | +1 :green_heart: | javadoc | 1m 52s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 11s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 5m 53s | | trunk passed | | +1 :green_heart: | shadedclient | 25m 28s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 27s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 23s | | the patch passed | | +1 :green_heart: | compile | 5m 48s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 5m 48s | | the patch passed | | +1 :green_heart: | compile | 5m 38s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 5m 38s | | the patch passed | | -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/2/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | -0 :warning: | checkstyle | 1m 2s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/2/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 1 new + 31 unchanged - 0 fixed = 32 total (was 31) | | +1 :green_heart: | mvnsite | 2m 12s | | the patch passed | | +1 :green_heart: | javadoc | 1m 24s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 2s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 5m 49s | | the patch passed | | +1 :green_heart: | shadedclient | 25m 19s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 27s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 277m 38s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 52s | | The patch does not generate ASF License warnings. | | | | 430m 9s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.TestPread | | | hadoop.hdfs.server.balancer.TestBalancerService | | | hadoop.hdfs.TestLeaseRecovery2 | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5322 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux bb9d429f56ed 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680153#comment-17680153 ] ASF GitHub Bot commented on HDFS-16896: --- hadoop-yetus commented on PR #5322: URL: https://github.com/apache/hadoop/pull/5322#issuecomment-1401628815 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 36s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 43m 41s | | trunk passed | | +1 :green_heart: | compile | 1m 2s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 0m 56s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 0m 37s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 1s | | trunk passed | | +1 :green_heart: | javadoc | 0m 51s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 0m 41s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 2m 39s | | trunk passed | | +1 :green_heart: | shadedclient | 25m 5s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 57s | | the patch passed | | +1 :green_heart: | compile | 0m 54s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 0m 54s | | the patch passed | | +1 :green_heart: | compile | 0m 46s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 0m 46s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 20s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-client.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-client.txt) | hadoop-hdfs-project/hadoop-hdfs-client: The patch generated 1 new + 31 unchanged - 0 fixed = 32 total (was 31) | | +1 :green_heart: | mvnsite | 0m 50s | | the patch passed | | +1 :green_heart: | javadoc | 0m 34s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 0m 32s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 2m 32s | | the patch passed | | +1 :green_heart: | shadedclient | 24m 44s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 23s | | hadoop-hdfs-client in the patch passed. | | +1 :green_heart: | asflicense | 0m 38s | | The patch does not generate ASF License warnings. | | | | 111m 50s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5322 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 41d886b5d013 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 5a59467e1f671e78c9656b6997c5b350a4faa93d | | Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/1/testReport/ | | Max. process+thread count | 698 (vs. ulimit of 5500) | | modules | C:
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680119#comment-17680119 ] ASF GitHub Bot commented on HDFS-16896: --- mccormickt12 opened a new pull request, #5322: URL: https://github.com/apache/hadoop/pull/5322 …etchLocations. ignoredNodes list is only used on hedged read codepath ### Description of PR clear ignoredNodes list when we clear deadnode list on refetchLocations. ignoredNodes list is only used on hedged read codepath ### How was this patch tested? ### For code changes: - [ ] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')? - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files? > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680116#comment-17680116 ] Tom McCormick commented on HDFS-16896: -- Our current theory Hedged Read has a different code path than default functionality (regardless of if a hedged read is ever actually invoked). There is an ignoreList that is used to keep track of which node has been tried so the hedged read doesn’t try the same node, but that list is never cleared. The default code path has a failure loop of 3 times (after each failure, all 3 blocks should be tried again), resulting in 12 block read attempts. In the hedged read case, all nodes are added to the ignoreList that is never cleared, resulting in a total of 3 block read attempts. > HDFS Client hedged read has increased failure rate than without hedged read > --- > > Key: HDFS-16896 > URL: https://issues.apache.org/jira/browse/HDFS-16896 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Tom McCormick >Assignee: Tom McCormick >Priority: Major > > When hedged read is enabled by HDFS client, we see an increased failure rate > on reads. > *stacktrace* > > {code:java} > Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain > block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 > file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc > at > org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060) > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039) > at > org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365) > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535) > at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172) > at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137) > at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) > at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136) > at > org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at > io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76) > ... 46 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org