[
https://issues.apache.org/jira/browse/HDFS-15413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987204#comment-17987204
]
ASF GitHub Bot commented on HDFS-15413:
---------------------------------------
hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-3023942776
:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 54s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 5m 51s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 38m 59s | | trunk passed |
| +1 :green_heart: | compile | 6m 30s | | trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 5m 37s | | trunk passed with JDK Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | checkstyle | 1m 29s | | trunk passed |
| +1 :green_heart: | mvnsite | 2m 15s | | trunk passed |
| +1 :green_heart: | javadoc | 2m 2s | | trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 2m 22s | | trunk passed with JDK Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | spotbugs | 5m 44s | | trunk passed |
| +1 :green_heart: | shadedclient | 43m 35s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 34s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 52s | | the patch passed |
| +1 :green_heart: | compile | 6m 23s | | the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 6m 23s | | the patch passed |
| +1 :green_heart: | compile | 5m 25s | | the patch passed with JDK Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | javac | 5m 25s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 19s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/12/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 1 new + 114 unchanged - 0 fixed = 115 total (was 114) |
| +1 :green_heart: | mvnsite | 1m 59s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 43s | | the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 2m 10s | | the patch passed with JDK Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | spotbugs | 5m 49s | | the patch passed |
| +1 :green_heart: | shadedclient | 44m 8s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 2m 39s | | hadoop-hdfs-client in the patch passed. |
| +1 :green_heart: | unit | 85m 53s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 45s | | The patch does not generate ASF License warnings. |
| | | 275m 3s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/12/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5829 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
| uname | Linux 0db4d4772921 5.15.0-136-generic #147-Ubuntu SMP Sat Mar 15 15:53:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 1a28725cb6bbb6d5a3b1f4ad5ee4fbe8e2540f33 |
| Default Java | Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/12/testReport/ |
| Max. process+thread count | 1836 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/12/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
This message was automatically generated.
> DFSStripedInputStream throws exception when datanodes close idle connections
> ----------------------------------------------------------------------------
>
> Key: HDFS-15413
> URL: https://issues.apache.org/jira/browse/HDFS-15413
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ec, erasure-coding, hdfs-client
> Affects Versions: 3.1.3
> Environment: - Hadoop 3.1.3
> - erasure coding with ISA-L and RS-3-2-1024k scheme
> - running in kubernetes
> - dfs.client.socket-timeout = 10000
> - dfs.datanode.socket.write.timeout = 10000
> Reporter: Andrey Elenskiy
> Priority: Critical
> Labels: pull-request-available
> Attachments: out.log
>
>
> We've run into an issue with compactions failing in HBase when erasure coding
> is enabled on a table directory. After digging further I was able to narrow it
> down to the seek + read logic and to reproduce the issue with the HDFS client
> alone:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.FSDataInputStream;
>
> public class ReaderRaw {
>     public static void main(final String[] args) throws Exception {
>         Path p = new Path(args[0]);
>         int bufLen = Integer.parseInt(args[1]);
>         int sleepDuration = Integer.parseInt(args[2]);
>         int countBeforeSleep = Integer.parseInt(args[3]);
>         int countAfterSleep = Integer.parseInt(args[4]);
>
>         Configuration conf = new Configuration();
>         FSDataInputStream istream = FileSystem.get(conf).open(p);
>         byte[] buf = new byte[bufLen];
>         int readTotal = 0;
>         int count = 0;
>         try {
>             while (true) {
>                 istream.seek(readTotal);
>                 // fill the buffer completely before counting this as one read
>                 int bytesRemaining = bufLen;
>                 int bufOffset = 0;
>                 while (bytesRemaining > 0) {
>                     int nread = istream.read(buf, bufOffset, bytesRemaining);
>                     if (nread < 0) {
>                         throw new Exception("nread is less than zero");
>                     }
>                     readTotal += nread;
>                     bufOffset += nread;
>                     bytesRemaining -= nread;
>                 }
>                 count++;
>                 if (count == countBeforeSleep) {
>                     // pause long enough for the datanodes to drop the idle connections
>                     System.out.println("sleeping for " + sleepDuration + " milliseconds");
>                     Thread.sleep(sleepDuration);
>                     System.out.println("resuming");
>                 }
>                 if (count == countBeforeSleep + countAfterSleep) {
>                     System.out.println("done");
>                     break;
>                 }
>             }
>         } catch (Exception e) {
>             System.out.println("exception on read " + count + " read total " + readTotal);
>             throw e;
>         }
>     }
> }
> {code}
> The issue appears to be that datanodes close the EC client's connection if it
> doesn't fetch the next packet within dfs.client.socket-timeout. The EC client
> doesn't retry; instead it assumes those datanodes went away, which results in a
> "missing blocks" exception.
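> For illustration (not part of the original report), a minimal sketch of the two
> timeout settings named above, assuming you only want to widen the idle window
> while experimenting; it is a mitigation sketch, not a fix for the missing retry
> in the striped read path:
> {code:java}
> // Illustration only (same imports as the reproducer above). Raising
> // dfs.client.socket-timeout here only affects this client process; the
> // datanode-side idle disconnect is governed by the datanode's own hdfs-site.xml
> // (dfs.client.socket-timeout / dfs.datanode.socket.write.timeout), so both
> // sides have to be changed to actually widen the window.
> Configuration conf = new Configuration();
> conf.setInt("dfs.client.socket-timeout", 60000);           // 60s instead of 10s
> conf.setInt("dfs.datanode.socket.write.timeout", 480000);  // 8min default
> FSDataInputStream in = FileSystem.get(conf).open(new Path("/path/to/ec/file"));
> {code}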
> I was able to consistently reproduce with the following arguments:
> {noformat}
> bufLen = 1000000 (just below 1MB, which is the size of the stripe)
> sleepDuration = (dfs.client.socket-timeout in seconds + 1) * 1000 (in our case 11000)
> countBeforeSleep = 1
> countAfterSleep = 7
> {noformat}
> I've attached the entire log output of running the snippet above against an
> erasure-coded file with the RS-3-2-1024k policy. Here are the datanode logs
> showing the client being disconnected:
> datanode 1:
> {noformat}
> 2020-06-15 19:06:20,697 INFO datanode.DataNode: Likely the client has stopped
> reading, disconnecting it (datanode-v11-0-hadoop.hadoop:9866:DataXceiver
> error processing READ_BLOCK operation src: /10.128.23.40:53748 dst:
> /10.128.14.46:9866); java.net.SocketTimeoutException: 10000 millis timeout
> while waiting for channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/10.128.14.46:9866
> remote=/10.128.23.40:53748]
> {noformat}
> datanode 2:
> {noformat}
> 2020-06-15 19:06:20,341 INFO datanode.DataNode: Likely the client has stopped
> reading, disconnecting it (datanode-v11-1-hadoop.hadoop:9866:DataXceiver
> error processing READ_BLOCK operation src: /10.128.23.40:48772 dst:
> /10.128.9.42:9866); java.net.SocketTimeoutException: 10000 millis timeout
> while waiting for channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/10.128.9.42:9866
> remote=/10.128.23.40:48772]
> {noformat}
> datanode 3:
> {noformat}
> 2020-06-15 19:06:20,467 INFO datanode.DataNode: Likely the client has stopped
> reading, disconnecting it (datanode-v11-3-hadoop.hadoop:9866:DataXceiver
> error processing READ_BLOCK operation src: /10.128.23.40:57184 dst:
> /10.128.16.13:9866); java.net.SocketTimeoutException: 10000 millis timeout
> while waiting for channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/10.128.16.13:9866
> remote=/10.128.23.40:57184]
> {noformat}
> I've tried running the same code against non-EC files with replication of 3
> and was not able to reproduce the issue with any parameters. Looking through
> the code, it's pretty clear that the non-EC DFSInputStream retries reads after
> an exception:
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L844
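> For contrast, a simplified and purely illustrative sketch (not the actual code
> at the link above) of the retry shape the non-EC read path has and the striped
> path lacks; blockReader and reopenBlockReader() are hypothetical stand-ins:
> {code:java}
> // Hypothetical sketch: on an IOException (e.g. the datanode closed an idle
> // connection), drop the dead reader, reconnect to a replica, and retry the
> // read instead of surfacing the error to the caller.
> int readWithRetry(byte[] buf, int off, int len) throws IOException {
>     int retries = 2;
>     while (true) {
>         try {
>             return blockReader.read(buf, off, len);  // hypothetical reader field
>         } catch (IOException e) {
>             if (retries-- <= 0) {
>                 throw e;                             // give up after a few attempts
>             }
>             blockReader = reopenBlockReader();       // hypothetical: reconnect to a replica
>         }
>     }
> }
> {code}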
> Let me know if you need any more information that can help you out with
> addressing this issue.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)