[ https://issues.apache.org/jira/browse/HDFS-17806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004410#comment-18004410 ]
ASF GitHub Bot commented on HDFS-17806:
---------------------------------------

LiuGuH opened a new pull request, #7793:
URL: https://github.com/apache/hadoop/pull/7793

### Description of PR

When getFileChecksum is called on an EC file, the DataNode throws a SocketTimeoutException when load is high.

```
java.net.SocketTimeoutException: 3000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/****:48666 remote=/****:50010]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:163)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:520)
    at org.apache.hadoop.hdfs.server.datanode.BlockChecksumHelper$BlockGroupNonStripedChecksumComputer.checksumBlock(BlockChecksumHelper.java:632)
    at org.apache.hadoop.hdfs.server.datanode.BlockChecksumHelper$BlockGroupNonStripedChecksumComputer.compute(BlockChecksumHelper.java:493)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.blockGroupChecksum(DataXceiver.java:1107)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opStripedBlockChecksum(Receiver.java:336)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:123)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:314)
    at java.lang.Thread.run(Thread.java:748)
```

![ec checksum timeout](https://github.com/user-attachments/assets/056ab0f5-e65c-44de-bbeb-326cc4066950)

The EC file checksum computation connects to the DataNode of each data unit block serially, and the 3000 ms timeout is too short for a heavily loaded cluster. We should increase the EC socket timeout (dfs.checksum.ec.socket-timeout).
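A minimal sketch of the kind of tuning being discussed, assuming the dfs.checksum.ec.socket-timeout property (3000 ms default, per the stack trace above) is raised via hdfs-site.xml on the DataNodes that drive the striped checksum computation; the 30000 ms value below is illustrative only, not the value adopted by this PR:

```xml
<!-- hdfs-site.xml: raise the EC checksum socket timeout from the 3000 ms
     default so the serial per-block connections made during striped
     getFileChecksum can survive a loaded cluster.
     30000 is an illustrative value, not the one chosen by HDFS-17806. -->
<property>
  <name>dfs.checksum.ec.socket-timeout</name>
  <value>30000</value>
</property>
```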
> Increase dfs.checksum.ec.socket-timeout
> ----------------------------------------
>
>                 Key: HDFS-17806
>                 URL: https://issues.apache.org/jira/browse/HDFS-17806
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: liuguanghua
>            Assignee: liuguanghua
>            Priority: Minor
>         Attachments: ec checksum timeout.png
>
> java.net.SocketTimeoutException: 3000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/****:48666 remote=/****:50010]
> at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:163)
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
> at java.io.FilterInputStream.read(FilterInputStream.java:83)
> at java.io.FilterInputStream.read(FilterInputStream.java:83)
> at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:520)
> at org.apache.hadoop.hdfs.server.datanode.BlockChecksumHelper$BlockGroupNonStripedChecksumComputer.checksumBlock(BlockChecksumHelper.java:632)
> at org.apache.hadoop.hdfs.server.datanode.BlockChecksumHelper$BlockGroupNonStripedChecksumComputer.compute(BlockChecksumHelper.java:493)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.blockGroupChecksum(DataXceiver.java:1107)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opStripedBlockChecksum(Receiver.java:336)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:123)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:314)
> at java.lang.Thread.run(Thread.java:748)
>
> When getFileChecksum is called on an EC file, the DataNode throws a SocketTimeoutException when load is high. We should increase the EC socket timeout.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org