[
https://issues.apache.org/jira/browse/HDFS-17806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004410#comment-18004410
]
ASF GitHub Bot commented on HDFS-17806:
---------------------------------------
LiuGuH opened a new pull request, #7793:
URL: https://github.com/apache/hadoop/pull/7793
### Description of PR
When getFileChecksum is called on an EC file, the DataNode throws a
SocketTimeoutException under high load:
```
java.net.SocketTimeoutException: 3000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/****:48666 remote=/****:50010]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:163)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:520)
    at org.apache.hadoop.hdfs.server.datanode.BlockChecksumHelper$BlockGroupNonStripedChecksumComputer.checksumBlock(BlockChecksumHelper.java:632)
    at org.apache.hadoop.hdfs.server.datanode.BlockChecksumHelper$BlockGroupNonStripedChecksumComputer.compute(BlockChecksumHelper.java:493)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.blockGroupChecksum(DataXceiver.java:1107)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opStripedBlockChecksum(Receiver.java:336)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:123)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:314)
    at java.lang.Thread.run(Thread.java:748)
```
![ec checksum timeout](https://github.com/user-attachments/assets/056ab0f5-e65c-44de-bbeb-326cc4066950)
Computing the checksum of an EC file connects serially to the DataNode holding
each data-unit block of the block group, and 3000 millis is too short under
high load. We should increase dfs.checksum.ec.socket-timeout, as sketched
below.
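For context, here is a minimal sketch of the client call that triggers this code path, with the timeout key from the issue title raised in the Configuration. The 30000 ms value and the EcChecksumExample class name are illustrative assumptions, not part of this change; and since the timeout fires inside the DataNode (DataXceiver in the stack trace), raising it in practice may need to happen in the DataNode's hdfs-site.xml rather than on the client.

```java
// Minimal, illustrative sketch: trigger the EC checksum path and raise the
// timeout key named in this issue. The 30000 ms value is only an example;
// if the key is read on the DataNode side, set it in the DataNode's
// hdfs-site.xml instead of in the client Configuration shown here.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EcChecksumExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Key from the issue title; 3000 ms is the timeout seen in the stack trace.
    conf.setInt("dfs.checksum.ec.socket-timeout", 30000);

    try (FileSystem fs = FileSystem.get(conf)) {
      // getFileChecksum on an EC file reaches BlockGroupNonStripedChecksumComputer
      // on the DataNode, which contacts the DataNode of each data block serially.
      FileChecksum checksum = fs.getFileChecksum(new Path(args[0]));
      System.out.println(checksum);
    }
  }
}
```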
> Increase dfs.checksum.ec.socket-timeout
> ----------------------------------------
>
> Key: HDFS-17806
> URL: https://issues.apache.org/jira/browse/HDFS-17806
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Minor
> Attachments: ec checksum timeout.png
>
> When getFileChecksum is called on an EC file, the DataNode throws a
> SocketTimeoutException under high load. We should increase the EC socket
> timeout.
>