I have 2 clusters:
30 nodes running 0.18.3
and
36 nodes running 0.20.1
I've intermittently seen the following errors on both of my clusters; they
happen when writing files.
I was hoping this would go away with the new version, but I see the same
behavior on both.
The namenode logs don't show any problems; the errors always show up on the
client and the datanodes.
Below is an example from this morning; unfortunately I haven't found a bug
report or config setting that specifically addresses this issue.
Any insight would be greatly appreciated.
Client log:
09/11/25 10:54:15 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.1.75.11:37852 remote=/10.1.75.125:50010]
09/11/25 10:54:15 INFO hdfs.DFSClient: Abandoning block blk_-105422935413230449_22608
09/11/25 10:54:15 INFO hdfs.DFSClient: Waiting to find target node: 10.1.75.125:50010
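
If I'm reading the 0.20 DFSClient right, that 69000 ms is the base
dfs.socket.timeout (60000 ms by default) plus 3000 ms per datanode in a
three-node pipeline, so the client is giving up after the full read timeout
while waiting on the first datanode. For reference, these are the
hdfs-site.xml keys I believe are involved; the values below are only
illustrative, not settings I'm recommending:

<property>
  <name>dfs.socket.timeout</name>
  <value>180000</value>  <!-- read timeout in ms; default is 60000 -->
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>960000</value>  <!-- write timeout in ms; default is 480000 -->
</property>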
Datanode log:
2009-11-25 10:54:51,170 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.1.75.125:50010, storageID=DS-1401408597-10.1.75.125-50010-1258737830230, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.1.75.104:50010]
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:282)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
    at java.lang.Thread.run(Thread.java:619)
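
The datanode error looks a bit different: the 120000 ms timeout fires while
the first datanode is still trying to open a TCP connection to the next one
in the pipeline (10.1.75.104:50010), which makes me suspect the network or an
overloaded peer rather than HDFS itself. To rule that out I've been running a
throwaway connectivity check between datanodes; ConnectCheck below is my own
sketch (plain java.net, nothing from Hadoop), using the same 120000 ms budget
the log shows:

import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectCheck {
    public static void main(String[] args) throws Exception {
        // Target a peer datanode's data transfer port (50010 by default).
        String host = args.length > 0 ? args[0] : "10.1.75.104";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 50010;
        Socket s = new Socket();
        long start = System.currentTimeMillis();
        try {
            // Same 120000 ms connect budget as in the DataXceiver log above.
            s.connect(new InetSocketAddress(host, port), 120000);
            System.out.println("connected in "
                + (System.currentTimeMillis() - start) + " ms");
        } finally {
            s.close();
        }
    }
}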