I have 2 clusters:
30 nodes running 0.18.3
and
36 nodes running 0.20.1

I've intermittently seen the following errors on both of my clusters when writing files. I was hoping this would go away with the new version, but I see the same behavior on both versions. The namenode logs don't show any problems; it's always on the client and datanodes.

Below is an example from this morning. Unfortunately I haven't found a bug report or config setting that specifically addresses this issue.

Any insight would be greatly appreciated.

Client log:
09/11/25 10:54:15 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.1.75.11:37852 remote=/10.1.75.125:50010]
09/11/25 10:54:15 INFO hdfs.DFSClient: Abandoning block blk_-105422935413230449_22608
09/11/25 10:54:15 INFO hdfs.DFSClient: Waiting to find target node: 10.1.75.125:50010

Datanode log:
2009-11-25 10:54:51,170 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.1.75.125:50010, storageID=DS-1401408597-10.1.75.125-50010-1258737830230, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.1.75.104:50010]
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:282)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
        at java.lang.Thread.run(Thread.java:619)
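
For what it's worth, the 69000 ms and 120000 ms values look like they come from the HDFS socket timeout settings rather than anything application-specific. In case it helps the discussion, below is a sketch of the properties I'd assume are involved, with the values being what I understand the defaults to be (I'm not certain raising them is a real fix rather than just masking a network or load problem):

<!-- hdfs-site.xml (hadoop-site.xml on 0.18) - assumed relevant settings; values are my understanding of the defaults -->
<property>
  <name>dfs.socket.timeout</name>
  <value>60000</value>  <!-- read timeout in ms; the 69000 in the client log looks like this plus a per-node extension -->
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>480000</value>  <!-- datanode write timeout in ms -->
</property>
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>256</value>  <!-- max concurrent DataXceiver threads; sometimes raised when datanodes stall under load -->
</property>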
