YCozy created HDFS-15414:
----------------------------

             Summary: java.net.SocketException: Original Exception : 
java.io.IOException: Broken pipe
                 Key: HDFS-15414
                 URL: https://issues.apache.org/jira/browse/HDFS-15414
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.10.0
            Reporter: YCozy


We observed this exception in a DataNode's log even though we were not shutting 
down any nodes in the cluster. Specifically, we have a cluster with 3 DataNodes 
(DN1, DN2, DN3) and 2 NameNodes (NN1, NN2). At some point, this exception 
occurs in DN3's log:
{noformat}
2020-06-08 21:53:03,373 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(127.0.0.1:9666, 
datanodeUuid=4408ff04-e406-4ccc-bd5c-8516ad57ec21, infoPort=9664, 
infoSecurePort=0, ipcPort=9667, 
storageInfo=lv=-57;cid=CID-c816c4ea-a559-4fd5-9b3a-b5994dc3a5fa;nsid=34747155;c=1591653120007)
 Starting thread to transfer 
BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1002 to 127.0.0.1:9766
2020-06-08 21:53:03,373 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(127.0.0.1:9666, 
datanodeUuid=4408ff04-e406-4ccc-bd5c-8516ad57ec21, infoPort=9664, 
infoSecurePort=0, ipcPort=9667, 
storageInfo=lv=-57;cid=CID-c816c4ea-a559-4fd5-9b3a-b5994dc3a5fa;nsid=34747155;c=1591653120007)
 Starting thread to transfer 
BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1002 to 127.0.0.1:9766
2020-06-08 21:53:03,381 INFO 
org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: 
Scheduling a check for /app/dn3/current
2020-06-08 21:53:03,383 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(127.0.0.1:9666, 
datanodeUuid=4408ff04-e406-4ccc-bd5c-8516ad57ec21, infoPort=9664, 
infoSecurePort=0, ipcPort=9667, 
storageInfo=lv=-57;cid=CID-c816c4ea-a559-4fd5-9b3a-b5994dc3a5fa;nsid=34747155;c=1591653120007):Failed
 to transfer BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1002 to 
127.0.0.1:9766 got
java.net.SocketException: Original Exception : java.io.IOException: Broken pipe
    at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
    at 
sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:428)
    at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:493)
    at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:605)
    at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:223)
 
    at 
org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:280)
 
    at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:620)
 
    at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:804)
 
    at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:751)
 
    at 
org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:2469)
 
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Broken pipe 
    ... 11 more{noformat}
Port 9766 is DN2's address. 
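For context, a "Broken pipe" on the sender side usually just means the peer (here 
DN2 at port 9766) closed the TCP connection while DN3's DataTransfer thread was 
still streaming block data through transferTo. The following standalone sketch 
(hypothetical class name and loopback sockets, not HDFS code) reproduces that 
socket-level symptom: the receiver accepts and immediately closes, much like a 
DataXceiver aborting a WRITE_BLOCK op, and the sender's later writes fail:
{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch, not HDFS code: the "receiver" stands in for DN2's
// DataXceiver aborting a WRITE_BLOCK op, the "sender" for DN3's DataTransfer
// thread that keeps streaming block data.
public class BrokenPipeSketch {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);   // ephemeral loopback port
        int port = server.getLocalPort();

        Thread receiver = new Thread(() -> {
            try (Socket s = server.accept()) {
                // Close immediately without reading, like an aborted WRITE_BLOCK.
            } catch (IOException ignored) {
            }
        });
        receiver.start();

        byte[] chunk = new byte[64 * 1024];          // stand-in for one block packet
        try (Socket sender = new Socket("127.0.0.1", port)) {
            OutputStream out = sender.getOutputStream();
            while (true) {                           // keep writing after the peer closed
                out.write(chunk);
                out.flush();
            }
        } catch (IOException e) {
            // Typically "Broken pipe" or "Connection reset", matching the WARN on DN3.
            System.out.println("Sender failed: " + e);
        } finally {
            receiver.join();
            server.close();
        }
    }
}
{code}
This only illustrates the symptom; the question in this issue is why DN2 aborts 
the transfer in the first place.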

Around the same time, we observe the following exceptions in DN2's log:
{noformat}
2020-06-08 21:53:03,379 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1002 src: 
/127.0.0.1:47618 dest: /127.0.0.1:9766
2020-06-08 21:53:03,379 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
opWriteBlock BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1002 received 
exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: 
Block BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1002 already 
exists in state FINALIZED and thus cannot be created.
2020-06-08 21:53:03,379 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
007e9b383989:9766:DataXceiver error processing WRITE_BLOCK operation  src: 
/127.0.0.1:47618 dst: /127.0.0.1:9766; 
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block     
BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1002 already exists in 
state FINALIZED and thus cannot be created.{noformat}
However, this exception does not look like the cause of the broken pipe: DN2 has 
an earlier occurrence of the same ReplicaAlreadyExistsException, yet DN3 only 
has this one occurrence of the broken pipe. Here's the earlier occurrence of 
ReplicaAlreadyExistsException on DN2:
{noformat}
2020-06-08 21:52:54,438 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1001 src: 
/127.0.0.1:47462 dest: /127.0.0.1:9766
2020-06-08 21:52:54,438 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
opWriteBlock BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1001 received 
exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: 
Block BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1001 already 
exists in state FINALIZED and thus cannot be created.
2020-06-08 21:52:54,448 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
007e9b383989:9766:DataXceiver error processing WRITE_BLOCK operation  src: 
/127.0.0.1:47462 dst: /127.0.0.1:9766; 
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block     
BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1001 already exists in 
state FINALIZED and thus cannot be created.{noformat}
So we think there is a bug causing the broken pipe.


