[ https://issues.apache.org/jira/browse/HDFS-17357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua updated HDFS-17357:
-------------------------------
    Description: 
NioInetPeer.close() currently does not close the socket connection.

 

In my environment, all data is stored with erasure coding (EC).

I found more than 30,000 leaked connections on a datanode, along with many
warning messages like the one below.

2024-01-22 15:27:57,500 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
hostname:50010:DataXceiverServer

 

When an exception is caught in DataXceiverServer, it calls closeStream:

IOUtils.closeStream(peer) -> Peer.close() -> NioInetPeer.close()

However, NioInetPeer.close() does not close the underlying socket, which leads
to connection leakage.

The close() methods of the other Peer subclasses do call socket.close(). See
EncryptedPeer, DomainPeer, and BasicInetPeer.
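
To illustrate the expected behavior, here is a minimal sketch of a close() that always releases the socket, even when closing a stream fails. SketchPeer and PeerCloseSketch are hypothetical names used only for this example; this is not the Hadoop NioInetPeer code and not necessarily the change made in the patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical stand-in for a Peer implementation; not the Hadoop class. */
final class SketchPeer implements Closeable {
    private final Socket socket;
    private final InputStream in;
    private final OutputStream out;

    SketchPeer(Socket socket, InputStream in, OutputStream out) {
        this.socket = socket;
        this.in = in;
        this.out = out;
    }

    @Override
    public void close() throws IOException {
        // Close both streams, then always close the socket itself,
        // even if closing one of the streams throws.
        try {
            try {
                in.close();
            } finally {
                out.close();
            }
        } finally {
            socket.close();
        }
    }
}

final class PeerCloseSketch {
    /** Returns true if the socket is closed although the input stream's close() failed. */
    static boolean socketClosedEvenIfStreamCloseFails() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // Stream whose close() always fails, to exercise the finally path.
            InputStream failing = new InputStream() {
                @Override public int read() { return -1; }
                @Override public void close() throws IOException {
                    throw new IOException("simulated stream close failure");
                }
            };
            SketchPeer peer = new SketchPeer(accepted, failing, new ByteArrayOutputStream());
            try {
                peer.close();
            } catch (IOException expected) {
                // The stream failure still reaches the caller.
            }
            return accepted.isClosed();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The streams here are deliberately independent of the socket, so only the explicit socket.close() in the outer finally block can release the connection.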

 

 

This problem can be reproduced as follows:
(1) A client writes data to HDFS.
(2) The datanode's Xceiver count reaches the maximum set by
DFS_DATANODE_MAX_RECEIVER_THREADS_KEY. Any new Xceiver fails with an
IOException, and its socket is not released.
(3) The client crashes, so that no new data is written, or client.close is
executed.
(4) Socket connections between datanodes are leaked.
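
The failure path in step (2) can be sketched roughly as follows, assuming a simplified single-threaded accept loop on the loopback interface. ReceiverLimitSketch, MAX_RECEIVERS, and closeQuietly are hypothetical names for this illustration only; this is not the DataXceiverServer code. The point is that a connection rejected because the limit is reached must still have its socket closed on the error path.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

final class ReceiverLimitSketch {
    /** Hypothetical limit standing in for the configured maximum receiver threads. */
    static final int MAX_RECEIVERS = 2;

    /** Closes a resource and ignores any IOException. */
    static void closeQuietly(Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException ignored) {
            // Nothing useful to do in this sketch.
        }
    }

    /**
     * Accepts the given number of loopback connections with at most
     * MAX_RECEIVERS active at once. Returns how many rejected sockets
     * were left open (0 when the error path closes them).
     */
    static int countLeakedRejectedSockets(int attempts) {
        List<Socket> clients = new ArrayList<>();
        List<Socket> active = new ArrayList<>();
        List<Socket> rejected = new ArrayList<>();
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            for (int i = 0; i < attempts; i++) {
                clients.add(new Socket(server.getInetAddress(), server.getLocalPort()));
                Socket accepted = server.accept();
                try {
                    if (active.size() >= MAX_RECEIVERS) {
                        throw new IOException("receiver count exceeds limit " + MAX_RECEIVERS);
                    }
                    active.add(accepted);
                } catch (IOException e) {
                    // Error path: release the socket, otherwise the
                    // connection stays ESTABLISHED on both ends.
                    closeQuietly(accepted);
                    rejected.add(accepted);
                }
            }
            int leaked = 0;
            for (Socket s : rejected) {
                if (!s.isClosed()) {
                    leaked++;
                }
            }
            return leaked;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Clean up everything this sketch opened.
            clients.forEach(ReceiverLimitSketch::closeQuietly);
            active.forEach(ReceiverLimitSketch::closeQuietly);
        }
    }
}
```

If the closeQuietly(accepted) call in the catch block were a no-op for the socket, as described above for NioInetPeer.close(), every rejected connection would remain open.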

 

 

The leaked connections look like this:
dn1
dn1:57042 dn2:50010 ESTABLISHED

dn2
dn2:50010 dn1:57042 ESTABLISHED


> NioInetPeer.close() should close socket connection.
> ---------------------------------------------------
>
>                 Key: HDFS-17357
>                 URL: https://issues.apache.org/jira/browse/HDFS-17357
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: liuguanghua
>            Assignee: liuguanghua
>            Priority: Major
>              Labels: pull-request-available


