[ 
https://issues.apache.org/jira/browse/HBASE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102971#comment-15102971
 ] 

Ashish Singhi commented on HBASE-9393:
--------------------------------------

bq. The timeout that I'm talking about is inside DFSClient.java, not inside 
HBase. HDFS-4911 fixed a problem where the timeout was too long.
I have experimented with all those configurations, but the thing to note here is 
that HBase is not closing the stream, so how will the socket be closed?

bq. Can you be a little bit clearer on what you'd like to implement, and what 
you see as the problem here?
Below is a brief idea of what I would like to implement:
HBase will have a periodic thread monitoring these streams. When a stream has been 
idle for more than a configurable time, the configurable limit on the maximum number 
of streams that can be kept open has been crossed, and the stream has 0 references 
(for example, when HFile#pickReaderVersion is called I will increment the reference 
count and decrement it at the end, since after that the stream is no longer used in 
the same flow), then this thread will close that stream.
The above implementation will be configurable and disabled by default, as we expect 
some impact on the read flow.
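To make the proposal concrete, below is a minimal, hypothetical sketch of such a 
monitor. The class and member names (IdleStreamMonitor, TrackedStream, retain/release, 
maxOpenStreams, idleTimeoutMs) are illustrative assumptions and not existing HBase 
code; the real change would hook the reference counting into the HFile reader path 
(e.g. around HFile#pickReaderVersion) as described above.

{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch only: periodically closes idle, unreferenced streams. */
public class IdleStreamMonitor {

  /** Book-keeping for one open stream: reference count and last-use timestamp. */
  static final class TrackedStream {
    final Closeable stream;
    final AtomicInteger refCount = new AtomicInteger(0);
    final AtomicLong lastUsedMs = new AtomicLong(System.currentTimeMillis());

    TrackedStream(Closeable stream) {
      this.stream = stream;
    }
  }

  private final Map<String, TrackedStream> openStreams = new ConcurrentHashMap<>();
  private final ScheduledExecutorService chore =
      Executors.newSingleThreadScheduledExecutor();
  private final int maxOpenStreams;  // configurable limit on open streams
  private final long idleTimeoutMs;  // configurable idle time before a stream may be closed

  public IdleStreamMonitor(int maxOpenStreams, long idleTimeoutMs) {
    this.maxOpenStreams = maxOpenStreams;
    this.idleTimeoutMs = idleTimeoutMs;
  }

  /** Starts the periodic monitoring chore. */
  public void start(long checkPeriodMs) {
    chore.scheduleAtFixedRate(this::closeIdleStreams, checkPeriodMs, checkPeriodMs,
        TimeUnit.MILLISECONDS);
  }

  /** Called where a reader starts using a stream (e.g. around HFile#pickReaderVersion). */
  public void retain(String name, Closeable stream) {
    TrackedStream ts = openStreams.computeIfAbsent(name, n -> new TrackedStream(stream));
    ts.refCount.incrementAndGet();
    ts.lastUsedMs.set(System.currentTimeMillis());
  }

  /** Called once the reader is done with the stream in the current flow. */
  public void release(String name) {
    TrackedStream ts = openStreams.get(name);
    if (ts != null) {
      ts.refCount.decrementAndGet();
      ts.lastUsedMs.set(System.currentTimeMillis());
    }
  }

  /** Periodic chore: acts only once the number of open streams crosses the limit. */
  private void closeIdleStreams() {
    if (openStreams.size() <= maxOpenStreams) {
      return;
    }
    long now = System.currentTimeMillis();
    for (Map.Entry<String, TrackedStream> e : openStreams.entrySet()) {
      TrackedStream ts = e.getValue();
      boolean idle = now - ts.lastUsedMs.get() > idleTimeoutMs;
      // Close only streams that are idle and have no outstanding references.
      if (idle && ts.refCount.get() == 0 && openStreams.remove(e.getKey(), ts)) {
        try {
          ts.stream.close();  // releases the underlying socket so it cannot sit in CLOSE_WAIT
        } catch (IOException ignored) {
          // best effort: the stream may already be closed
        }
      }
    }
  }
}
{code}

Whether the chore runs at all, plus the maxOpenStreams and idleTimeoutMs values, would 
come from configuration, so with the feature disabled by default the read path is 
untouched.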

> HBase does not close a closed socket, resulting in many CLOSE_WAIT 
> --------------------------------------------------------------------
>
>                 Key: HBASE-9393
>                 URL: https://issues.apache.org/jira/browse/HBASE-9393
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.2, 0.98.0
>         Environment: Centos 6.4 - 7 regionservers/datanodes, 8 TB per node, 
> 7279 regions
>            Reporter: Avi Zrachya
>
> HBase does not close a dead connection with the datanode.
> This results in over 60K sockets in CLOSE_WAIT, and at some point HBase cannot 
> connect to the datanode because there are too many mapped sockets from one host to 
> another on the same port.
> The example below shows a low CLOSE_WAIT count because we had to restart HBase to 
> solve the problem; over time it will increase to 60-100K sockets in CLOSE_WAIT.
> [root@hd2-region3 ~]# netstat -nap |grep CLOSE_WAIT |grep 21592 |wc -l
> 13156
> [root@hd2-region3 ~]# ps -ef |grep 21592
> root     17255 17219  0 12:26 pts/0    00:00:00 grep 21592
> hbase    21592     1 17 Aug29 ?        03:29:06 
> /usr/java/jdk1.6.0_26/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx8000m 
> -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode 
> -Dhbase.log.dir=/var/log/hbase 
> -Dhbase.log.file=hbase-hbase-regionserver-hd2-region3.swnet.corp.log ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
