[ https://issues.apache.org/jira/browse/HADOOP-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551059 ]
Raghu Angadi commented on HADOOP-2341:
--------------------------------------

> Studying these CLOSE_WAITs over last few days, the client buffer shows 1 or 0
> bytes in the queue. At a minimum, I would expect that when client has read all
> of a block - netstat shows queues of size 0 - then the client should close its
> socket and free up datanode-side resources.

This is ok. There are a few extra bytes in the stream at the end that indicate a proper end of stream to the client. A client would read those only when it tries to read more. There is a BufferedInputStream between the socket and DFSClient, which confuses things further.

One could argue that DFSClient should try to read a few bytes more than the user wants to read and then close the socket... that's a different issue. If we want to do this, I think it should be done with a non-blocking read, since we don't want the user to wait more than what is required. Even if we did that, I don't think it solves your problem, since it helps only if the random access lands at the end of a block. If you are accessing some other place, you would just be holding 64k (or 128k?) of precious kernel memory on both sides, only to throw it away on the next random access.

I don't think streaming read is meant for random reads. I don't have much knowledge of the access pattern or average size of reads in HBase, but did you look into using pread? (A rough sketch follows at the end of this message.)

> These outstanding CLOSE_WAITs are an issue in hbase.

How about idle connections? Is that ok?


> Datanode active connections never returns to 0
> ----------------------------------------------
>
>                 Key: HADOOP-2341
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2341
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Paul Saab
>         Attachments: dfsclient.patch, hregionserver-stack.txt, stacks-XX.XX.XX.XXX.txt, stacks-YY.YY.YY.YY.txt
>
>
> On trunk i continue to see the following in my data node logs:
> 2007-12-03 15:46:47,696 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 42
> 2007-12-03 15:46:48,135 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 41
> 2007-12-03 15:46:48,439 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 40
> 2007-12-03 15:46:48,479 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 39
> 2007-12-03 15:46:48,611 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 38
> 2007-12-03 15:46:48,898 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 37
> 2007-12-03 15:46:48,989 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 36
> 2007-12-03 15:46:51,010 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 35
> 2007-12-03 15:46:51,758 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 34
> 2007-12-03 15:46:52,148 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 33
> This number never returns to 0, even after many hours of no new data being manipulated or added into the DFS.
> Looking at netstat -tn i see significant amount of data in the send-q that never goes away:
> tcp        0  34240 ::ffff:XX.XX.XX.XXX:50010 ::ffff:YY.YY.YY.YY:55792  ESTABLISHED
> tcp        0  38968 ::ffff:XX.XX.XX.XXX:50010 ::ffff:YY.YY.YY.YY:38169  ESTABLISHED
> tcp        0  38456 ::ffff:XX.XX.XX.XXX:50010 ::ffff:YY.YY.YY.YY:35456  ESTABLISHED
> tcp        0  29640 ::ffff:XX.XX.XX.XXX:50010 ::ffff:YY.YY.YY.YY:59845  ESTABLISHED
> tcp        0  50168 ::ffff:XX.XX.XX.XXX:50010 ::ffff:YY.YY.YY.YY:44584  ESTABLISHED
> When sniffing the network I see that the remote side (YY.YY.YY.YY) is returning a window size of 0:
> 16:11:41.760474 IP XX.XX.XX.XXX.50010 > YY.YY.YY.YY.44584: . ack 3339984123 win 46 <nop,nop,timestamp 1786247180 885681789>
> 16:11:41.761597 IP YY.YY.YY.YY.44584 > XX.XX.XX.XXX.50010: . ack 1 win 0 <nop,nop,timestamp 885801786 1775711351>
> When we look at the stack traces on each datanode, I will have tons of threads that *never* go away in the following trace:
> {code}
> Thread 6516 ([EMAIL PROTECTED]):
>   State: RUNNABLE
>   Blocked count: 0
>   Waited count: 0
>   Stack:
>     java.net.SocketOutputStream.socketWrite0(Native Method)
>     java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>     java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>     java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>     java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>     java.io.DataOutputStream.write(DataOutputStream.java:90)
>     org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1400)
>     org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1433)
>     org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:904)
>     org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
>     java.lang.Thread.run(Thread.java:619)
> {code}
> Unfortunately there's very little in the logs with exceptions that could point to this. I have some exceptions like the following, but nothing that points to problems between XX and YY:
> {code}
> 2007-12-02 11:19:47,889 WARN dfs.DataNode - Unexpected error trying to delete block blk_4515246476002110310. Block not found in blockMap.
> 2007-12-02 11:19:47,922 WARN dfs.DataNode - java.io.IOException: Error in deleting blocks.
>         at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:750)
>         at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:675)
>         at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:569)
>         at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1720)
>         at java.lang.Thread.run(Thread.java:619)
> {code}
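For illustration, here is a minimal sketch of the pread-style access suggested in the comment above, written against the generic org.apache.hadoop.fs.FileSystem API. The class name, file path, offsets, and buffer size are placeholders invented for this example; the comments describe the general contract of the API, not the exact DFSClient internals of this release.

{code}
// Illustrative sketch only: contrasts a streaming read (seek + read) with a
// positioned read (pread). All names, paths, and offsets here are made up.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PreadSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/hbase/example-hstore-file");   // hypothetical path

    byte[] buf = new byte[4096];
    FSDataInputStream in = fs.open(file);
    try {
      // Streaming style: seek() + read() advances the stream position, leaving
      // the partially consumed block stream (and its datanode connection)
      // parked mid-block between random accesses until the stream is read to
      // the end or closed.
      in.seek(1L << 20);
      in.read(buf, 0, buf.length);

      // Positioned read (pread): reads the requested range without touching
      // the stream position; readFully() blocks until the whole range has been
      // read and throws EOFException if the file is shorter than requested.
      in.readFully(8L << 20, buf, 0, buf.length);
    } finally {
      in.close();   // release client- and datanode-side resources in either case
    }
  }
}
{code}

The positioned-read form is the one intended for scattered random access; the streaming form matches the access pattern described in this issue, where connections are left open after each read.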