[ https://issues.apache.org/jira/browse/HADOOP-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551059 ]
Raghu Angadi commented on HADOOP-2341:
--------------------------------------

> Studying these CLOSE_WAITs over last few days, the client buffer shows 1 or 0
> bytes in the queue. At a minimum, I would expect that when client has read all
> of a block - netstat shows queues of size 0 - then the client should close its
> socket and free up datanode-side resources.

This is ok. There are a few extra bytes in the stream at the end that indicate a proper end of stream to the client. A client would read those only when it tries to read more. There is a BufferedInputStream between the socket and DFSClient, which confuses things further.

One could argue that DFSClient should try to read a few bytes more than the user wants to read and then close the socket... that's a different issue. If we want to do this, I think it should be done with a non-blocking read, since we don't want the user to wait more than what is required. Even if we did that, I don't think it solves your problem, since it helps only if the random access lands at the end of a block. If you are accessing some other place, you would just be holding 64k (or 128k?) of precious kernel memory on both sides, only to throw it away on the next random access.

I don't think streaming read is meant for random reads. I don't have much knowledge of the access pattern or average size of reads in HBase, but did you look into using pread? (A rough sketch follows at the end of this message.)

> These outstanding CLOSE_WAITs are an issue in hbase.

How about idle connections? Is that ok?


> Datanode active connections never returns to 0
> ----------------------------------------------
>
>                 Key: HADOOP-2341
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2341
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Paul Saab
>         Attachments: dfsclient.patch, hregionserver-stack.txt, stacks-XX.XX.XX.XXX.txt, stacks-YY.YY.YY.YY.txt
>
>
> On trunk i continue to see the following in my data node logs:
> 2007-12-03 15:46:47,696 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 42
> 2007-12-03 15:46:48,135 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 41
> 2007-12-03 15:46:48,439 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 40
> 2007-12-03 15:46:48,479 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 39
> 2007-12-03 15:46:48,611 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 38
> 2007-12-03 15:46:48,898 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 37
> 2007-12-03 15:46:48,989 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 36
> 2007-12-03 15:46:51,010 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 35
> 2007-12-03 15:46:51,758 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 34
> 2007-12-03 15:46:52,148 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 33
> This number never returns to 0, even after many hours of no new data being manipulated or added into the DFS.
> Looking at netstat -tn i see significant amount of data in the send-q that never goes away:
> tcp        0  34240 ::ffff:XX.XX.XX.XXX:50010 ::ffff:YY.YY.YY.YY:55792  ESTABLISHED
> tcp        0  38968 ::ffff:XX.XX.XX.XXX:50010 ::ffff:YY.YY.YY.YY:38169  ESTABLISHED
> tcp        0  38456 ::ffff:XX.XX.XX.XXX:50010 ::ffff:YY.YY.YY.YY:35456  ESTABLISHED
> tcp        0  29640 ::ffff:XX.XX.XX.XXX:50010 ::ffff:YY.YY.YY.YY:59845  ESTABLISHED
> tcp        0  50168 ::ffff:XX.XX.XX.XXX:50010 ::ffff:YY.YY.YY.YY:44584  ESTABLISHED
> When sniffing the network I see that the remote side (YY.YY.YY.YY) is returning a window size of 0:
> 16:11:41.760474 IP XX.XX.XX.XXX.50010 > YY.YY.YY.YY.44584: . ack 3339984123 win 46 <nop,nop,timestamp 1786247180 885681789>
> 16:11:41.761597 IP YY.YY.YY.YY.44584 > XX.XX.XX.XXX.50010: . ack 1 win 0 <nop,nop,timestamp 885801786 1775711351>
> When we look at the stack traces on each datanode, I will have tons of threads that *never* go away in the following trace:
> {code}
> Thread 6516 ([EMAIL PROTECTED]):
>   State: RUNNABLE
>   Blocked count: 0
>   Waited count: 0
>   Stack:
>     java.net.SocketOutputStream.socketWrite0(Native Method)
>     java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>     java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>     java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>     java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>     java.io.DataOutputStream.write(DataOutputStream.java:90)
>     org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1400)
>     org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1433)
>     org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:904)
>     org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
>     java.lang.Thread.run(Thread.java:619)
> {code}
> Unfortunately there's very little in the logs with exceptions that could point to this. I have some exceptions like the following, but nothing that points to problems between XX and YY:
> {code}
> 2007-12-02 11:19:47,889 WARN dfs.DataNode - Unexpected error trying to delete block blk_4515246476002110310. Block not found in blockMap.
> 2007-12-02 11:19:47,922 WARN dfs.DataNode - java.io.IOException: Error in deleting blocks.
>         at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:750)
>         at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:675)
>         at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:569)
>         at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1720)
>         at java.lang.Thread.run(Thread.java:619)
> {code}
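For illustration, here is a minimal sketch of the pread-style access suggested in the comment above, written against the generic org.apache.hadoop.fs.FileSystem API. The class name, file path, offsets, and buffer size are placeholders invented for this example; the comments describe the general contract of the API, not the exact DFSClient internals of this release.

{code}
// Illustrative sketch only: contrasts a streaming read (seek + read) with a
// positioned read (pread). All names, paths, and offsets here are made up.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PreadSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/hbase/example-hstore-file");   // hypothetical path

    byte[] buf = new byte[4096];
    FSDataInputStream in = fs.open(file);
    try {
      // Streaming style: seek() + read() advances the stream position, leaving
      // the partially consumed block stream (and its datanode connection)
      // parked mid-block between random accesses until the stream is read to
      // the end or closed.
      in.seek(1L << 20);
      in.read(buf, 0, buf.length);

      // Positioned read (pread): reads the requested range without touching
      // the stream position; readFully() blocks until the whole range has been
      // read and throws EOFException if the file is shorter than requested.
      in.readFully(8L << 20, buf, 0, buf.length);
    } finally {
      in.close();   // release client- and datanode-side resources in either case
    }
  }
}
{code}

The positioned-read form is the one intended for scattered random access; the streaming form matches the access pattern described in this issue, where connections are left open after each read.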