[ https://issues.apache.org/jira/browse/HADOOP-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549608 ]

stack commented on HADOOP-2341:
-------------------------------

Raghu:  Thanks for responding.

Now that we've done the tracing, I suppose it makes sense that an open hdfs file 
descriptor consumes a socket, thread, buffers, etc. in the datanode.  That the 
cost is so high was a bit of a surprise, though, given that the client side has 
only one connection to the datanode.  (Are there plans to do as Michael B asks 
above and make the server side use selects instead of a socket per file?)
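
For what it's worth, here is a minimal sketch of the thread-per-connection 
pattern I understand the datanode to be using (hypothetical names, not the 
actual DataNode code; the real DataXceiver shows up in the stack traces below).  
Every open file pins one server-side thread plus its socket and stream buffers 
until the client goes away:

{code}
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch only: one accept loop, one dedicated thread per accepted connection.
class AcceptLoop implements Runnable {
  private final ServerSocket server;

  AcceptLoop(ServerSocket server) { this.server = server; }

  public void run() {
    while (true) {
      try {
        final Socket s = server.accept();
        // Each open client file costs a dedicated thread (plus the socket
        // and its buffers) until the client disconnects.
        new Thread(new Runnable() {
          public void run() { serveBlockRequests(s); }
        }).start();
      } catch (IOException e) {
        return; // server socket closed
      }
    }
  }

  void serveBlockRequests(Socket s) { /* read request, stream block data back */ }
}
{code}

A select()-based server would instead multiplex many mostly-idle connections 
onto a handful of threads.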

I found some leaks after instrumenting SequenceFile, and I upped the size of the 
files we keep in hdfs so we have fewer files open at any one time.  This helps 
some.  There still seems to be an issue with our not closing everything down, 
as I see outstanding CLOSE_WAIT connections after I'm done with an upload.  I'm 
now trying to instrument DFSClient to see if I can figure out what's going on.
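
The discipline I'm checking for looks like the following (illustrative only; 
the path and class name are made up, and the actual leak sites are what I'm 
still hunting).  A reader left unclosed keeps its DFSClient streams, and so the 
datanode-side socket and thread, alive indefinitely:

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class CloseCheck {
  // Hypothetical entry point, for illustration only.
  public static void scan(Path file) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
    try {
      // ... iterate over keys/values ...
    } finally {
      reader.close(); // without this the datanode connection lingers (CLOSE_WAIT)
    }
  }
}
{code}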

In hbase, a table is made of regions; regions are made of stores, and stores 
are made of mapfiles where the key is row/column/timestamp and the value is the 
cell data.  hbase keeps all mapfiles open so access to a 'row' is 'fast'.  Open 
mapfiles have their index in memory, the socket connection up, etc.  The 
alternative of opening a file on every row access would, I'd imagine, be slower.
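
Roughly, the trade-off we're making is this (hypothetical names and key layout, 
just to sketch the pattern): a reader held open answers a row lookup from its 
in-memory index over an already-established datanode connection, where 
reopening per access would pay an index load and a new connection every time.

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class StoreSketch {
  private final MapFile.Reader store; // held open for the life of the region

  public StoreSketch(Configuration conf, String mapfileDir) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    // Open once: the index is read into memory and the datanode connection
    // stays up for as long as the reader is open.
    this.store = new MapFile.Reader(fs, mapfileDir, conf);
  }

  public Text getCell(String rowColumnTimestamp) throws IOException {
    Text cell = new Text();
    // Fast path: in-memory index lookup plus one seek on the open stream.
    return (Text) store.get(new Text(rowColumnTimestamp), cell);
  }
}
{code}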

The desire is to consume as few resources as possible in hdfs while keeping 
access as fast as possible.  Any suggestions on how we might improve our usage 
would be appreciated.

> Datanode active connections never returns to 0
> ----------------------------------------------
>
>                 Key: HADOOP-2341
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2341
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Paul Saab
>         Attachments: hregionserver-stack.txt, stacks-XX.XX.XX.XXX.txt, stacks-YY.YY.YY.YY.txt
>
>
> On trunk I continue to see the following in my datanode logs:
> 2007-12-03 15:46:47,696 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 42
> 2007-12-03 15:46:48,135 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 41
> 2007-12-03 15:46:48,439 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 40
> 2007-12-03 15:46:48,479 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 39
> 2007-12-03 15:46:48,611 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 38
> 2007-12-03 15:46:48,898 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 37
> 2007-12-03 15:46:48,989 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 36
> 2007-12-03 15:46:51,010 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 35
> 2007-12-03 15:46:51,758 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 34
> 2007-12-03 15:46:52,148 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 33
> This number never returns to 0, even after many hours of no new data being manipulated or added into the DFS.
> Looking at netstat -tn, I see a significant amount of data in the Send-Q that never goes away:
> tcp        0  34240 ::ffff:XX.XX.XX.XXX:50010   ::ffff:YY.YY.YY.YY:55792   ESTABLISHED
> tcp        0  38968 ::ffff:XX.XX.XX.XXX:50010   ::ffff:YY.YY.YY.YY:38169   ESTABLISHED
> tcp        0  38456 ::ffff:XX.XX.XX.XXX:50010   ::ffff:YY.YY.YY.YY:35456   ESTABLISHED
> tcp        0  29640 ::ffff:XX.XX.XX.XXX:50010   ::ffff:YY.YY.YY.YY:59845   ESTABLISHED
> tcp        0  50168 ::ffff:XX.XX.XX.XXX:50010   ::ffff:YY.YY.YY.YY:44584   ESTABLISHED
> When sniffing the network, I see that the remote side (YY.YY.YY.YY) is returning a window size of 0:
> 16:11:41.760474 IP XX.XX.XX.XXX.50010 > YY.YY.YY.YY.44584: . ack 3339984123 win 46 <nop,nop,timestamp 1786247180 885681789>
> 16:11:41.761597 IP YY.YY.YY.YY.44584 > XX.XX.XX.XXX.50010: . ack 1 win 0 <nop,nop,timestamp 885801786 1775711351>
> When we look at the stack traces on each datanode, there are tons of threads that *never* go away, all in the following trace:
> {code}
> Thread 6516 ([EMAIL PROTECTED]):
>   State: RUNNABLE
>   Blocked count: 0
>   Waited count: 0
>   Stack:
>     java.net.SocketOutputStream.socketWrite0(Native Method)
>     java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>     java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>     java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>     java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>     java.io.DataOutputStream.write(DataOutputStream.java:90)
>     org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1400)
>     org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1433)
>     org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:904)
>     org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
>     java.lang.Thread.run(Thread.java:619)
> {code}
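>
> That trace is a writer parked in socketWrite0 because the peer has stopped reading.  A minimal standalone reproduction of that state (hypothetical class, not Hadoop code) is:
> {code}
> import java.io.OutputStream;
> import java.net.ServerSocket;
> import java.net.Socket;
>
> // One socket pair; the client side never reads, so once the kernel
> // buffers fill, it advertises a zero window and write() below blocks
> // forever, just like BlockSender.sendChunk in the trace above.
> public class ZeroWindowDemo {
>   public static void main(String[] args) throws Exception {
>     ServerSocket server = new ServerSocket(0);
>     Socket client = new Socket("localhost", server.getLocalPort()); // never reads
>     Socket writer = server.accept();
>     OutputStream out = writer.getOutputStream();
>     byte[] chunk = new byte[64 * 1024];
>     while (true) {
>       out.write(chunk); // eventually parks in socketWrite0
>     }
>   }
> }
> {code}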
> Unfortunately there's very little in the logs in the way of exceptions that could point to this.  I have some exceptions like the following, but nothing that points to problems between XX and YY:
> {code}
> 2007-12-02 11:19:47,889 WARN  dfs.DataNode - Unexpected error trying to delete block blk_4515246476002110310. Block not found in blockMap.
> 2007-12-02 11:19:47,922 WARN  dfs.DataNode - java.io.IOException: Error in deleting blocks.
>         at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:750)
>         at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:675)
>         at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:569)
>         at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1720)
>         at java.lang.Thread.run(Thread.java:619)
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
