Jean-Adrien wrote:
Hi everybody,
I saw that you put some advice in the troubleshooting section of the wiki
concerning the Hadoop settings to use when one hits the max xceivers limit.
Yes. Thanks to your research, Jean-Adrien. And have you seen the
addition made by Andrew Purtell suggesting upping the datanode listeners?
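For anyone who comes across this thread later, the two properties in question
are roughly as below. This is only a sketch: the values are illustrative, they
would normally go into hadoop-site.xml on the datanodes rather than be set in
code, and I am assuming dfs.datanode.handler.count is the 'listeners' setting
Andrew meant.

    import org.apache.hadoop.conf.Configuration;

    public class DatanodeTuningSketch {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Cap on concurrent xceiver threads per datanode; note the property
        // name really is spelled 'xcievers'. 4096 is just an illustrative value.
        conf.setInt("dfs.datanode.max.xcievers", 4096);
        // Assumed to be what 'upping the datanode listeners' refers to;
        // again, 10 is only an illustrative value.
        conf.setInt("dfs.datanode.handler.count", 10);
        System.out.println("xcievers = " + conf.get("dfs.datanode.max.xcievers"));
      }
    }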
About this topic, I recently posted a question on the hadoop-core user mailing
list about the 'xcievers' thread behaviour, since I still have to increase
the limit as my HBase table grows, in order to avoid hitting it at startup
time. As a result my JVM uses a lot of virtual memory (with 500MB for the
heap, 1100 threads allocate 2GB of virtual memory). This eventually leads to
swapping and failure.
Yeah. That makes sense. (Have you tried setting the thread stack size down
-- -Xss -- so less outside-of-the-heap memory is used?)
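To put rough, made-up numbers on it (the default and reduced stack sizes below
are assumptions, not measurements from your datanode):

    public class StackMemorySketch {
      public static void main(String[] args) {
        int threads = 1100;          // roughly the thread count you saw at startup
        int defaultStackKb = 1024;   // assumed JVM default, i.e. -Xss1m
        int smallerStackKb = 256;    // e.g. after setting -Xss256k
        System.out.println("stacks at default:   ~" + (threads * defaultStackKb / 1024) + " MB");
        System.out.println("stacks with -Xss256k: ~" + (threads * smallerStackKb / 1024) + " MB");
      }
    }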
Here is the link to my post, with a graph showing the number of threads the
datanode creates when I start HBase:
http://www.nabble.com/xceiverCount-limit-reason-td21349807.html#a21352818
You can see that all the threads are created at HBase startup time and, if
the timeout (dfs.datanode.socket.write.timeout) is set, they all end with a
timeout failure.
The question for HBase is: why are the connections to Hadoop (and the threads
as well) kept open? Does this happen only in my case?
No. It happens for everyone. HBase keeps open a connection to every
StoreFile. We do this to avoid paying the open cost every time a file
is accessed, primarily to improve random-access performance. StoreFiles
in HBase are based on Hadoop MapFile. A MapFile is two SequenceFiles --
data and index. An open would require at least a trip to the namenode per
SequenceFile to learn the blocks that make up the file, then a trip to the
datanodes holding those blocks -- first to read in the index, if doing a
random access, and then to the target block once its location was found.
Instead, per StoreFile, on open we read in the index (and then close the
index file) and keep the DFSClient connection to the data file open, so
block locations are kept over in the HBase regionserver.
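If it helps to see the pattern outside of HBase, here is a minimal sketch of
the same open-once/keep-open idea using Hadoop's MapFile directly -- this is
not our actual StoreFile code, and the path and row key are made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.MapFile;
    import org.apache.hadoop.io.Text;

    public class KeepOpenReaderSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Opening the reader pulls the index SequenceFile into memory and
        // leaves the data SequenceFile's DFSClient stream open behind it.
        MapFile.Reader reader =
            new MapFile.Reader(fs, "/hbase/some-region/some-storefile", conf);

        // Random gets reuse the already-open data stream, so no per-access
        // open cost -- but the open stream is what holds datanode xceiver
        // threads and file descriptors for as long as the reader lives.
        Text value = new Text();
        reader.get(new Text("some-row-key"), value);

        // In HBase the reader stays open for the life of the store file;
        // it is only closed on compaction or when the region closes.
        // reader.close();
      }
    }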
Keeping a permanent connection open to every store file costs us. Users
will trip over the 'too many open files...' error pretty early on unless they
up their ulimit for file descriptors. Also, keeping the index in memory
as we currently do is the main cause of heap usage -- particularly if
cells are small. Then there is the cost over in HDFS, which is what you
are bringing up here.
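Just to give a feel for the file-descriptor side of it -- all of the counts
below are made-up illustrative numbers, not anyone's actual cluster:

    public class FileDescriptorSketch {
      public static void main(String[] args) {
        int regions = 100;            // regions on one regionserver (assumed)
        int familiesPerRegion = 3;    // column families per region (assumed)
        int storeFilesPerFamily = 3;  // store files per family (assumed)
        int openDataFiles = regions * familiesPerRegion * storeFilesPerFamily;
        // 900 descriptors for store files alone, before sockets and everything
        // else the process holds -- already close to the common default
        // ulimit of 1024.
        System.out.println(openDataFiles + " open store files");
      }
    }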
lava has the same problem. But I don't think everybody does, since the
cluster could not run without disabling the timeout parameter
dfs.datanode.socket.write.timeout.
Has anybody made the same observations?
I haven't been paying attention of late. Thanks for bringing it up,
Jean-Adrien. Let's try and figure it out. (I 'thought' that the timer over on
the datanode would close idle sockets and that subsequent accesses would
revive the connection, but that doesn't seem to be the case going by your
hadoop posting.)
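For reference, the workaround you mention is the one below -- only a sketch,
and whether disabling the timeout is the right answer is exactly what we need
to figure out. Setting the value to 0 means no timeout; normally it would go
in the site configuration file rather than in code, and I believe the stock
default is on the order of a few minutes.

    import org.apache.hadoop.conf.Configuration;

    public class WriteTimeoutSketch {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // 0 disables the datanode socket write timeout entirely.
        conf.setInt("dfs.datanode.socket.write.timeout", 0);
        System.out.println(conf.get("dfs.datanode.socket.write.timeout"));
      }
    }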
St.Ack