Hello all. Thanks a lot for your tests and information; I understand the reason for this much better now. I'll move to HBase/Hadoop 0.19.0 as soon as possible. In the meantime it is fine for me to keep the timeout *dfs.datanode.socket.write.timeout* disabled.
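
For reference, this is roughly what I keep in hadoop-site.xml on the datanodes to disable it (a value of 0 turns the datanode's socket write timeout off; 480000 ms, i.e. 8 minutes, is the stock default I would put back after the upgrade; the snippet reflects my own setup, not a recommendation):

  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <!-- 0 disables the datanode's socket write timeout; the default is 480000 ms (8 minutes) -->
    <value>0</value>
  </property>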

As soon as I upgrade to 0.19.0 I'll re-enable the timeout and see whether it fixes the "Premature EOF" problem.

Have a nice day.
-- J.-A.


stack-3 wrote:
>
> Jean-Adrien:
>
> For kicks, I tried your test in Hadoop 0.19.0 (and HBase trunk). Using the
> default for *dfs.datanode.socket.write.timeout*, 8 minutes, and Scenario B
> from your message to the hadoop list, I started up a loaded cluster. After
> everything had settled I counted DataXceiver threads in the datanodes; each
> of my datanodes had more than 100 instances. I let it sit for more than 8
> minutes. Now the datanodes had only one or two DataXceiver threads (I could
> see all the ERROR timeouts tripping in the datanode log). I then started a
> scan of all content in the table. It ran without issue, with no exceptions
> in the regionserver logs. The number of DataXceiver threads came and went
> over the life of the scan.
>
> So, there is still the big datanode memory/thread pressure at startup, and
> then there is the extra latency of re-establishing timed-out readers.
>
> Scenario C, where you shut down HBase, for me also shuts down all resources
> in the datanode.
>
> St.Ack
>
>
> On Sun, Jan 11, 2009 at 11:34 PM, stack <[email protected]> wrote:
>
>> Luo Ning, over the weekend, has made some comments you might be interested
>> in over in HBASE-24, Jean-Adrien.
>> St.Ack
>>
>>
>> Jean-Adrien wrote:
>>
>>> Hi everybody,
>>>
>>> I saw that you put some advice about the Hadoop settings to use when one
>>> hits the max xceivers limit in the troubleshooting section of the wiki.
>>>
>>> On this topic, I recently posted a question to the hadoop-core user
>>> mailing list about the 'xcievers' thread behaviour, since I still have to
>>> increase the limit as my HBase table grows in order to avoid reaching it
>>> at startup time. As a consequence my JVM uses a lot of virtual memory
>>> (with a 500 MB heap, 1100 threads allocate 2 GB of virtual memory), which
>>> eventually leads to swapping and failure.
>>>
>>> Here is the link to my post, with a graph showing the number of threads
>>> the datanode creates when I start HBase:
>>> http://www.nabble.com/xceiverCount-limit-reason-td21349807.html#a21352818
>>>
>>> You can see that all the threads are created at HBase startup time and,
>>> if the timeout (dfs.datanode.socket.write.timeout) is set, they all end
>>> with a timeout failure.
>>>
>>> The question for HBase is: why are the connections with Hadoop (and hence
>>> the threads) kept open? Does this happen only in my case?
>>> I think Slava has the same problem, but I don't think everybody does,
>>> since otherwise nobody could run a cluster without disabling the timeout
>>> parameter dfs.datanode.socket.write.timeout.
>>>
>>> Has anybody else made these observations?
>>> Thanks
>>>
>>> Jean-Adrien
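
P.S. For anyone landing here from the wiki: the xceiver limit discussed in the quoted thread is the dfs.datanode.max.xcievers property (note the historical spelling) in hadoop-site.xml. Below is a sketch of what I currently run; 2047 is simply the value I have ended up at as the table grew, not a recommendation, and the right number depends on how many regions and store files your datanodes have to serve:

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <!-- upper bound on concurrent DataXceiver threads per datanode; the stock default is much lower -->
    <value>2047</value>
  </property>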
