Hello all. Thanks a lot for your tests and information; I understand the reason for this much better now. I'll move to HBase/Hadoop 0.19.0 as soon as possible. In the meantime it is fine for me to keep the timeout *dfs.datanode.socket.write.timeout* disabled.
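
For reference, this is roughly what I keep in hadoop-site.xml on the datanodes to disable it (a value of 0 turns the datanode's socket write timeout off; 480000 ms, i.e. 8 minutes, is the stock default I would put back after the upgrade; the snippet reflects my own setup, not a recommendation):

  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <!-- 0 disables the datanode's socket write timeout; the default is 480000 ms (8 minutes) -->
    <value>0</value>
  </property>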

As soon as I upgrade to 0.19.0 I'll re-enable the timeout and see whether it fixes the "Premature EOF" problem.

Have a nice day.
-- J.-A.


stack-3 wrote:
>
> Jean-Adrien:
>
> For kicks, I tried your test in Hadoop 0.19.0 (and HBase trunk). Using the
> default for *dfs.datanode.socket.write.timeout*, 8 minutes, and Scenario B
> from your message to the hadoop list, I started up a loaded cluster. After
> everything had settled I counted DataXceiver threads in the datanodes; each
> of my datanodes had more than 100 instances. I let it sit for more than 8
> minutes. Now the datanodes had only one or two DataXceiver threads (I could
> see all the ERROR timeouts tripping in the datanode log). I then started a
> scan of all content in the table. It ran without issue, with no exceptions
> in the regionserver logs. The number of DataXceiver threads came and went
> over the life of the scan.
>
> So, there is still the big datanode memory/thread pressure at startup, and
> then there is the extra latency of re-establishing timed-out readers.
>
> Scenario C, where you shut down HBase, for me also shuts down all resources
> in the datanode.
>
> St.Ack
>
>
> On Sun, Jan 11, 2009 at 11:34 PM, stack <[email protected]> wrote:
>
>> Luo Ning, over the weekend, has made some comments you might be interested
>> in over in HBASE-24, Jean-Adrien.
>> St.Ack
>>
>>
>> Jean-Adrien wrote:
>>
>>> Hi everybody,
>>>
>>> I saw that you put some advice about the Hadoop settings to use when one
>>> hits the max xceivers limit in the troubleshooting section of the wiki.
>>>
>>> On this topic, I recently posted a question to the hadoop-core user
>>> mailing list about the 'xcievers' thread behaviour, since I still have to
>>> increase the limit as my HBase table grows in order to avoid reaching it
>>> at startup time. As a consequence my JVM uses a lot of virtual memory
>>> (with a 500 MB heap, 1100 threads allocate 2 GB of virtual memory), which
>>> eventually leads to swapping and failure.
>>>
>>> Here is the link to my post, with a graph showing the number of threads
>>> the datanode creates when I start HBase:
>>> http://www.nabble.com/xceiverCount-limit-reason-td21349807.html#a21352818
>>>
>>> You can see that all the threads are created at HBase startup time and,
>>> if the timeout (dfs.datanode.socket.write.timeout) is set, they all end
>>> with a timeout failure.
>>>
>>> The question for HBase is: why are the connections with Hadoop (and hence
>>> the threads) kept open? Does this happen only in my case?
>>> I think Slava has the same problem, but I don't think everybody does,
>>> since otherwise nobody could run a cluster without disabling the timeout
>>> parameter dfs.datanode.socket.write.timeout.
>>>
>>> Has anybody else made these observations?
>>> Thanks
>>>
>>> Jean-Adrien
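
P.S. For anyone landing here from the wiki: the xceiver limit discussed in the quoted thread is the dfs.datanode.max.xcievers property (note the historical spelling) in hadoop-site.xml. Below is a sketch of what I currently run; 2047 is simply the value I have ended up at as the table grew, not a recommendation, and the right number depends on how many regions and store files your datanodes have to serve:

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <!-- upper bound on concurrent DataXceiver threads per datanode; the stock default is much lower -->
    <value>2047</value>
  </property>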
