I will add this to the troubleshooting page of the wiki, good job Andrew. As a next step, maybe HBase should handle this in a better way?
J-D

On Mon, Jul 21, 2008 at 8:09 AM, Renaud Delbru <[EMAIL PROTECTED]> wrote:
> Hi Andrew,
>
> Thanks for your explanations, they give me more insight into how HBase
> works internally.
>
> The cause of the problem was in fact excessive load on the servers.
> Processes were stuck due to excessive swapping. I have reduced the number
> of parallel Hadoop tasks from 24 to 12 and then ran a Hadoop job for
> testing. There are no more RPC timeout errors in the logs, and HBase stays
> stable during the whole process (no regionserver went down).
>
> Thanks Andrew and Jean-Daniel for the help and advice.
> --
> Renaud Delbru
>
> Andrew Purtell wrote:
>> [...] excessive load, with processes stuck in disk wait, and/or threads
>> starved for CPU time, etc. Processes can be stuck in disk wait while
>> being swapped in or out also, so excessive RAM/heap is a third
>> possibility.
>>
>>> I have two questions, to try to understand better what is happening:
>>>
>>> 1. Why does HBase force the regionserver to restart? In which
>>> particular cases is this triggered?
>>
>> A regionserver restarts itself when there is a serious internal error.
>> There are many potential causes of a restart. In your case the channel
>> to DFS failed.
>>
>>> 2. Why is the region server not correctly restarted? HBase forces the
>>> region server to restart, but in fact the region server is just shut
>>> down; the HRegionServer process is still alive. As a consequence, the
>>> master cannot connect to the region server, and any insert is refused.
>>
>> It is the client that is attempting to talk to the region server, e.g.
>> to insert. Clients get region location assignments from the master,
>> cache them, and then interact with the regionservers directly. Only
>> after a certain number of retries following a regionserver failure will
>> the client go back to the master and ask whether a region has received
>> a new assignment. If the system is operating properly, the regionserver
>> would have stopped reporting to the master, the master would have
>> declared it dead, and the master would have reassigned the affected
>> regions to other regionservers. The client then discovers the
>> reassignment, contacts the newly assigned regionserver(s), and progress
>> continues (eventually).
>>
>>> If this is related to a timeout answer from one region server or from
>>> the master, is it possible to modify some options to tell HBase to
>>> wait until it gets an answer, and not to force the region server to
>>> restart?
>>
>> If a regionserver has encountered a fatal error, there is no other
>> meaningful option but to restart.
>>
>>> Thanks for your help.
>>>
>>> --
>>> Renaud Delbru
>>>
>>> Andrew Purtell wrote:
>>>> Actually I missed this:
>>>>
>>>>> 2008-07-19 07:31:45,505 WARN org.apache.hadoop.dfs.DFSClient:
>>>>> DataStreamer Exception: java.net.SocketTimeoutException: timed out
>>>>> waiting for rpc response
>>>>
>>>> So it was a socket timeout. I think you need to spread the load by
>>>> adding more DFS datanodes at a minimum and probably also HBase
>>>> regionservers, or reduce concurrency.
>>>>
>>>> - Andy
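The client retry behaviour Andrew describes above is driven by client-side settings. Below is a minimal sketch of an upload using the 0.2-era HTable/BatchUpdate client API with those knobs set explicitly; the property names, the exact constructor signatures, and the "field:content" column qualifier are assumptions for illustration, not something confirmed in this thread (the "field" family does appear in the attached log).

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;

public class UploadSketch {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    // How many times the client retries against a region before going back
    // to the master for a fresh region assignment (property name assumed).
    conf.setInt("hbase.client.retries.number", 5);
    // Pause between retries, in milliseconds (property name assumed).
    conf.setLong("hbase.client.pause", 10 * 1000);

    HTable table = new HTable(conf, "page-repository");
    // Row key is the document URL, as in the thread.
    BatchUpdate update = new BatchUpdate("http://my-url/AAA");
    update.put("field:content", "document contents".getBytes("UTF-8"));
    table.commit(update);
  }
}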
>>>> --- On Sat, 7/19/08, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>>>>> From: Andrew Purtell <[EMAIL PROTECTED]>
>>>>> Subject: Re: Replay of hlog required, forcing regionserver restart
>>>>> To: [email protected]
>>>>> Date: Saturday, July 19, 2008, 9:54 AM
>>>>>
>>>>> Hello Renaud,
>>>>>
>>>>> I think your cluster is overloaded.
>>>>>
>>>>>> WARN org.apache.hadoop.hbase.util.Sleeper: We slept 30837ms, ten
>>>>>> times longer than scheduled: 3000
>>>>>> 2008-07-18 03:55:57,783 WARN org.apache.hadoop.hbase.util.Sleeper:
>>>>>> We slept 583701ms, ten times longer than scheduled: 3000
>>>>>
>>>>> See https://issues.apache.org/jira/browse/HBASE-616. I think these
>>>>> types of messages imply excessive load, with processes stuck in disk
>>>>> wait, and/or threads starved for CPU time, etc. Processes can be
>>>>> stuck in disk wait while being swapped in or out also, so excessive
>>>>> RAM/heap is a third possibility. Given a 3 node cluster with a full
>>>>> complement of Hadoop and HBase daemons and 24 tasks, perhaps all
>>>>> three load dimensions are redlined. :-)
>>>>>
>>>>> It's hard to say what exactly was the root cause of the exception on
>>>>> the DFS client channel which brought down the regionserver:
>>>>>
>>>>>> java.io.IOException: Premeture EOF from inputStream
>>>>>>   at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
>>>>>
>>>>> This could have been a socket timeout, or a fault on the data node.
>>>>> Does anything useful/meaningful appear in the datanode log?
>>>>>
>>>>> - Andy
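Since the eventual fix in this thread was cutting the concurrent upload tasks from 24 to 12, here is a rough sketch of where that concurrency is controlled, assuming the Hadoop 0.17-era JobConf API. The per-node slot value in the comment is an assumption, not taken from the thread.

import org.apache.hadoop.mapred.JobConf;

public class UploadJobConfig {
  // Returns the job configuration with the uploader parallelism dialed down.
  public static JobConf configure(JobConf job) {
    // Hint for the total number of map tasks in the upload job (24 -> 12).
    job.setNumMapTasks(12);
    // The hard per-node cap is a tasktracker setting, not a job setting:
    // set mapred.tasktracker.map.tasks.maximum in hadoop-site.xml on each
    // node (for example 4 slots per node on a 3-node cluster) and restart
    // the tasktrackers. The per-node value of 4 is an assumption.
    return job;
  }
}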
>>>>> --- On Sat, 7/19/08, Renaud Delbru <[EMAIL PROTECTED]> wrote:
>>>>>> From: Renaud Delbru <[EMAIL PROTECTED]>
>>>>>> Subject: Re: Replay of hlog required, forcing regionserver restart
>>>>>> To: [email protected]
>>>>>> Date: Saturday, July 19, 2008, 7:46 AM
>>>>>>
>>>>>> The cluster is currently composed of 3 nodes. The cluster is also
>>>>>> used for executing Hadoop jobs. One of the nodes acts as the Hadoop
>>>>>> master, an HMaster and an HRegionServer.
>>>>>> The current settings are: Xmx=1024M for the Hadoop master,
>>>>>> Xmx=2048M for the HMaster, and Xmx=2048M for each HRegionServer.
>>>>>> The inserts are somewhat randomised. We have 24 Hadoop tasks
>>>>>> uploading various documents. The row key is the URL of the
>>>>>> document, and each Hadoop task has to upload a range of URLs (for
>>>>>> example, http://my-url/AAA - http://my-url/BBB). Thus, I assume
>>>>>> that each Hadoop task will hit a different region and that the
>>>>>> insert load is distributed among the nodes.
>>>>>>
>>>>>> Thanks.
>>>>>> --
>>>>>> Renaud Delbru
>>>>>>
>>>>>> Jean-Daniel Cryans wrote:
>>>>>>> Ok then, at least having the nodes synchronized keeps you safe
>>>>>>> from other problems. By the way, how many nodes do you have?
>>>>>>>
>>>>>>> One other thing to check is your writing pattern. If you insert
>>>>>>> data with sequential row keys (an example would be designing your
>>>>>>> row keys as an incremented primary key equivalent, left padded
>>>>>>> with zeros), these inserts will always hit a single region, so the
>>>>>>> load will not be distributed. If that's your case, try randomizing
>>>>>>> the inserts.
>>>>>>>
>>>>>>> I will also look at your logs Monday morning.
>>>>>>>
>>>>>>> Thx,
>>>>>>> J-D
>>>>>>>
>>>>>>> On Sat, Jul 19, 2008 at 5:47 AM, Renaud Delbru <[EMAIL PROTECTED]> wrote:
>>>>>>>> Hi Jean-Daniel,
>>>>>>>>
>>>>>>>> I synchronised all the nodes to the same ntp server yesterday.
>>>>>>>> This morning, another regionserver went down for the same reason
>>>>>>>> (log attached).
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>> --
>>>>>>>> Renaud Delbru
>>>>>>>>
>>>>>>>> Jean-Daniel Cryans wrote:
>>>>>>>>> Renaud,
>>>>>>>>>
>>>>>>>>> Yes, the impact is important. Try fixing it first, because
>>>>>>>>> typically heavy write load problems only occur when there is a
>>>>>>>>> very small number of nodes and regions. We have seen this many
>>>>>>>>> times.
>>>>>>>>>
>>>>>>>>> Happy fixing!
>>>>>>>>> J-D
>>>>>>>>>
>>>>>>>>> On Fri, Jul 18, 2008 at 5:20 PM, Renaud Delbru <[EMAIL PROTECTED]> wrote:
>>>>>>>>>> Hi Jean-Daniel,
>>>>>>>>>>
>>>>>>>>>> My nodes' clocks are desynchronised (each node has a different
>>>>>>>>>> time). Could it have an impact on HBase?
>>>>>>>>>>
>>>>>>>>>> At the moment of the failure, the node had about 55 regions
>>>>>>>>>> (with hbase.hregion.max.filesize = 128M).
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>> --
>>>>>>>>>> Renaud Delbru
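For the sequential-row-key problem Jean-Daniel describes above, one common workaround is to salt the key so that consecutive keys spread across many regions instead of one. A small illustrative sketch follows; the bucket scheme and prefix width are assumptions, not something from this thread, and note that salting changes the table's scan order.

import java.security.MessageDigest;

public class SaltedKey {
  // Prefix the natural key (a URL here) with a short hash so that
  // consecutive keys no longer all land on the same region.
  public static String saltedRowKey(String url) throws Exception {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    byte[] digest = md5.digest(url.getBytes("UTF-8"));
    // Two hex characters of the digest give 256 buckets across the key space.
    String bucket = String.format("%02x", digest[0] & 0xff);
    return bucket + "/" + url;
  }

  public static void main(String[] args) throws Exception {
    // Prints the bucketed key, e.g. "<xx>/http://my-url/AAA"; range scans by
    // URL prefix then have to fan out over all buckets.
    System.out.println(saltedRowKey("http://my-url/AAA"));
  }
}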
>>>>>>>>>>
>>>>>>>>>> Jean-Daniel Cryans wrote:
>>>>>>>>>>> Renaud,
>>>>>>>>>>>
>>>>>>>>>>> Just to make sure, are all your nodes' clocks synchronized?
>>>>>>>>>>> Also, at the moment of that failure, how many regions did the
>>>>>>>>>>> table have?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> J-D
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 18, 2008 at 5:11 AM, Renaud Delbru <[EMAIL PROTECTED]> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I am using HBase 0.2.0-dev, Hudson Build #208.
>>>>>>>>>>>> I am experiencing a problem with HBase that looks like the
>>>>>>>>>>>> issue https://issues.apache.org/jira/browse/HBASE-15. During
>>>>>>>>>>>> an upload-intensive task, one of the regionservers went down,
>>>>>>>>>>>> forcing a replay of the hlog. This is not the first time we
>>>>>>>>>>>> have experienced this problem; it already happened on another
>>>>>>>>>>>> cluster node a few days ago, with the same settings.
>>>>>>>>>>>> I have attached the regionserver log. I don't notice anything
>>>>>>>>>>>> particular in the hadoop datanode log.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks in advance for any advice.
>>>>>>>>>>>> --
>>>>>>>>>>>> Renaud Delbru
>>>>>>>>>>>>
>>>>>>>>>>>> 2008-07-18 03:40:51,861 INFO org.apache.hadoop.hbase.regionserver.LogRoller: Rolling hlog. Number of entries: 30008
>>>>>>>>>>>> 2008-07-18 03:40:52,073 INFO org.apache.hadoop.hbase.regionserver.HLog: New log writer created at /hbase/log_140.203.154.176_1216023782181_60020/hlog.dat.1216348851959
>>>>>>>>>>>> 2008-07-18 03:43:36,257 INFO org.apache.hadoop.hbase.regionserver.LogRoller: Rolling hlog. Number of entries: 30008
>>>>>>>>>>>> 2008-07-18 03:43:37,020 INFO org.apache.hadoop.hbase.regionserver.HLog: New log writer created at /hbase/log_140.203.154.176_1216023782181_60020/hlog.dat.1216349016755
>>>>>>>>>>>> 2008-07-18 03:45:12,963 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region page-repository,http://dbpedia.org/resource/Judah_ibn_Ghayyat,1215919264401
>>>>>>>>>>>> 2008-07-18 03:45:58,868 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 30837ms, ten times longer than scheduled: 3000
>>>>>>>>>>>> 2008-07-18 03:55:57,783 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 583701ms, ten times longer than scheduled: 3000
>>>>>>>>>>>> 2008-07-18 03:55:57,787 INFO org.apache.hadoop.hbase.regionserver.LogRoller: Rolling hlog. Number of entries: 21011
>>>>>>>>>>>> 2008-07-18 03:55:58,410 WARN org.apache.hadoop.dfs.DFSClient: Exception while reading from blk_-6424749704096696952 of /hbase/page-repository/942413588/field/mapfiles/7570604533190594376/data from 140.203.154.176:50010: java.io.IOException: Premeture EOF from ...
>
> [Message clipped]
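The Sleeper warnings in the log above come from a simple self-check: a periodic thread measures how long it actually slept versus how long it asked to sleep, and warns when it is ten times over schedule, which is how stalls from swapping or long GC pauses become visible. Below is an illustrative sketch of that kind of check, assuming only the behaviour implied by the log message; it is not the actual org.apache.hadoop.hbase.util.Sleeper source.

public class SleeperCheck {
  // Sleep for a fixed period and complain if we wake up an order of
  // magnitude late; long stalls from swapping or GC show up this way.
  public static void sleepAndCheck(long periodMs) throws InterruptedException {
    long start = System.currentTimeMillis();
    Thread.sleep(periodMs);
    long slept = System.currentTimeMillis() - start;
    if (slept > 10 * periodMs) {
      System.err.println("We slept " + slept
          + "ms, ten times longer than scheduled: " + periodMs);
    }
  }
}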
