> From: Jean-Daniel Cryans <[email protected]>
> Reply-To: <[email protected]>
> Date: Wed, 20 Jan 2010 10:55:26 -0800
> To: <[email protected]>
> Subject: Re: Hbase pausing problems
>
> A table is sorted by row key and the regions are sequential key splits,
> so a given row will always go to a single region; if that region is
> unavailable for some reason, then you can't write immediately. If your
> write pattern is distributed among the regions, the clients will all
> slowly synchronize on the hung region. This is probably why the writes
> stop.
>
What do you mean by "if your write pattern is distributed among the
regions"? How would I know if it is?
> Or, if you are always writing row keys sequentially, then it's even
> _worse_ because all writes will always go to the same region, so there's
> no load distribution at all. Example: an incrementing row key.
>
We are using UUID.randomUUID() as the row key. It is quite random.
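(As an aside, a minimal sketch of such a put with the 0.20-era Java client;
the table and family names are taken from the logs later in this thread,
while the qualifier and value are illustrative:)

    import java.util.UUID;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class UuidKeyPut {
      public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();
        HTable table = new HTable(conf, "AdDelivery");
        // A random UUID row key lands on an arbitrary region, spreading
        // writes across the region servers; an incrementing key would send
        // every write to the same (last) region.
        Put put = new Put(Bytes.toBytes(UUID.randomUUID().toString()));
        put.add(Bytes.toBytes("AdDelivery_Family"), Bytes.toBytes("col"),
                Bytes.toBytes("value"));
        table.put(put);
      }
    }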
> You also seem to have a lot of regions per region server, which plays
> into the global memstore size.
>
Region Servers (so everyone can see without opening the logs)
dynobuntu6:60030 regions=780
dynobuntu7:60030 regions=780
dynobuntu8:60030 regions=779
Total: servers: 3 regions=2339
> Finally, I recommend setting:
>
> heap to 3G
> hbase.regionserver.global.memstore.upperLimit to 0.5
> hbase.regionserver.global.memstore.lowerLimit to 0.48
>
I will try this tonight.
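(For reference, a sketch of where those settings would go, assuming a
standard install: the heap via HBASE_HEAPSIZE in hbase-env.sh, the two
limits in hbase-site.xml:)

    # hbase-env.sh: heap size in MB, ~3G per region server
    export HBASE_HEAPSIZE=3000

    <!-- hbase-site.xml: block puts at 50% of heap, flush down to 48% -->
    <property>
      <name>hbase.regionserver.global.memstore.upperLimit</name>
      <value>0.5</value>
    </property>
    <property>
      <name>hbase.regionserver.global.memstore.lowerLimit</name>
      <value>0.48</value>
    </property>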
> This should help a lot.
>
> Thx,
>
> J-D
Regards,
Seraph
>
> On Wed, Jan 20, 2010 at 9:37 AM, Seraph Imalia <[email protected]> wrote:
>>
>>
>>
>>> From: stack <[email protected]>
>>> Reply-To: <[email protected]>
>>> Date: Wed, 20 Jan 2010 07:26:58 -0800
>>> To: <[email protected]>
>>> Subject: Re: Hbase pausing problems
>>>
>>> On Wed, Jan 20, 2010 at 1:06 AM, Seraph Imalia <[email protected]> wrote:
>>>
>>>>
>>>> The client stops being able to write to hBase as soon as 1 of the
>>>> regionservers starts doing this...
>>>>
>>>> 2010-01-17 01:16:25,729 INFO
>>>> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Forced flushing of
>>>> ChannelDelivery,5352f559-d68e-42e9-be92-8bae82185ed1,1262544772804 because
>>>> global memstore limit of 396.7m exceeded; currently 396.7m and flushing
>>>> till 247.9m
>>>>
>>> See hbase.regionserver.global.memstore.upperLimit and
>>> hbase.regionserver.global.memstore.lowerLimit. The former is a
>>> prophylactic against OOME'ing. The sum of all memory used by MemStores is
>>> not allowed to grow beyond 0.4 of total heap size (0.4 is the default).
>>> The 247.9M figure in the above is 0.25 of the heap by default. Writes are
>>> held up until sufficient MemStore space has been dumped by flushing. You
>>> seem to be taking on writes at a rate that is in excess of the rate at
>>> which you can flush. We'll take a look at your logs..... You might up the
>>> 0.25 to 0.3 or 0.32. This will shorten the times we stop taking on writes
>>> but at the cost of increasing the number of times we disallow writes.
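(To make the arithmetic concrete: working back from the figures in the log,
the heap here is roughly 992 MB, so:)

    0.40 x ~992 MB heap ≈ 396.7 MB  (upperLimit: puts are blocked above this)
    0.25 x ~992 MB heap ≈ 247.9 MB  (flushing continues until memstores drop below this)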
>>
>> Does this mean that when 1 regionserver does a memstore flush, the other two
>> regionservers are also unavailable for writes? I have watched the logs
>> carefully to make sure that not all the regionservers are flushing at the
>> same time. Most of the time, only 1 server flushes at a time and in rare
>> cases, I have seen two at a time.
>>
>>>
>>> It also looks like you have little RAM space given over to hbase, just 1G?
>>> If your traffic is bursty, giving hbase more RAM might help it get over
>>> these write humps.
>>
>> I have it at 1G on purpose. When we first had the problem, I immediately
>> thought the problem was resource related, so I increased the hBase RAM to
>> 3G (each server has 8G - I was careful to watch for swapping). This made
>> the problem worse because each memstore flush took longer, which stopped
>> writing for longer, and people started noticing that our system was down
>> during those periods. Granted, the period between flushes was longer, but
>> the effect was that people noticed our downtime. So I have put the RAM
>> back down to 1G to minimize the negative effects on the live system, and
>> fewer people notice it.
>>
>>
>>>
>>>
>>>
>>>> Or this...
>>>>
>>>> 2010-01-17 01:16:26,159 INFO
>>>> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Forced flushing of
>>>> AdDelivery,613a401d-fb8a-42a9-aac6-d957f6281035,1261867806692 because
>>>> global memstore limit of 396.7m exceeded; currently 390.4m and flushing
>>>> till 247.9m
>>>>
>>> This is a by-product of the above hitting the 'global limit'.
>>>
>>>
>>>
>>>> And then as soon as it finishes that, it starts doing this...
>>>>
>>>> 2010-01-17 01:16:36,709 DEBUG
>>>> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
>>>> requested for region
>>>> AdDelivery,fb98f6c9-db13-4853-92ee-ffe1182fffd0,1263544763046/350999600
>>>> because: regionserver/192.168.2.88:60020.cacheFlusher
>>>>
>>> These are 'normal'. We are logging the fact that a compaction has been
>>> requested on a region. This does not get in the way of our taking on
>>> writes (not directly).
>>>
>>>
>>>
>>>> And as soon as it has finished the last of the Compaction Requests, the
>>>> client recovers and the regionserver starts doing this...
>>>>
>>>> 2010-01-17 01:16:36,713 DEBUG org.apache.hadoop.hbase.regionserver.Store:
>>>> Compaction size of ChannelDelivery_Family: 209.5m; Skipped 1 file(s), size:
>>>> 216906650
>>>> 2010-01-17 01:16:36,713 DEBUG org.apache.hadoop.hbase.regionserver.Store:
>>>> Started compaction of 3 file(s) into
>>>> /hbase/ChannelDelivery/compaction.dir/165262792, seqid=1241653592
>>>> 2010-01-17 01:16:37,143 DEBUG org.apache.hadoop.hbase.regionserver.Store:
>>>> Completed compaction of ChannelDelivery_Family; new storefile is
>>>> hdfs://dynobuntu6:8020/hbase/ChannelDelivery/165262792/ChannelDelivery_Family/1673693545539520912;
>>>> store size is 209.5m
>>>>
>>>
>>> Above is 'normal'. At DEBUG you see detail on hbase going about its
>>> business.
>>>
>>>
>>>>
>>>> All of these logs seem perfectly acceptable to me - the problem is that
>>>> it just requires one of the regionservers to start doing this for the
>>>> client to be prevented from inserting new rows into hBase. The logs
>>>> don't seem to explain why this is happening.
>>>>
>>>>
>>> Clients will be blocked writing regions carried by the affected
>>> regionserver only. Your HW is not appropriate to the load as currently
>>> set up. You might also consider adding more machines to your cluster.
>>>
>>
>> Hmm... How does hBase decide which region to write to? Is it possible that
>> hBase is deciding to write all our current records to one specific region
>> that happens to be on the server that is busy doing a memstore flush?
>>
>> We are currently inserting about 6 million rows per day. SQL Server
>> (which I am so happy to no longer be using for this) was able to write
>> (and replicate to a slave) 9 million records using the same spec'ed
>> server. I would like to see hBase cope, with the 3 servers we have given
>> it, at least when inserting 6 million. Do you think this is possible or
>> is our only answer to throw on more servers?
>>
>> Seraph
>>
>>> St.Ack
>>>
>>>
>>>
>>>> Thank you for your assistance thus far; please let me know if you need
>>>> or discover anything else.
>>>>
>>>> Regards,
>>>> Seraph
>>>>
>>>>
>>>>
>>>>> From: Jean-Daniel Cryans <[email protected]>
>>>>> Reply-To: <[email protected]>
>>>>> Date: Mon, 18 Jan 2010 09:49:16 -0800
>>>>> To: <[email protected]>
>>>>> Subject: Re: Hbase pausing problems
>>>>>
>>>>> The next step would be to take a look at your region servers' logs
>>>>> around the time of the insert pauses and of the clients not resuming
>>>>> after the loss of a region server. If you are able to gzip them and put
>>>>> them on a public server, it would be awesome.
>>>>>
>>>>> Thx,
>>>>>
>>>>> J-D
>>>>>
>>>>> On Mon, Jan 18, 2010 at 1:03 AM, Seraph Imalia <[email protected]> wrote:
>>>>>> Answers below...
>>>>>>
>>>>>> Regards,
>>>>>> Seraph
>>>>>>
>>>>>>> From: stack <[email protected]>
>>>>>>> Reply-To: <[email protected]>
>>>>>>> Date: Fri, 15 Jan 2010 10:10:39 -0800
>>>>>>> To: <[email protected]>
>>>>>>> Subject: Re: Hbase pausing problems
>>>>>>>
>>>>>>> How many CPUs?
>>>>>>
>>>>>> 1x Quad Xeon in each server
>>>>>>
>>>>>>>
>>>>>>> You are using default JVM settings (see HBASE_OPTS in hbase-env.sh).
>>>>>>> You might want to enable GC logging; see the line after HBASE_OPTS in
>>>>>>> hbase-env.sh and enable it. GC logging might tell you about the
>>>>>>> pauses you are seeing.
>>>>>>
>>>>>> I will enable GC logging tonight during our slow time because
>>>>>> restarting the regionservers causes the clients to pause indefinitely.
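(The GC logging line referred to ships commented out in hbase-env.sh and
looks roughly like this; the log path will depend on the install:)

    export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
        -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"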
>>>>>>
>>>>>>>
>>>>>>> Can you get a fourth server for your cluster and run the master, zk,
>>>>>>> and namenode on it, and leave the other three servers for
>>>>>>> regionserver and datanode (with perhaps replication == 2 as per J-D
>>>>>>> to lighten load on the small cluster)?
>>>>>>
>>>>>> We plan to double the number of servers in the next few weeks and I
>>>>>> will take your advice to put the master, zk and namenode on it (we
>>>>>> will need to have a second one on standby should this one crash). The
>>>>>> servers will be ordered shortly and will be here in a week or two.
>>>>>>
>>>>>> That said, I have been monitoring CPU usage and none of them seem
>>>>>> particularly busy. The regionserver on each one hovers around 30% all
>>>>>> the time and the datanode sits at about 10% most of the time. If we do
>>>>>> have a resource issue, it definitely does not seem to be CPU.
>>>>>>
>>>>>> Increasing RAM did not seem to work either - it just made hBase use a
>>>>>> bigger memstore, and then it took longer to do a flush.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> More notes inline in below.
>>>>>>>
>>>>>>> On Fri, Jan 15, 2010 at 1:33 AM, Seraph Imalia <[email protected]> wrote:
>>>>>>>
>>>>>>>> Approximately every 10 minutes, our entire coldfusion system pauses
>>>>>>>> at the point of inserting into hBase for between 30 and 60 seconds
>>>>>>>> and then continues.
>>>>>>>>
>>>>>>> Yeah, enable GC logging. See if you can make a correlation between
>>>>>>> the pause the client is seeing and a GC pause.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Investigation...
>>>>>>>>
>>>>>>>> Watching the logs of the regionserver, the pausing of the coldfusion
>>>>>>>> system happens as soon as one of the regionservers starts flushing
>>>>>>>> the memstore, and recovers again as soon as it is finished flushing
>>>>>>>> (recovers as soon as it starts compacting).
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ...though, this would seem to point to an issue with your hardware.
>>>>>>> How many disks? Are they misconfigured such that they hold up the
>>>>>>> system when they are being heavily written to?
>>>>>>>
>>>>>>>
>>>>>>> A regionserver log at DEBUG from around this time so we could look
>>>>>>> at it would be helpful.
>>>>>>>
>>>>>>>
>>>>>>>> I can recreate the error just by stopping 1 of the regionservers;
>>>>>>>> but then starting the regionserver again does not make coldfusion
>>>>>>>> recover until I restart the coldfusion servers. It is important to
>>>>>>>> note that if I keep the built-in hBase shell running, it is happily
>>>>>>>> able to put and get data to and from hBase whilst coldfusion is busy
>>>>>>>> pausing/failing.
>>>>>>>>
>>>>>>>
>>>>>>> This seems odd. Enable DEBUG for the client-side. Do you see the
>>>>>>> shell recalibrating, finding new locations for regions, after you
>>>>>>> shut down the single regionserver, something that your coldfusion is
>>>>>>> not doing? Or, maybe, the shell is putting to a regionserver that has
>>>>>>> not been disturbed by your start/stop?
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> I have tried increasing the regionserver's RAM to 3 Gigs and this
>>>>>>>> just made the problem worse because it took longer for the
>>>>>>>> regionservers to flush the memory store.
>>>>>>>
>>>>>>>
>>>>>>> Again, if flushing is holding up the machine - if you can't write a
>>>>>>> file in the background without it freezing your machine - then your
>>>>>>> machines are anemic or misconfigured?
>>>>>>>
>>>>>>>
>>>>>>>> One of the links I found on your site mentioned increasing the
>>>>>>>> default value for hbase.regionserver.handler.count to 100; this did
>>>>>>>> not seem to make any difference.
>>>>>>>
>>>>>>>
>>>>>>> Leave this configuration in place, I'd say.
>>>>>>>
>>>>>>> Are you seeing 'blocking' messages in the regionserver logs? The
>>>>>>> regionserver will stop taking on writes if it thinks it's being
>>>>>>> overrun, to prevent itself OOME'ing. Grep the 'multiplier'
>>>>>>> configuration in hbase-default.xml.
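(Presumably the 'multiplier' here is hbase.hregion.memstore.block.multiplier;
both knobs are set in hbase-site.xml, values illustrative:)

    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>100</value>
    </property>
    <property>
      <name>hbase.hregion.memstore.block.multiplier</name>
      <value>2</value>
    </property>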
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I have double checked that the memory flush very rarely happens on
>>>>>>>> more than 1 regionserver at a time; in fact, in my many hours of
>>>>>>>> staring at tails of logs, it only happened once where two
>>>>>>>> regionservers flushed at the same time.
>>>>>>>>
>>>>>>> You've enabled DEBUG?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> My investigations point strongly towards a coding problem on our
>>>>>>>> side rather than a problem with the server setup or hBase itself.
>>>>>>>
>>>>>>>
>>>>>>> If things were slow from the client's perspective, that might be a
>>>>>>> client-side coding problem; but for these pauses, unless you have a
>>>>>>> fly-by deadlock in your client code, it's probably an hbase issue.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I say this because whilst I understand why a regionserver would go
>>>>>>>> offline during a memory flush, I would expect the other two
>>>>>>>> regionservers to pick up the load, especially since the built-in
>>>>>>>> hbase shell has no problem accessing hBase whilst a regionserver is
>>>>>>>> busy doing a memstore flush.
>>>>>>>>
>>>>>>> HBase does not go offline during a memory flush. It continues to be
>>>>>>> available for reads and writes during this time. And see J-D's
>>>>>>> response, which corrects the understanding of how loading of regions
>>>>>>> is done in an hbase cluster.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>
>>>>>>>> I think either I am leaving out code that is required to determine
>>>>>>>> which RegionServers are available OR I am keeping too many hBase
>>>>>>>> objects in RAM instead of calling their constructors each time (my
>>>>>>>> purpose obviously was to improve performance).
>>>>>>>>
>>>>>>>>
>>>>>>> For sure keep a single instance of HBaseConfiguration at least, and
>>>>>>> use this when constructing all HTable and HBaseAdmin instances.
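(A minimal sketch of that suggestion; the table names match the ones in this
thread, the wrapper class is illustrative:)

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class Tables {
      // One HBaseConfiguration shared across the client JVM: the 0.20
      // client keys its connection on the configuration instance, so
      // reusing it reuses the same connection and region location cache.
      private static final HBaseConfiguration CONF = new HBaseConfiguration();

      public static HTable adDelivery() throws IOException {
        return new HTable(CONF, "AdDelivery");
      }

      public static HTable channelDelivery() throws IOException {
        return new HTable(CONF, "ChannelDelivery");
      }
    }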
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Currently the live system is inserting over 7 Million records per day
>>>>>>>> (mostly between 8AM and 10PM) which is not a ridiculously high load.
>>>>>>>>
>>>>>>>>
>>>>>>> What size are the records? What is your table schema? How many
>>>>>>> regions do you currently have in your table?
>>>>>>>
>>>>>>> St.Ack