Re: Hbase pausing problems

Seraph Imalia Wed, 20 Jan 2010 11:47:03 -0800

> From: stack <[email protected]>
> Reply-To: <[email protected]>
> Date: Wed, 20 Jan 2010 11:26:00 -0800
> To: <[email protected]>
> Subject: Re: Hbase pausing problems
> 
> Looking at logs, I agree with what J-D says regarding the number of regions
> you are carrying per server (800).  Enable compression, which will shrink
> the number and probably up your throughput all around, or make your regions
> larger (make sure to up the memstore size in sympathy).

Cool - I was tempted to enable compression.  I saw a wiki entry explaining
how to enable LZO compression
(http://wiki.apache.org/hadoop/UsingLzoCompression).  Do you recommend I do
this, or should I just enable the hBase out-of-the-box Gzip compression?

How does this work with existing tables?  I see that compression is a flag
when creating a new table.

> 
> Your keys look like UUIDs so are probably pretty well spread over the key
> space would be my guess -- that you are not beating up on one region
> continuously (the J-D painted first scenario is probably what's happening).
> 
> I thought I could see what your schema was by looking in logs but that's no
> longer the case so please tell us more about it -- number of column
> families, what one of your 6M inserts is comprised of.

This is from the master UI - please let me know if you need more.  The sURL
column is no longer used - some old data still has that column.

User Tables

AdDelivery  -  {NAME => 'AdDelivery', FAMILIES => [{NAME =>
'AdDelivery_Family', iAdvertiserID => '', iCampaignTypeID => '', iCityID =>
'', mPublisherActualCost => '', iAdID => '', iKeywordID => '', mBid => '',
dtDelivered => '', iWebsiteID => '', sCity => '', iCurrencyID => '',
uidChannel => '', sClientIPAddress => '', iPublisherID => '', iAdGroupID =>
'', BLOCKSIZE => '65536', BLOCKCACHE => 'true', fRandRate => '', iPosition
=> '', sKeyword => '', dtClicked => '', mActualCost => '', TTL =>
'2147483647', iPublisherCurrencyID => '', iCampaignID => '', iChannelID =>
'', IN_MEMORY => 'false', sURL => '', sRegionCode => '', VERSIONS => '3',
COMPRESSION => 'NONE', bConverted => '', fPublisherRandRate => '',
dtConverted => '', bClicked => '', iURLID => '', sCountryCode => ''}]}


AdDeliveryNonProfit  -  {NAME => 'AdDeliveryNonProfit', FAMILIES => [{NAME
=> 'AdDelivery_Family', iAdvertiserID => '', iCampaignTypeID => '', iCityID
=> '', mPublisherActualCost => '', iAdID => '', iKeywordID => '', mBid =>
'', dtDelivered => '', iWebsiteID => '', sCity => '', iCurrencyID => '',
uidChannel => '', sClientIPAddress => '', iPublisherID => '', iAdGroupID =>
'', BLOCKSIZE => '65536', BLOCKCACHE => 'true', fRandRate => '', iPosition
=> '', sKeyword => '', dtClicked => '', mActualCost => '', TTL =>
'2147483647', iPublisherCurrencyID => '', iCampaignID => '', iChannelID =>
'', IN_MEMORY => 'false', sURL => '', sRegionCode => '', VERSIONS => '3',
COMPRESSION => 'NONE', bConverted => '', fPublisherRandRate => '',
dtConverted => '', bClicked => '', iURLID => '', sCountryCode => ''}]}


ChannelDelivery  -  {NAME => 'ChannelDelivery', FAMILIES => [{NAME =>
'ChannelDelivery_Family', iNumberOfAds => '', iCityID => '', TTL =>
'2147483647', iChannelID => '', IN_MEMORY => 'false', sURL => '',
sRegionCode => '', VERSIONS => '3', COMPRESSION => 'NONE', dtDelivered =>
'', sCity => '', iWebsiteID => '', uidChannel => '', sClientIPAddress => '',
iPublisherID => '', BLOCKSIZE => '65536', BLOCKCACHE => 'true', iURLID =>
'', sCountryCode => ''}]}

> 
> Would suggest you not run hbase as root if you can avoid it.

Is there any reason other than security?

> 
> I'll leave it at this for now.   Chatting with J-D about this issue, given
> that you are using UUIDs so load is being spread nice and even across your
> cluster, you should try out his suggested 0.5/0.48 settings on 3G of RAM.
> 

I have a lot of settings to try now :)... Compression, memstore upper/lower
limits, GC Logs and heap to 3G.  Thank you for your help and suggestions.  I
will try these things tonight and report back to you in the morning.
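
For reference, a rough sketch of what those memstore settings work out to in
megabytes.  The property names and numbers are the ones quoted in this thread;
the helper name memstore_limits is made up for illustration, and the current
heap size is inferred from the logged 396.7m limit at the 0.4 default.  It
assumes each limit is simply heap size times the configured fraction, as
described in the quoted reply below, so treat it as back-of-envelope
arithmetic rather than a description of HBase internals.

```python
def memstore_limits(heap_mb, upper, lower):
    """Return (upper_mb, lower_mb) for one region server.

    upper_mb: total memstore size at which writes are held up
              (hbase.regionserver.global.memstore.upperLimit * heap)
    lower_mb: size that forced flushing brings the total back down to
              (hbase.regionserver.global.memstore.lowerLimit * heap)
    Assumption: both limits are plain fractions of the heap.
    """
    return heap_mb * upper, heap_mb * lower


# Current setup: defaults of 0.4 / 0.25, heap inferred from the log line
# "global memstore limit of 396.7m exceeded ... flushing till 247.9m".
heap_now_mb = 396.7 / 0.4
up_now, low_now = memstore_limits(heap_now_mb, 0.4, 0.25)
print(f"{up_now:.1f} {low_now:.1f}")   # 396.7 247.9

# Suggested setup: 3G heap with 0.5 / 0.48.
up_new, low_new = memstore_limits(3 * 1024, 0.5, 0.48)
print(f"{up_new:.1f} {low_new:.1f}")   # 1536.0 1474.6

# Amount that has to be flushed before writes resume, in MB.
print(f"{up_now - low_now:.1f} {up_new - low_new:.1f}")
```

If that assumption holds, the gap between the two limits drops from roughly
149 MB to roughly 61 MB, so each forced flush should have less to write out
before writes resume, even though the heap is three times larger.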

Regards,
Seraph

> St.Ack
> 
> On Wed, Jan 20, 2010 at 10:55 AM, Jean-Daniel Cryans
> <[email protected]> wrote:
> 
>> A table is sorted by row key and all the regions are sequentially
>> split so that a row will always go to a single region and if that
>> region is unavailable for some reason then you can't write
>> immediately. If your write pattern is distributed among the regions,
>> they will all slowly synchronize on the hung region. This is probably
>> why the writes stop.
>> 
>> Or, if you are always writing row keys sequentially then it's even
>> _worse_ because all writes will always go to the same region so
>> there's no load distribution at all. Example: incrementing row key.
>> 
>> You also seem to have a lot of regions per region server, which plays
>> into the global memstore size.
>> 
>> Finally, I recommend setting:
>> 
>> heap to 3G
>> hbase.regionserver.global.memstore.upperLimit to 0.5
>> hbase.regionserver.global.memstore.lowerLimit to 0.48
>> 
>> This should help a lot.
>> 
>> Thx,
>> 
>> J-D
>> 
>> On Wed, Jan 20, 2010 at 9:37 AM, Seraph Imalia <[email protected]> wrote:
>>> 
>>> 
>>> 
>>>> From: stack <[email protected]>
>>>> Reply-To: <[email protected]>
>>>> Date: Wed, 20 Jan 2010 07:26:58 -0800
>>>> To: <[email protected]>
>>>> Subject: Re: Hbase pausing problems
>>>> 
>>>> On Wed, Jan 20, 2010 at 1:06 AM, Seraph Imalia <[email protected]> wrote:
>>>> 
>>>>> 
>>>>> The client stops being able to write to hBase as soon as 1 of the
>>>>> regionservers starts doing this...
>>>>> 
>>>>> 2010-01-17 01:16:25,729 INFO
>>>>> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Forced flushing of
>>>>> ChannelDelivery,5352f559-d68e-42e9-be92-8bae82185ed1,1262544772804 because
>>>>> global memstore limit of 396.7m exceeded; currently 396.7m and flushing
>>>>> till 247.9m
>>>>> 
>>>> See hbase.regionserver.global.memstore.upperLimit and
>>>> hbase.regionserver.global.memstore.lowerLimit.  The former is a
>>>> prophylactic against OOME'ing.  The sum of all memory used by MemStores
>>>> is not allowed to grow beyond 0.4 of total heap size (0.4 is default).
>>>> The 247.9M figure in the above is 0.25 of the heap by default.  Writes
>>>> are held up until sufficient MemStore space has been dumped by flushing.
>>>> You seem to be taking on writes at a rate that is in excess of the rate
>>>> at which you can flush.  We'll take a look at your logs.....  You might
>>>> up the 0.25 to 0.3 or 0.32.  This will shorten the times we stop taking
>>>> on writes but at the cost of increasing the number of times we disallow
>>>> writes.
>>> 
>>> Does this mean that when 1 regionserver does a memstore flush, the other
>>> two regionservers are also unavailable for writes?  I have watched the
>>> logs carefully to make sure that not all the regionservers are flushing
>>> at the same time.  Most of the time, only 1 server flushes at a time and
>>> in rare cases, I have seen two at a time.
>>> 
>>>> 
>>>> It also looks like you have little RAM space given over to hbase, just 1G?
>>>> If your traffic is bursty, giving hbase more RAM might help it get over
>>>> these write humps.
>>> 
>>> I have it at 1G on purpose.  When we first had the problem, I immediately
>>> thought the problem was resource related, so I increased the hBase RAM to
>>> 3G (each server has 8G - I was careful to watch for swapping).  This made
>>> the problem worse because each memstore flush took longer, which stopped
>>> writing for longer, and people started noticing that our system was down
>>> during those periods.  Granted, the period between flushes was longer, but
>>> the effect was that people started to notice our downtime.  So I have put
>>> the RAM back down to 1G to minimize the negative effects on the live
>>> system so that fewer people notice it.
>>> 
>>> 
>>>> 
>>>> 
>>>> 
>>>>> Or this...
>>>>> 
>>>>> 2010-01-17 01:16:26,159 INFO
>>>>> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Forced flushing of
>>>>> AdDelivery,613a401d-fb8a-42a9-aac6-d957f6281035,1261867806692 because
>>>>> global memstore limit of 396.7m exceeded; currently 390.4m and flushing
>>>>> till 247.9m
>>>>>
>>>> This is a by-product of the above hitting 'global limit'.
>>>> 
>>>> 
>>>> 
>>>>> And then as soon as it finishes that, it starts doing this...
>>>>> 
>>>>> 2010-01-17 01:16:36,709 DEBUG
>>>>> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
>>>>> requested for region
>>>>> AdDelivery,fb98f6c9-db13-4853-92ee-ffe1182fffd0,1263544763046/350999600
>>>>> because: regionserver/192.168.2.88:60020.cacheFlusher
>>>>> 
>>>> These are 'normal'.  We are logging the fact that a compaction has been
>>>> requested on a region.  This does not get in the way of our taking on
>>>> writes (not directly).
>>>> 
>>>> 
>>>> 
>>>>> And as soon as it has finished the last of the Compaction Requests, the
>>>>> client recovers and the regionserver starts doing this...
>>>>> 
>>>>> 2010-01-17 01:16:36,713 DEBUG org.apache.hadoop.hbase.regionserver.Store:
>>>>> Compaction size of ChannelDelivery_Family: 209.5m; Skipped 1 file(s),
>>>>> size: 216906650
>>>>> 2010-01-17 01:16:36,713 DEBUG org.apache.hadoop.hbase.regionserver.Store:
>>>>> Started compaction of 3 file(s)  into
>>>>> /hbase/ChannelDelivery/compaction.dir/165262792, seqid=1241653592
>>>>> 2010-01-17 01:16:37,143 DEBUG org.apache.hadoop.hbase.regionserver.Store:
>>>>> Completed compaction of ChannelDelivery_Family; new storefile is
>>>>> hdfs://dynobuntu6:8020/hbase/ChannelDelivery/165262792/ChannelDelivery_Family/1673693545539520912;
>>>>> store size is 209.5m
>>>>> 
>>>> 
>>>> Above is 'normal'.  At DEBUG you see detail on hbase going about its
>>>> business.
>>>> 
>>>> 
>>>>> 
>>>>> All of these logs seem perfectly acceptable to me - the problem is that it
>>>>> just requires one of the regionservers to start doing this for the client
>>>>> to be prevented from inserting new rows into hBase.  The logs don't seem to
>>>>> explain why this is happening.
>>>>> 
>>>>> 
>>>> Clients will be blocked writing regions carried by the affected
>>>> regionserver only.  Your HW is not appropriate to the load as currently
>>>> set up.  You might also consider adding more machines to your cluster.
>>>> 
>>> 
>>> Hmm... How does hBase decide which region to write to?  Is it possible
>>> that hBase is deciding to write all our current records to one specific
>>> region that happens to be on the server that is busy doing a memstore
>>> flush?
>>>
>>> We are currently inserting about 6 million rows per day.  SQL Server
>>> (which I am so happy to no longer be using for this) was able to write
>>> (and replicate to a slave) 9 million records (using the same spec'ed
>>> server).  I would like to see hBase cope with the 3 we have given it, at
>>> least when inserting 6 million.  Do you think this is possible or is our
>>> only answer to throw on more servers?
>>> 
>>> Seraph
>>> 
>>>> St.Ack
>>>> 
>>>> 
>>>> 
>>>>> Thank you for your assistance thus far; please let me know if you need or
>>>>> discover anything else.
>>>>> 
>>>>> Regards,
>>>>> Seraph
>>>>> 
>>>>> 
>>>>> 
>>>>>> From: Jean-Daniel Cryans <[email protected]>
>>>>>> Reply-To: <[email protected]>
>>>>>> Date: Mon, 18 Jan 2010 09:49:16 -0800
>>>>>> To: <[email protected]>
>>>>>> Subject: Re: Hbase pausing problems
>>>>>> 
>>>>>> The next step would be to take a look at your region server's log
>>>>>> around the time of the insert and clients who don't resume after the
>>>>>> loss of a region server. If you are able to gzip them and put them on
>>>>>> a public server, it would be awesome.
>>>>>> 
>>>>>> Thx,
>>>>>> 
>>>>>> J-D
>>>>>> 
>>>>>> On Mon, Jan 18, 2010 at 1:03 AM, Seraph Imalia <[email protected]> wrote:
>>>>>>> Answers below...
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Seraph
>>>>>>> 
>>>>>>>> From: stack <[email protected]>
>>>>>>>> Reply-To: <[email protected]>
>>>>>>>> Date: Fri, 15 Jan 2010 10:10:39 -0800
>>>>>>>> To: <[email protected]>
>>>>>>>> Subject: Re: Hbase pausing problems
>>>>>>>> 
>>>>>>>> How many CPUs?
>>>>>>> 
>>>>>>> 1x Quad Xeon in each server
>>>>>>> 
>>>>>>>> 
>>>>>>>> You are using default JVM settings (see HBASE_OPTS in hbase-env.sh).  You
>>>>>>>> might want to enable GC logging.  See the line after hbase-env.sh.  Enable
>>>>>>>> it.  GC logging might tell you about the pauses you are seeing.
>>>>>>> 
>>>>>>> I will enable GC Logging tonight during our slow time because restarting
>>>>>>> the regionservers causes the clients to pause indefinitely.
>>>>>>> 
>>>>>>>> 
>>>>>>>> Can you get a fourth server for your cluster and run the master, zk, and
>>>>>>>> namenode on it and leave the other three servers for regionserver and
>>>>>>>> datanode (with perhaps replication == 2 as per J-D to lighten load on
>>>>>>>> small cluster).
>>>>>>> 
>>>>>>> We plan to double the number of servers in the next few weeks and I will
>>>>>>> take your advice to put the master, zk and namenode on it (we will need
>>>>>>> to have a second one on standby should this one crash).  The servers will
>>>>>>> be ordered shortly and will be here in a week or two.
>>>>>>>
>>>>>>> That said, I have been monitoring CPU usage and none of them seem
>>>>>>> particularly busy.  The regionserver on each one hovers around 30% all
>>>>>>> the time and the datanode sits at about 10% most of the time.  If we do
>>>>>>> have a resource issue, it definitely does not seem to be CPU.
>>>>>>>
>>>>>>> Increasing RAM did not seem to work either - it just made hBase use a
>>>>>>> bigger memstore and then it took longer to do a flush.
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> More notes inline in below.
>>>>>>>> 
>>>>>>>> On Fri, Jan 15, 2010 at 1:33 AM, Seraph Imalia <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> Approximately every 10 minutes, our entire coldfusion system pauses at
>>>>>>>>> the point of inserting into hBase for between 30 and 60 seconds and then
>>>>>>>>> continues.
>>>>>>>>>
>>>>>>>> Yeah, enable GC logging.  See if you can make correlation between the
>>>>>>>> pause the client is seeing and a GC pause.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Investigation...
>>>>>>>>> 
>>>>>>>>> Watching the logs of the regionserver, the pausing of the coldfusion
>>>>>>>>> system happens as soon as one of the regionservers starts flushing the
>>>>>>>>> memstore and recovers again as soon as it is finished flushing (recovers
>>>>>>>>> as soon as it starts compacting).
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> ...though, this would seem to point to an issue with your hardware.  How
>>>>>>>> many disks?  Are they misconfigured such that they hold up the system
>>>>>>>> when they are being heavily written to?
>>>>>>>>
>>>>>>>>
>>>>>>>> A regionserver log at DEBUG from around this time so we could look at it
>>>>>>>> would be helpful.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> I can recreate the error just by stopping 1 of the regionservers; but
>>>>>>>>> then starting the regionserver again does not make coldfusion recover
>>>>>>>>> until I restart the coldfusion servers.  It is important to note that if
>>>>>>>>> I keep the built in hBase shell running, it is happily able to put and
>>>>>>>>> get data to and from hBase whilst coldfusion is busy pausing/failing.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> This seems odd.  Enable DEBUG for the client-side.  Do you see the shell
>>>>>>>> recalibrating, finding new locations for regions after you shut down the
>>>>>>>> single regionserver, something that your coldfusion is not doing?  Or,
>>>>>>>> maybe, the shell is putting to a regionserver that has not been disturbed
>>>>>>>> by your start/stop?
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I have tried increasing the regionserver's RAM to 3 Gigs and this
>>>>>>>>> just made the problem worse because it took longer for the
>>>>>>>>> regionservers to flush the memory store.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Again, if flushing is holding up the machine, if you can't write a file
>>>>>>>> in background without it freezing your machine, then your machines are
>>>>>>>> anemic or misconfigured?
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> One of the links I found on your site mentioned increasing
>>>>>>>>> the default value for hbase.regionserver.handler.count to 100 - this
>>>>>>>>> did not seem to make any difference.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Leave this configuration in place I'd say.
>>>>>>>> 
>>>>>>>> Are you seeing 'blocking' messages in the regionserver logs?  Regionserver
>>>>>>>> will stop taking on writes if it thinks it's being overrun, to prevent
>>>>>>>> itself OOME'ing.  Grep the 'multiplier' configuration in hbase-default.xml.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> I have double checked that the memory flush
>>>>>>>>> very rarely happens on more than 1 regionserver at a time - in fact,
>>>>>>>>> in my many hours of staring at tails of logs, it only happened once
>>>>>>>>> where two regionservers flushed at the same time.
>>>>>>>>>
>>>>>>>> You've enabled DEBUG?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> My investigations point strongly towards a coding problem on our side
>>>>>>>>> rather than a problem with the server setup or hBase itself.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> If things were slow from client-perspective, that might be a client-side
>>>>>>>> coding problem, but these pauses, unless you have a fly-by deadlock in
>>>>>>>> your client-code, are probably an hbase issue.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>>  I say this because
>>>>>>>>> whilst I understand why a regionserver would go offline during a
>>>>>>>>> memory flush, I would expect the other two regionservers to pick up
>>>>>>>>> the load - especially since the built-in hbase shell has no problem
>>>>>>>>> accessing hBase whilst a regionserver is busy doing a memstore flush.
>>>>>>>>> 
>>>>>>>> HBase does not go offline during memory flush.  It continues to be
>>>>>>>> available for reads and writes during this time.  And see J-D's response
>>>>>>>> for the incorrect understanding of how loading of regions is done in an
>>>>>>>> hbase cluster.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> ...
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> I think either I am leaving out code that is required to determine which
>>>>>>>>> RegionServers are available OR I am keeping too many hBase objects in RAM
>>>>>>>>> instead of calling their constructors each time (my purpose obviously was
>>>>>>>>> to improve performance).
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> For sure keep a single instance of HBaseConfiguration at least and use
>>>>>>>> this when constructing all HTable and HBaseAdmin instances.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Currently the live system is inserting over 7 Million records per day
>>>>>>>>> (mostly between 8AM and 10PM) which is not a ridiculously high load.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> What size are the records?   What is your table schema?  How many regions
>>>>>>>> do you currently have in your table?
>>>>>>>> 
>>>>>>>>  St.Ack
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>