Hi Guys,
I am having another problem with HBase that is probably related to the problems
I was emailing you about earlier this year.
I have finally had a chance to try at least one of the suggestions you made to
help resolve our problems. Last night I increased the heap size per server to
3 GB and added the following to the hbase-site.xml files on each server (I have
not enabled compression yet for fear of losing data - I need to wait until I
have a long window where HBase can be offline, in case there are problems I
need to resolve) ...
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.5</value>
  <description>Maximum size of all memstores in a region server before new
    updates are blocked and flushes are forced. Defaults to 40% of heap.
  </description>
</property>
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.48</value>
  <description>When memstores are being forced to flush to make room in
    memory, keep flushing until we hit this mark. Defaults to 30% of heap.
    This value equal to hbase.regionserver.global.memstore.upperLimit causes
    the minimum possible flushing to occur when updates are blocked due to
    memstore limiting.
  </description>
</property>
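(If my arithmetic is right, with the 3 GB heap these settings mean updates are
blocked once the combined memstores reach about 1.5 GB, and forced flushing
then continues until they drop back under roughly 1.44 GB.)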
...and then restarted HBase:
bin/stop-hbase.sh
bin/start-hbase.sh
HBase spent about 30 minutes assigning regions to each of the region servers
(we now have 2595 regions). When it had finished (which is usually when our
client apps are able to start adding rows), the client apps were only able to
add rows at an incredibly slow rate (about 1 per second), which could not even
cope with the minuscule load we have at 3 AM.
I left HBase for about 30 minutes after region assignment had completed and the
situation did not improve. I then tried changing the lowerLimit to 0.38 and
restarting again, which also did not improve the situation. I then removed the
above lines by commenting them out (<!-- -->) and restarted HBase again.
Again, 30 minutes after it had finished assigning regions, it was no
different.
I therefore assumed that the problem was not caused by the addition of the
properties but rather by the fact that HBase had been restarted. I checked
the log files very closely and noticed that when I disable the client apps,
the regionservers are frantically requesting major compactions and complaining
about too many store files per region.
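In case it is relevant: I know I can request a major compaction myself from the
hbase shell (command below, using one of our tables), but I have held off since
the regionservers are already requesting them on their own - please tell me if
running it per table would actually help:

  $ bin/hbase shell
  hbase> major_compact 'AdDelivery'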
I then assumed that the system was under strain performing housekeeping and
that there was nothing I could do with my limited knowledge to improve it
without contacting you guys about it first. It was 4 AM this morning and I had
no choice but to do whatever I could to get our client apps up and running
before morning, so I wrote some quick ColdFusion and Java code to insert the
data into local MySQL servers so that HBase could have time to do whatever it
was doing.
It is still compacting, and it is now 9 hours after the last restart, with zero
load from the client apps.
Please can you assist by shedding some light on what is actually happening?
- Is my thinking correct?
- Is it related to the "Hbase pausing problems" we are still having?
- What do I do to fix it, or make it hurry up?
Regards,
Seraph
> From: stack <[email protected]>
> Reply-To: <[email protected]>
> Date: Wed, 20 Jan 2010 12:03:01 -0800
> To: <[email protected]>
> Subject: Re: Hbase pausing problems
>
> On Wed, Jan 20, 2010 at 11:46 AM, Seraph Imalia <[email protected]> wrote:
>
>>
>>> From: stack <[email protected]>
>>> Reply-To: <[email protected]>
>>> Date: Wed, 20 Jan 2010 11:26:00 -0800
>>> To: <[email protected]>
>>> Subject: Re: Hbase pausing problems
>>>
>>> Looking at logs, what J-D says regards the number of regions you are
>>> carrying per server (800). Enable compression and that'll shrink the number
>>> and probably up your throughput all around, or make your regions larger
>>> (make sure to up the memstore size in sympathy).
>>
>> Cool - I was tempted to enable compression - I saw a wiki entry explaining
>> how to enable lzo compression
>> (http://wiki.apache.org/hadoop/UsingLzoCompression) - do you recommend I do
>> this or just enable the HBase out-of-the-box gzip compression?
>
> lzo is the way to go. gzip compresses better but lzo is near frictionless.
>
>
>
>> How does this work with existing tables? I see that compression is a flag
>> when creating a new table.
>>
>>
> You enable it. You'll have to offline tables to enable it. On next
> compaction, your data will be compressed.
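> A sketch of what that might look like from the shell, using your AdDelivery
> table as the example (double-check the alter syntax against your version's
> shell help before running it):
>
>   hbase> disable 'AdDelivery'
>   hbase> alter 'AdDelivery', {NAME => 'AdDelivery_Family', COMPRESSION => 'LZO'}
>   hbase> enable 'AdDelivery'
>   hbase> major_compact 'AdDelivery'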
>
> Be sure to do the install on all nodes. There is a tool referenced on that
> wiki page. Run it to ensure lzo is successfully installed all over. It's not
> a graceful failure if incorrectly installed.
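> If I recall right, the tool on that page is the CompressionTest class; run it
> on every node against a small file, something like the below (argument order
> may vary by version, so check the wiki page):
>
>   $ bin/hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/test.txt lzo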
>
>
>>>
>>> Your keys look like UUIDs so they are probably pretty well spread over the
>>> key space would be my guess -- that you are not beating up on one region
>>> continuously (the first scenario J-D painted is probably what's happening).
>>>
>>> I thought I could see what your schema was by looking in the logs but that's
>>> no longer the case, so please tell us more about it -- number of column
>>> families, what one of your 6M inserts is comprised of.
>>
>> This is from the master UI - please let me know if you need more? The sURL
>> column is no longer used - some old data still has that column.
>>
>> User Tables
>>
>> AdDelivery - {NAME => 'AdDelivery', FAMILIES => [{NAME =>
>> 'AdDelivery_Family', iAdvertiserID => '', iCampaignTypeID => '', iCityID =>
>> '', mPublisherActualCost => '', iAdID => '', iKeywordID => '', mBid => '',
>> dtDelivered => '', iWebsiteID => '', sCity => '', iCurrencyID => '',
>> uidChannel => '', sClientIPAddress => '', iPublisherID => '', iAdGroupID =>
>> '', BLOCKSIZE => '65536', BLOCKCACHE => 'true', fRandRate => '', iPosition
>> => '', sKeyword => '', dtClicked => '', mActualCost => '', TTL =>
>> '2147483647', iPublisherCurrencyID => '', iCampaignID => '', iChannelID =>
>> '', IN_MEMORY => 'false', sURL => '', sRegionCode => '', VERSIONS => '3',
>> COMPRESSION => 'NONE', bConverted => '', fPublisherRandRate => '',
>> dtConverted => '', bClicked => '', iURLID => '', sCountryCode => ''}]}
>>
>>
>> AdDeliveryNonProfit - {NAME => 'AdDeliveryNonProfit', FAMILIES => [{NAME
>> => 'AdDelivery_Family', iAdvertiserID => '', iCampaignTypeID => '', iCityID
>> => '', mPublisherActualCost => '', iAdID => '', iKeywordID => '', mBid =>
>> '', dtDelivered => '', iWebsiteID => '', sCity => '', iCurrencyID => '',
>> uidChannel => '', sClientIPAddress => '', iPublisherID => '', iAdGroupID =>
>> '', BLOCKSIZE => '65536', BLOCKCACHE => 'true', fRandRate => '', iPosition
>> => '', sKeyword => '', dtClicked => '', mActualCost => '', TTL =>
>> '2147483647', iPublisherCurrencyID => '', iCampaignID => '', iChannelID =>
>> '', IN_MEMORY => 'false', sURL => '', sRegionCode => '', VERSIONS => '3',
>> COMPRESSION => 'NONE', bConverted => '', fPublisherRandRate => '',
>> dtConverted => '', bClicked => '', iURLID => '', sCountryCode => ''}]}
>>
>>
>> ChannelDelivery - {NAME => 'ChannelDelivery', FAMILIES => [{NAME =>
>> 'ChannelDelivery_Family', iNumberOfAds => '', iCityID => '', TTL =>
>> '2147483647', iChannelID => '', IN_MEMORY => 'false', sURL => '',
>> sRegionCode => '', VERSIONS => '3', COMPRESSION => 'NONE', dtDelivered =>
>> '', sCity => '', iWebsiteID => '', uidChannel => '', sClientIPAddress =>
>> '',
>> iPublisherID => '', BLOCKSIZE => '65536', BLOCKCACHE => 'true', iURLID =>
>> '', sCountryCode => ''}]}
>>
>>
> The above look good. One family per table. It's interesting that you are
> using the schema map to add your own attributes. Do you make use of that
> facility?
>
>
>
>>>
>>> Would suggest you not run hbase as root if you can avoid it.
>>
>> Is there any reason other than security?
>
> That's the reason. That's usually enough. And that if hadoop goes awry as
> root... But your hw, your prerogative (smile)
>
>> I have a lot of settings to try now :)... Compression, memstore upper/lower
>> limits, GC logs and heap to 3G. Thank you for your help and suggestions. I
>> will try these things tonight and report back to you in the morning.
>>
>>
> I'd suggest doing one at a time. Do the J-D 0.5/0.48 first. That should have
> the biggest impact. The fact that you have so many regions is aggravating the
> global block issue, so you should enable compression, but do that second.
>
> St.Ack