Hi Guys,
I am having another problem with HBase that is probably related to the problems
I was emailing you about earlier this year.
I have finally had a chance to try at least one of the suggestions you made to
help resolve our problems. Last night I increased the heap size per server to
3 GB and added the following to the hbase-site.xml files on each server (I have
not enabled compression yet for fear of losing data - I need to wait until I
have a long window where HBase can be offline, in case there are problems I
need to resolve) ...
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.5</value>
  <description>Maximum size of all memstores in a region server before new
    updates are blocked and flushes are forced. Defaults to 40% of heap.
  </description>
</property>
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.48</value>
  <description>When memstores are being forced to flush to make room in
    memory, keep flushing until we hit this mark. Defaults to 30% of heap.
    This value equal to hbase.regionserver.global.memstore.upperLimit causes
    the minimum possible flushing to occur when updates are blocked due to
    memstore limiting.
  </description>
</property>
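(If my arithmetic is right, with the 3 GB heap these settings mean updates are
blocked once the combined memstores reach about 1.5 GB, and forced flushing
then continues until they drop back under roughly 1.44 GB.)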
...and then restarted HBase:
bin/stop-hbase.sh
bin/start-hbase.sh
HBase spent about 30 minutes assigning regions to each of the region servers
(we now have 2595 regions). When it had finished (which is usually when our
client apps are able to start adding rows), the client apps were only able to
add rows at an incredibly slow rate (about 1 per second), which could not even
cope with the minuscule load we have at 3 AM.
I left HBase for about 30 minutes after region assignment had completed and the
situation did not improve. I then tried changing the lowerLimit to 0.38 and
restarting again, which also did not improve the situation. I then removed the
above lines by commenting them out (<!-- -->) and restarted HBase again.
Again, 30 minutes after it had finished assigning regions, it was no
different.
I therefore assumed that the problem was not caused by the addition of the
properties but rather by the fact that HBase had been restarted. I checked
the log files very closely and noticed that when I disable the client apps,
the regionservers are frantically requesting major compactions and complaining
about too many store files per region.
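In case it is relevant: I know I can request a major compaction myself from the
hbase shell (command below, using one of our tables), but I have held off since
the regionservers are already requesting them on their own - please tell me if
running it per table would actually help:

  $ bin/hbase shell
  hbase> major_compact 'AdDelivery'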
I then assumed that the system was under strain performing housekeeping and
that there was nothing I could do with my limited knowledge to improve it
without contacting you guys about it first. It was 4 AM this morning and I had
no choice but to do whatever I could to get our client apps up and running
before morning, so I wrote some quick ColdFusion and Java code to insert the
data into local MySQL servers so that HBase could have time to do whatever it
was doing.
It is still compacting, and it is now 9 hours after the last restart, with zero
load from the client apps.
Please can you assist by shedding some light on what is actually happening?
- Is my thinking correct?
- Is it related to the "Hbase pausing problems" we are still having?
- What do I do to fix it, or make it hurry up?
Regards,
Seraph
> From: stack <[email protected]>
> Reply-To: <[email protected]>
> Date: Wed, 20 Jan 2010 12:03:01 -0800
> To: <[email protected]>
> Subject: Re: Hbase pausing problems
>
> On Wed, Jan 20, 2010 at 11:46 AM, Seraph Imalia <[email protected]> wrote:
>
>>
>>> From: stack <[email protected]>
>>> Reply-To: <[email protected]>
>>> Date: Wed, 20 Jan 2010 11:26:00 -0800
>>> To: <[email protected]>
>>> Subject: Re: Hbase pausing problems
>>>
>>> Looking at logs, what J-D says regards the number of regions you are
>>> carrying per server (800). Enable compression and that'll shrink the number
>>> and probably up your throughput all around, or make your regions larger
>>> (make sure to up the memstore size in sympathy).
>>
>> Cool - I was tempted to enable compression - I saw a wiki entry explaining
>> how to enable lzo compression
>> (http://wiki.apache.org/hadoop/UsingLzoCompression) - do you recommend I do
>> this or just enable the HBase out-of-the-box gzip compression?
>
> lzo is the way to go. gzip compresses better but lzo is near frictionless.
>
>
>
>> How does this work with existing tables? I see that compression is a flag
>> when creating a new table.
>>
>>
> You enable it. You'll have to offline tables to enable it. On next
> compaction, your data will be compressed.
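> A sketch of what that might look like from the shell, using your AdDelivery
> table as the example (double-check the alter syntax against your version's
> shell help before running it):
>
>   hbase> disable 'AdDelivery'
>   hbase> alter 'AdDelivery', {NAME => 'AdDelivery_Family', COMPRESSION => 'LZO'}
>   hbase> enable 'AdDelivery'
>   hbase> major_compact 'AdDelivery'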
>
> Be sure to do the install on all nodes. There is a tool referenced on that
> wiki page. Run it to ensure lzo is successfully installed all over. It's not
> a graceful failure if incorrectly installed.
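> If I recall right, the tool on that page is the CompressionTest class; run it
> on every node against a small file, something like the below (argument order
> may vary by version, so check the wiki page):
>
>   $ bin/hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/test.txt lzo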
>
>
>>>
>>> Your keys look like UUIDs so they are probably pretty well spread over the
>>> key space would be my guess -- that you are not beating up on one region
>>> continuously (the first scenario J-D painted is probably what's happening).
>>>
>>> I thought I could see what your schema was by looking in the logs but that's
>>> no longer the case, so please tell us more about it -- number of column
>>> families, what one of your 6M inserts is comprised of.
>>
>> This is from the master UI - please let me know if you need more? The sURL
>> column is no longer used - some old data still has that column.
>>
>> User Tables
>>
>> AdDelivery - {NAME => 'AdDelivery', FAMILIES => [{NAME =>
>> 'AdDelivery_Family', iAdvertiserID => '', iCampaignTypeID => '', iCityID =>
>> '', mPublisherActualCost => '', iAdID => '', iKeywordID => '', mBid => '',
>> dtDelivered => '', iWebsiteID => '', sCity => '', iCurrencyID => '',
>> uidChannel => '', sClientIPAddress => '', iPublisherID => '', iAdGroupID =>
>> '', BLOCKSIZE => '65536', BLOCKCACHE => 'true', fRandRate => '', iPosition
>> => '', sKeyword => '', dtClicked => '', mActualCost => '', TTL =>
>> '2147483647', iPublisherCurrencyID => '', iCampaignID => '', iChannelID =>
>> '', IN_MEMORY => 'false', sURL => '', sRegionCode => '', VERSIONS => '3',
>> COMPRESSION => 'NONE', bConverted => '', fPublisherRandRate => '',
>> dtConverted => '', bClicked => '', iURLID => '', sCountryCode => ''}]}
>>
>>
>> AdDeliveryNonProfit - {NAME => 'AdDeliveryNonProfit', FAMILIES => [{NAME
>> => 'AdDelivery_Family', iAdvertiserID => '', iCampaignTypeID => '', iCityID
>> => '', mPublisherActualCost => '', iAdID => '', iKeywordID => '', mBid =>
>> '', dtDelivered => '', iWebsiteID => '', sCity => '', iCurrencyID => '',
>> uidChannel => '', sClientIPAddress => '', iPublisherID => '', iAdGroupID =>
>> '', BLOCKSIZE => '65536', BLOCKCACHE => 'true', fRandRate => '', iPosition
>> => '', sKeyword => '', dtClicked => '', mActualCost => '', TTL =>
>> '2147483647', iPublisherCurrencyID => '', iCampaignID => '', iChannelID =>
>> '', IN_MEMORY => 'false', sURL => '', sRegionCode => '', VERSIONS => '3',
>> COMPRESSION => 'NONE', bConverted => '', fPublisherRandRate => '',
>> dtConverted => '', bClicked => '', iURLID => '', sCountryCode => ''}]}
>>
>>
>> ChannelDelivery - {NAME => 'ChannelDelivery', FAMILIES => [{NAME =>
>> 'ChannelDelivery_Family', iNumberOfAds => '', iCityID => '', TTL =>
>> '2147483647', iChannelID => '', IN_MEMORY => 'false', sURL => '',
>> sRegionCode => '', VERSIONS => '3', COMPRESSION => 'NONE', dtDelivered =>
>> '', sCity => '', iWebsiteID => '', uidChannel => '', sClientIPAddress =>
>> '',
>> iPublisherID => '', BLOCKSIZE => '65536', BLOCKCACHE => 'true', iURLID =>
>> '', sCountryCode => ''}]}
>>
>>
> The above look good. One family per table. It's interesting that you are
> using the schema map to add your own attributes. Do you make use of that
> facility?
>
>
>
>>>
>>> Would suggest you not run hbase as root if you can avoid it.
>>
>> Is there any reason other than security?
>
> That's the reason. That's usually enough. And that if hadoop goes awry as
> root... But your hw, your prerogative (smile)
>
>> I have a lot of settings to try now :)... Compression, memstore upper/lower
>> limits, GC logs and heap to 3G. Thank you for your help and suggestions. I
>> will try these things tonight and report back to you in the morning.
>>
>>
> I'd suggest doing one at a time. Do the J-D 0.5/0.48 first. That should have
> the biggest impact. The fact that you have so many regions is aggravating the
> global block issue, so you should enable compression, but do that second.
>
> St.Ack