On Wed, Jan 20, 2010 at 11:46 AM, Seraph Imalia <[email protected]> wrote:

>
> > From: stack <[email protected]>
> > Reply-To: <[email protected]>
> > Date: Wed, 20 Jan 2010 11:26:00 -0800
> > To: <[email protected]>
> > Subject: Re: Hbase pausing problems
> >
> > Looking at logs, what J-D says regards the number of regions you are
> > carrying per server (800).  Enable compression and that'll shrink the
> > number and probably up your throughput all around or make your regions
> > larger (make sure to up the memstore size in sympathy).
>
> Cool - I was tempted to enable compression - I saw a wiki entry explaining
> how to enable lzo compression
> (http://wiki.apache.org/hadoop/UsingLzoCompression) - do you recommend I do
> this or just enable the hBase out-the-box Gzip compression?
>
lzo is the way to go.  gzip compresses better but lzo is near frictionless.



> How does this work with existing tables?  I see that compression is a flag
> when creating a new table.
>
>
You can enable it on existing tables, but you'll have to take each table
offline (disable it) to make the change.  On the next compaction, your data
will be compressed.

Be sure to install lzo on all nodes.  There is a tool referenced on that wiki
page.  Run it to ensure lzo installed successfully everywhere.  It's not a
graceful failure if it is incorrectly installed.
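
As a rough sketch of the offline/alter/online sequence described above, the
HBase shell session below uses the AdDelivery table and family names from the
schema dump in this thread.  It assumes lzo is already installed and verified
on every node; the exact alter syntax can differ between HBase versions, so
treat it as illustrative rather than definitive.

```
# Take the table offline so its schema can be changed
disable 'AdDelivery'

# Switch the single column family to LZO compression
alter 'AdDelivery', {NAME => 'AdDelivery_Family', COMPRESSION => 'LZO'}

# Bring the table back online
enable 'AdDelivery'

# Optional: request a major compaction so existing store files are
# rewritten compressed now instead of waiting for the next one
major_compact 'AdDelivery'
```

Running describe 'AdDelivery' afterwards is a cheap way to confirm that
COMPRESSION changed and that the other family attributes are still what you
expect.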


> >
> > Your keys look like UUIDs so are probably pretty well spread over the key
> > space would be my guess -- that you are not beating up on one region
> > continuously (the J-D painted first scenario is probably what's
> > happening).
> >
> > I thought I could see what your schema was by looking in logs but that's
> > no longer the case so please tell us more about it -- number of column
> > families, what one of your 6M inserts is comprised of.
>
> This is from the master UI - please let me know if you need more?  The sURL
> column is no longer used - some old data still has that column.
>
> User Tables
>
> AdDelivery  -  {NAME => 'AdDelivery', FAMILIES => [{NAME =>
> 'AdDelivery_Family', iAdvertiserID => '', iCampaignTypeID => '', iCityID =>
> '', mPublisherActualCost => '', iAdID => '', iKeywordID => '', mBid => '',
> dtDelivered => '', iWebsiteID => '', sCity => '', iCurrencyID => '',
> uidChannel => '', sClientIPAddress => '', iPublisherID => '', iAdGroupID =>
> '', BLOCKSIZE => '65536', BLOCKCACHE => 'true', fRandRate => '', iPosition
> => '', sKeyword => '', dtClicked => '', mActualCost => '', TTL =>
> '2147483647', iPublisherCurrencyID => '', iCampaignID => '', iChannelID =>
> '', IN_MEMORY => 'false', sURL => '', sRegionCode => '', VERSIONS => '3',
> COMPRESSION => 'NONE', bConverted => '', fPublisherRandRate => '',
> dtConverted => '', bClicked => '', iURLID => '', sCountryCode => ''}]}
>
>
> AdDeliveryNonProfit  -  {NAME => 'AdDeliveryNonProfit', FAMILIES => [{NAME
> => 'AdDelivery_Family', iAdvertiserID => '', iCampaignTypeID => '', iCityID
> => '', mPublisherActualCost => '', iAdID => '', iKeywordID => '', mBid =>
> '', dtDelivered => '', iWebsiteID => '', sCity => '', iCurrencyID => '',
> uidChannel => '', sClientIPAddress => '', iPublisherID => '', iAdGroupID =>
> '', BLOCKSIZE => '65536', BLOCKCACHE => 'true', fRandRate => '', iPosition
> => '', sKeyword => '', dtClicked => '', mActualCost => '', TTL =>
> '2147483647', iPublisherCurrencyID => '', iCampaignID => '', iChannelID =>
> '', IN_MEMORY => 'false', sURL => '', sRegionCode => '', VERSIONS => '3',
> COMPRESSION => 'NONE', bConverted => '', fPublisherRandRate => '',
> dtConverted => '', bClicked => '', iURLID => '', sCountryCode => ''}]}
>
>
> ChannelDelivery  -  {NAME => 'ChannelDelivery', FAMILIES => [{NAME =>
> 'ChannelDelivery_Family', iNumberOfAds => '', iCityID => '', TTL =>
> '2147483647', iChannelID => '', IN_MEMORY => 'false', sURL => '',
> sRegionCode => '', VERSIONS => '3', COMPRESSION => 'NONE', dtDelivered =>
> '', sCity => '', iWebsiteID => '', uidChannel => '', sClientIPAddress =>
> '', iPublisherID => '', BLOCKSIZE => '65536', BLOCKCACHE => 'true', iURLID =>
> '', sCountryCode => ''}]}
>
>
The above look good.  One family per table.  It's interesting that you are
using the schema map to add your own attributes.  Do you make use of that
facility?



> >
> > Would suggest you not run hbase as root if you can avoid it.
>
> Is there any reason other than for security reasons?
>
That's the reason.  That's usually enough.  And that if hadoop goes awry as
root...  But your hw, your prerogative (smile)

> I have a lot of settings to try now :)... Compression, memstore upper/lower
> limits, GC Logs and heap to 3G.  Thank you for your help and suggestions.
> I will try these things tonight and report back to you in the morning.
>
>
I'd suggest doing one at a time.  Do the J-D 0.5/0.48 first.  That should have
the biggest impact.  The fact that you have so many regions is aggravating the
global block issue so you should enable compression but do that second.
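
For reference, the 0.5/0.48 change might look like the hbase-site.xml fragment
below.  This assumes the two numbers are the global memstore upper and lower
limits mentioned earlier in the thread, and that the property names match the
ones in your version's hbase-default.xml; check both before copying.

```
<!-- Assumed mapping: 0.5 = upper limit, 0.48 = lower limit -->
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <!-- Fraction of heap all memstores may use before updates are blocked -->
  <value>0.5</value>
</property>
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <!-- Forced flushing continues until usage drops below this fraction -->
  <value>0.48</value>
</property>
```

With the 3G heap mentioned above, that works out to roughly 1.5G and 1.44G of
memstore across all regions on a server.  The region servers need a restart to
pick up the change.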

St.Ack
