On Wed, Jan 20, 2010 at 11:46 AM, Seraph Imalia <[email protected]> wrote:
>
> > From: stack <[email protected]>
> > Reply-To: <[email protected]>
> > Date: Wed, 20 Jan 2010 11:26:00 -0800
> > To: <[email protected]>
> > Subject: Re: Hbase pausing problems
> >
> > Looking at logs, what J-D says regarding the number of regions you are
> > carrying per server (800). Enable compression and that'll shrink the number
> > and probably up your throughput all around, or make your regions larger
> > (make sure to up the memstore size in sympathy).
>
> Cool - I was tempted to enable compression - I saw a wiki entry explaining
> how to enable lzo compression
> (http://wiki.apache.org/hadoop/UsingLzoCompression) - do you recommend I do
> this or just enable the HBase out-of-the-box Gzip compression?
>

lzo is the way to go. gzip compresses better, but lzo is near frictionless.

> How does this work with existing tables? I see that compression is a flag
> when creating a new table.
>

You can enable it on existing tables too, but you'll have to take each table
offline to do so. On the next compaction, your data will be compressed. Be
sure to do the install on all nodes. There is a tool referenced on that wiki
page; run it to ensure lzo is successfully installed everywhere. It's not a
graceful failure if it is incorrectly installed.

> >
> > Your keys look like UUIDs, so they are probably pretty well spread over the
> > key space would be my guess -- that you are not beating up on one region
> > continuously (the first scenario J-D painted is probably what's happening).
> >
> > I thought I could see what your schema was by looking in the logs, but
> > that's no longer the case, so please tell us more about it -- number of
> > column families, what one of your 6M inserts is comprised of.
>
> This is from the master UI - please let me know if you need more. The sURL
> column is no longer used - some old data still has that column.
>
> User Tables
>
> AdDelivery - {NAME => 'AdDelivery', FAMILIES => [{NAME => 'AdDelivery_Family',
> iAdvertiserID => '', iCampaignTypeID => '', iCityID => '',
> mPublisherActualCost => '', iAdID => '', iKeywordID => '', mBid => '',
> dtDelivered => '', iWebsiteID => '', sCity => '', iCurrencyID => '',
> uidChannel => '', sClientIPAddress => '', iPublisherID => '', iAdGroupID => '',
> BLOCKSIZE => '65536', BLOCKCACHE => 'true', fRandRate => '', iPosition => '',
> sKeyword => '', dtClicked => '', mActualCost => '', TTL => '2147483647',
> iPublisherCurrencyID => '', iCampaignID => '', iChannelID => '',
> IN_MEMORY => 'false', sURL => '', sRegionCode => '', VERSIONS => '3',
> COMPRESSION => 'NONE', bConverted => '', fPublisherRandRate => '',
> dtConverted => '', bClicked => '', iURLID => '', sCountryCode => ''}]}
>
> AdDeliveryNonProfit - {NAME => 'AdDeliveryNonProfit', FAMILIES => [{NAME =>
> 'AdDelivery_Family', iAdvertiserID => '', iCampaignTypeID => '', iCityID => '',
> mPublisherActualCost => '', iAdID => '', iKeywordID => '', mBid => '',
> dtDelivered => '', iWebsiteID => '', sCity => '', iCurrencyID => '',
> uidChannel => '', sClientIPAddress => '', iPublisherID => '', iAdGroupID => '',
> BLOCKSIZE => '65536', BLOCKCACHE => 'true', fRandRate => '', iPosition => '',
> sKeyword => '', dtClicked => '', mActualCost => '', TTL => '2147483647',
> iPublisherCurrencyID => '', iCampaignID => '', iChannelID => '',
> IN_MEMORY => 'false', sURL => '', sRegionCode => '', VERSIONS => '3',
> COMPRESSION => 'NONE', bConverted => '', fPublisherRandRate => '',
> dtConverted => '', bClicked => '', iURLID => '', sCountryCode => ''}]}
>
> ChannelDelivery - {NAME => 'ChannelDelivery', FAMILIES => [{NAME =>
> 'ChannelDelivery_Family', iNumberOfAds => '', iCityID => '',
> TTL => '2147483647', iChannelID => '', IN_MEMORY => 'false', sURL => '',
> sRegionCode => '', VERSIONS => '3', COMPRESSION => 'NONE',
> dtDelivered => '', sCity => '', iWebsiteID => '', uidChannel => '',
> sClientIPAddress => '', iPublisherID => '', BLOCKSIZE => '65536',
> BLOCKCACHE => 'true', iURLID => '', sCountryCode => ''}]}
>

The above looks good. One family per table. It's interesting that you are
using the schema map to add your own attributes. Do you make use of that
facility?

> >
> > Would suggest you not run hbase as root if you can avoid it.
>
> Is there any reason other than for security reasons?
>

That's the reason. That's usually enough. That, and what happens if hadoop
goes awry as root... But your hw, your prerogative (smile).

> I have a lot of settings to try now :)... Compression, memstore upper/lower
> limits, GC logs and heap to 3G. Thank you for your help and suggestions. I
> will try these things tonight and report back to you in the morning.
>

I'd suggest doing one at a time. Do the J-D 0.5/0.48 first; that should have
the biggest impact. The fact that you have so many regions is aggravating the
global block issue, so you should enable compression, but do that second.

St.Ack
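The J-D 0.5/0.48 suggestion can be put into numbers with a back-of-envelope sketch. This assumes (not confirmed in this thread) that the two figures are fractions of regionserver heap, as used by the `hbase.regionserver.global.memstore.upperLimit` / `lowerLimit` settings, and that the heap is the 3G mentioned above. Under that reading, once the combined memstores reach the upper limit, updates block and flushing is forced until usage drops below the lower limit. The helper `memstore_thresholds` is hypothetical and not part of HBase.

```python
# Hypothetical back-of-envelope helper; not part of HBase.
# Assumes the 0.5/0.48 figures are fractions of regionserver heap, as with the
# hbase.regionserver.global.memstore.upperLimit / lowerLimit settings.

MB = 1024 * 1024
GB = 1024 * MB


def memstore_thresholds(heap_bytes, upper, lower, regions=1):
    """Return absolute global memstore thresholds (values in bytes)."""
    if not 0 < lower <= upper < 1:
        raise ValueError("expected 0 < lower <= upper < 1")
    block_at = int(heap_bytes * upper)       # updates block at this total size
    flush_down_to = int(heap_bytes * lower)  # forced flushing stops below this
    return {
        "block_at": block_at,
        "flush_down_to": flush_down_to,
        "gap": block_at - flush_down_to,
        # average memstore per region when the global limit is reached
        "per_region_at_block": block_at / regions,
    }


if __name__ == "__main__":
    t = memstore_thresholds(3 * GB, 0.5, 0.48, regions=800)
    for name, value in t.items():
        print(f"{name:>20}: {value / MB:8.2f} MB")
```

Under these assumptions a 3G heap blocks updates at 1536 MB of total memstore and flushes down to about 1474.56 MB, a gap of roughly 61 MB. With 800 regions sharing that budget, the average memstore per region at the blocking point is only about 1.92 MB, which gives a feel for why so many regions make the global block issue worse.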
