Hi St,

My comments are below; they start with //zhou.
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: Thursday, June 09, 2011 11:32 AM
To: [email protected]
Cc: Yanlijun; Chenjian
Subject: Re: Outdated data can not be cleaned in time

On Tue, Jun 7, 2011 at 12:41 AM, Zhoushuaifeng <[email protected]> wrote:
> https://issues.apache.org/jira/browse/HBASE-3723
>
> This issue is fixed and committed to TRUNK, but not integrated into 0.90.2
> and 0.90.3. This will cause outdated data not to be cleaned in time.

Let me commit it to branch. It's a small change.

//zhou: Thanks, it's important.

> Furthermore, the compaction checker sends regions to the compaction queue
> to be compacted, but the priority of these regions is too low if they have
> only a few storefiles. Under heavy write throughput, the compaction queue
> will always hold some regions with higher priority. This can delay the
> major compaction for a long time (even a few days), and outdated-data
> cleaning will be delayed along with it.
> If so, I suggest that the compaction checker send regions that need a
> major compaction to the compaction queue with higher priority.

I'd think that a region with more storefiles should take priority over regions with a few files, even if those files are due for a major compaction.

//zhou: I agree that regions with more files should take higher priority, but other important factors should be considered too. In our tests, we found some regions sent to the queue by the major compaction checker hanging in the queue for more than 2 days! Some scanners on these regions could not get available data for a long time, and their leases expired. I think setting these regions' priority to hbase.hstore.blockingStoreFiles - hbase.hstore.compactionThreshold - 1 by default may be a good way to solve this problem. If a region has fewer than 3 files, its priority is lower than that of an outdated region, but if it has more than 4 files, its priority will be higher.
This setting can both solve the outdated-data problem and avoid blocking flushes and puts.

I can understand that if there are a lot of deletes in a store, a major compaction could make a big difference, but do you think this is the usual case?

Maybe the compaction algorithm should consider the age of compaction requests too? If a compaction has been hanging in the queue a good while, its priority gets bumped a level?

//zhou: This is good, I totally agree. This is another good way to solve the problem. But for now, I don't know how to make the patch; maybe we can dig into it more.

St.Ack
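For reference, here is a minimal sketch (not actual HBase code; class and method names are illustrative) of the priority arithmetic discussed in the thread. As zhou's numbers imply, a smaller priority value means a more urgent request: an ordinary request for a store with n files is roughly blockingStoreFiles - n, and the proposal is to queue major-compaction-checker requests at a fixed blockingStoreFiles - compactionThreshold - 1. The age-bump variant Stack floated is sketched as well, under the assumption that "bumped a level" means subtracting one per aging interval.

```java
// Illustrative sketch of the compaction-queue priorities discussed above.
// Smaller number = higher priority, matching the arithmetic in the mail.
public class CompactionPrioritySketch {
    // Default values of the two config knobs named in the thread.
    static final int BLOCKING_STORE_FILES = 7;  // hbase.hstore.blockingStoreFiles
    static final int COMPACTION_THRESHOLD = 3;  // hbase.hstore.compactionThreshold

    /** Priority of an ordinary compaction request for a store with n files. */
    static int storePriority(int storefileCount) {
        return BLOCKING_STORE_FILES - storefileCount;
    }

    /** zhou's proposed fixed priority for major-compaction-checker requests. */
    static int majorCheckerPriority() {
        return BLOCKING_STORE_FILES - COMPACTION_THRESHOLD - 1;  // 7 - 3 - 1 = 3
    }

    /**
     * Stack's aging idea (hypothetical): bump a request one priority level
     * for each interval it has waited in the queue.
     */
    static int agedPriority(int basePriority, long waitedMs, long bumpIntervalMs) {
        return basePriority - (int) (waitedMs / bumpIntervalMs);
    }

    public static void main(String[] args) {
        int major = majorCheckerPriority();            // 3
        // A region with only 3 store files: priority 4, less urgent than
        // the major-compaction request (larger number = lower priority).
        System.out.println(storePriority(3) > major);  // true
        // A region with 5 store files: priority 2, more urgent than it.
        System.out.println(storePriority(5) < major);  // true
        // A 4-file request that has waited two bump intervals jumps ahead.
        System.out.println(agedPriority(storePriority(4), 120_000L, 60_000L)); // 1
    }
}
```

With these defaults, the fixed checker priority of 3 sits exactly at the 4-file mark, which is why zhou says regions under 3 files rank below an outdated region while regions over 4 files rank above it, without pushing the checker's requests past the flush-blocking threshold.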
