[ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626850#action_12626850 ]

stack commented on HBASE-834:
-----------------------------

Patch looks good Billy.  I haven't tested it because after banging my head 
against hbase-826, I've learned that this notion of major compaction is a bit 
more involved than I at first thought (I think you may have known all along how 
important the difference between minor and major is).

Here is what I learned.  While compacting, if we overrun max versions or a cell 
has expired, we do not let the cell go through to the compacted file.  That was 
fine in the old days, when we always compacted everything.  Since we got 
smarter about compacting -- i.e. minor compactions compact only the small 
files -- this behavior can make for malignant results (see towards the end of 
hbase-826 for an illustration).

So, Billy, you need to pass the 'force' flag down into HStore#compact (we 
should probably rename 'force' to 'majorCompaction' or something?).  Then in 
HStore#compact, we only run the max-versions and expiration code IF it's a 
major compaction.  Otherwise, we just let ALL cells go through to the 
compacted files (at read time, gets and scans respect max versions and 
expiration times anyway).
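A rough sketch of the gating I mean (illustrative only -- the class, flag, and constant names here are my own, not the actual HStore code or the patch):

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of gating max-versions/TTL enforcement on the compaction
// kind. Names (compact, majorCompaction, MAX_VERSIONS, TTL_MS) are
// illustrative assumptions, not the real HBase 0.2 signatures.
public class CompactSketch {
    static final int MAX_VERSIONS = 3;
    static final long TTL_MS = 500L;

    // Each long is a cell timestamp for one row/column, newest first.
    static List<Long> compact(List<Long> cells, boolean majorCompaction, long now) {
        List<Long> out = new ArrayList<>();
        int versions = 0;
        for (long ts : cells) {
            if (majorCompaction) {
                // Only a major compaction sees every store file, so only it
                // may safely drop cells for good.
                if (versions >= MAX_VERSIONS) continue; // over max versions
                if (now - ts > TTL_MS) continue;        // expired
            }
            // Minor compaction: let ALL cells through; gets and scans apply
            // max versions and TTL at read time anyway.
            out.add(ts);
            versions++;
        }
        return out;
    }

    public static void main(String[] args) {
        // 10L is past the TTL relative to now=1000
        List<Long> cells = List.of(900L, 800L, 700L, 600L, 10L);
        System.out.println(compact(cells, false, 1000L)); // minor keeps all 5
        System.out.println(compact(cells, true, 1000L));  // major: [900, 800, 700]
    }
}
```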

I'll be on IRC tomorrow if you want to chat more on this Billy, or just write 
notes into this JIRA and we can go back and forth here (if you want, post a 
rough patch and I can give feedback -- that might be best).

Oh, one other thing: there should be no maximum on the number of files to 
compact at a time when doing a major compaction, and I think the way your 
patch is written, there isn't one; it's only when minor compactions run that 
there is a limit -- is that so?
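In other words, I'd expect the file selection to behave roughly like this (a guess at the intent; the constant and flag names are mine, not the patch's):

```java
import java.util.List;

// Sketch of the limit discussed above: the cap applies to minor compactions
// only, while a major compaction always takes every store file. The names
// MAX_FILES_PER_MINOR and majorCompaction are illustrative assumptions.
public class CompactionLimitSketch {
    static final int MAX_FILES_PER_MINOR = 10;

    static List<String> filesToCompact(List<String> storeFiles, boolean majorCompaction) {
        if (majorCompaction) {
            return storeFiles; // no upper bound for a major compaction
        }
        // minor: cap the batch so one region can't monopolize compaction
        int n = Math.min(storeFiles.size(), MAX_FILES_PER_MINOR);
        return storeFiles.subList(0, n);
    }

    public static void main(String[] args) {
        List<String> files = List.of("f1", "f2", "f3", "f4", "f5", "f6",
                                     "f7", "f8", "f9", "f10", "f11", "f12");
        System.out.println(filesToCompact(files, false).size()); // 10 (capped)
        System.out.println(filesToCompact(files, true).size());  // 12 (all)
    }
}
```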

Thanks.



> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.2.1, 0.18.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-0.2.1-patchv2.txt, 834-patch.txt
>
>
> From Billy in HBASE-64, which we closed because it got pulled all over the 
> place:
> {code}
> Currently we do compaction on a region when the 
> hbase.hstore.compactionThreshold is reached - default 3
> I think we should configure a max number of mapfiles to compact at one time, 
> similar to doing a minor compaction in Bigtable. This keeps compactions 
> from getting tied up in one region too long, letting other regions get way 
> too many memcache flushes and making compaction take longer and longer for 
> each region.
> If we did that, when a region's updates start to slack off the max number 
> will eventually include all mapfiles, causing a major compaction on that 
> region. Unlike Bigtable, this would leave the master out of the process, 
> letting the region server handle the major compaction when it has time.
> When doing a minor compaction on a few files I think we should compact the 
> newest mapfiles first, leaving the larger/older ones for when we have low 
> updates to a region.
> {code}
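Billy's newest-first selection idea could be sketched like so (my own rough illustration, assuming files carry a sequence id where higher means newer; nothing here is from the attached patches):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Rough sketch of the newest-first minor-compaction selection quoted above.
// MapFileInfo and its fields are made up for illustration.
public class NewestFirstSelection {
    static class MapFileInfo {
        final String name;
        final long sequenceId; // higher = newer
        MapFileInfo(String name, long sequenceId) {
            this.name = name;
            this.sequenceId = sequenceId;
        }
    }

    // Pick up to maxFiles of the newest mapfiles for a minor compaction.
    // When updates to the region slack off and maxFiles covers every file,
    // the pass naturally becomes a major compaction of the whole store.
    static List<MapFileInfo> select(List<MapFileInfo> files, int maxFiles) {
        List<MapFileInfo> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong((MapFileInfo f) -> f.sequenceId).reversed());
        return sorted.subList(0, Math.min(maxFiles, sorted.size()));
    }

    public static void main(String[] args) {
        List<MapFileInfo> files = List.of(
            new MapFileInfo("old-big", 1),
            new MapFileInfo("mid", 2),
            new MapFileInfo("new-a", 3),
            new MapFileInfo("new-b", 4));
        for (MapFileInfo f : select(files, 2)) {
            System.out.println(f.name); // the two newest: new-b, then new-a
        }
    }
}
```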

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
