[jira] [Commented] (HBASE-3149) Make flush decisions per column family

Lars George (Commented) (JIRA) Tue, 21 Feb 2012 12:51:15 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212938#comment-13212938
 ]


Lars George commented on HBASE-3149:
------------------------------------

bq. At the same time, I'd think this issue is still worth some time; if there are lots of
CFs and only one is filling, it's silly to flush the others, as we do now, just because
one is over the threshold.

I thought so too. Setting hbase.hstore.compaction.size to 4MB while the flush
size is 256MB means you will never compact flush files larger than 4MB. In
other words, you only run a minor compaction when you are flushing small files
(say, from a small, dependent column family). For the larger family you
typically never run minor compactions at all, right?
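To make the effect of that threshold concrete, here is a minimal sketch, assuming (as described above) that the setting acts as an upper bound on which flush files a minor compaction will pick up. The class and method names are hypothetical, and the real HBase selection logic (ratio checks, minimum and maximum file counts) is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: a minor compaction only considers store files whose
 * size is at or below a threshold, modelled on the
 * hbase.hstore.compaction.size behaviour described in the comment.
 */
public class CompactionSizeFilterSketch {
    public static final long MB = 1024L * 1024L;

    /** Returns the file sizes (in bytes) eligible for a minor compaction. */
    public static List<Long> eligibleForMinorCompaction(List<Long> fileSizes, long maxCompactSize) {
        List<Long> eligible = new ArrayList<>();
        for (long size : fileSizes) {
            // Files larger than the threshold are skipped entirely.
            if (size <= maxCompactSize) {
                eligible.add(size);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        long threshold = 4 * MB;
        // Large family: every flush is close to the 256MB flush size.
        List<Long> largeFamily = List.of(256 * MB, 250 * MB, 256 * MB);
        // Small, dependent family: flushed alongside the large one, so its files are tiny.
        List<Long> smallFamily = List.of(1 * MB, 2 * MB, 3 * MB);
        // Prints 0: no large flush file is ever a minor compaction candidate.
        System.out.println("large family eligible: "
                + eligibleForMinorCompaction(largeFamily, threshold).size());
        // Prints 3: all small flush files qualify.
        System.out.println("small family eligible: "
                + eligibleForMinorCompaction(smallFamily, threshold).size());
    }
}
```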

This certainly seems like a setting specific to this use case, and other use
cases need a slightly different value. If you mix the two on the same cluster,
having only one global setting to adjust this seems restrictive. Should this be
a per-table setting, like the flush size?
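A per-table setting would most likely resolve as "table value if present, otherwise the cluster-wide default". The following is a hypothetical sketch of that lookup only; the class, method, and table names are illustrative and not HBase API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a per-table override: a table may carry its own
 * value, and otherwise the global (cluster-wide) setting applies.
 */
public class PerTableSettingSketch {
    private final long globalDefault;
    private final Map<String, Long> tableOverrides = new HashMap<>();

    public PerTableSettingSketch(long globalDefault) {
        this.globalDefault = globalDefault;
    }

    /** Records a table-specific value that takes precedence over the global one. */
    public void setForTable(String table, long value) {
        tableOverrides.put(table, value);
    }

    /** Table-level value wins; otherwise fall back to the global setting. */
    public long effectiveValue(String table) {
        return tableOverrides.getOrDefault(table, globalDefault);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Global default suited to the skewed-family use case discussed above.
        PerTableSettingSketch setting = new PerTableSettingSketch(4 * mb);
        // A hypothetical table with a different workload gets its own value.
        setting.setForTable("metrics", 64 * mb);
        System.out.println("metrics: " + setting.effectiveValue("metrics") / mb + "MB");
        System.out.println("users: " + setting.effectiveValue("users") / mb + "MB");
    }
}
```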

It still seems to me that decoupling is something we should make available as
well. But I have thought about it for a while and discussed it with various
people, and decoupling brings its own set of issues. For example, you might end
up with too many HLog files because the small family is flushed only rarely.
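The HLog concern follows from a simple rule: a log file can only be archived once every edit in it has been flushed. The sketch below is a simplified, single-region model under that assumption (HBase's actual bookkeeping differs in detail); names are hypothetical. With per-family flushing, the oldest unflushed edit is the minimum over all families, so a rarely flushed family pins every log written since its last flush.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of write-ahead log retention when families flush
 * independently. Sequence ids are assumed to increase monotonically.
 */
public class LogRetentionSketch {
    /** Oldest sequence id still held only in a memstore, across all families. */
    public static long oldestUnflushedSeqId(Map<String, Long> oldestUnflushedPerFamily) {
        long min = Long.MAX_VALUE;
        for (long seqId : oldestUnflushedPerFamily.values()) {
            min = Math.min(min, seqId);
        }
        return min;
    }

    /**
     * Counts the log files that must be kept. Each entry in maxSeqIdPerLog is
     * the highest sequence id written to that log file.
     */
    public static int logsToRetain(List<Long> maxSeqIdPerLog, Map<String, Long> oldestUnflushedPerFamily) {
        long oldest = oldestUnflushedSeqId(oldestUnflushedPerFamily);
        int retained = 0;
        for (long maxSeqId : maxSeqIdPerLog) {
            // The log may still contain unflushed edits if its newest edit is not older than the oldest unflushed one.
            if (maxSeqId >= oldest) {
                retained++;
            }
        }
        return retained;
    }

    public static void main(String[] args) {
        List<Long> logs = List.of(100L, 200L, 300L, 400L, 500L);
        // Large family flushed recently, small family last flushed long ago: all 5 logs are pinned.
        Map<String, Long> skewed = Map.of("large", 480L, "small", 50L);
        System.out.println("retained (small family stale): " + logsToRetain(logs, skewed));
        // Both families flushed recently: only the newest log must be kept.
        Map<String, Long> recent = Map.of("large", 480L, "small", 470L);
        System.out.println("retained (both recent): " + logsToRetain(logs, recent));
    }
}
```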
                
> Make flush decisions per column family
> --------------------------------------
>
>                 Key: HBASE-3149
>                 URL: https://issues.apache.org/jira/browse/HBASE-3149
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Karthik Ranganathan
>            Assignee: Nicolas Spiegelberg
>            Priority: Critical
>             Fix For: 0.92.1
>
>
> Today, the flush decision is made using the aggregate size of all column 
> families. When large and small column families co-exist, this causes many 
> small flushes of the smaller CF. We need to make per-CF flush decisions.
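The difference between the aggregate decision described in the issue and a per-CF decision can be sketched as follows. This is a hypothetical model, not the regionserver code: memstore sizes are given per family, and, purely as an assumption for illustration, the per-family policy compares each family against the same flush size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical comparison of aggregate versus per-family flush decisions. */
public class FlushPolicySketch {
    /** Aggregate policy: once the region's total reaches flushSize, every family is flushed. */
    public static List<String> aggregateDecision(Map<String, Long> memstoreSizes, long flushSize) {
        long total = 0;
        for (long size : memstoreSizes.values()) {
            total += size;
        }
        if (total < flushSize) {
            return new ArrayList<>();
        }
        return new ArrayList<>(memstoreSizes.keySet());
    }

    /** Per-family policy: only families whose own memstore reaches the threshold are flushed. */
    public static List<String> perFamilyDecision(Map<String, Long> memstoreSizes, long perFamilyFlushSize) {
        List<String> toFlush = new ArrayList<>();
        for (Map.Entry<String, Long> e : memstoreSizes.entrySet()) {
            if (e.getValue() >= perFamilyFlushSize) {
                toFlush.add(e.getKey());
            }
        }
        return toFlush;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        Map<String, Long> sizes = new TreeMap<>(); // sorted keys for stable output
        sizes.put("large", 256 * mb);
        sizes.put("small", 2 * mb);
        // Prints [large, small]: the small family is flushed as a tiny 2MB file.
        System.out.println("aggregate flushes: " + aggregateDecision(sizes, 256 * mb));
        // Prints [large]: the small family keeps accumulating in its memstore.
        System.out.println("per-family flushes: " + perFamilyDecision(sizes, 256 * mb));
    }
}
```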

