[ https://issues.apache.org/jira/browse/HBASE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573072#comment-13573072 ]

Lars Hofhansl commented on HBASE-7763:
--------------------------------------

So just to state the obvious: the selection depends on what metric we're 
trying to optimize.
We can either (1) optimize for write amplification (i.e. minimize it) or 
(2) optimize for read performance (reduce the number of scanners 
participating in the merge scan).

For case #1 we'd pick larger files first, and for #2 we'd pick smaller files 
first.
Making this configurable or pluggable thus makes a lot of sense.
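To make the two orderings concrete, here is a minimal, self-contained Java sketch. It is not HBase's actual CompactionPolicy or StoreFile API; the class and method names are hypothetical and it only illustrates sorting candidate files by size in the two directions described above (larger first for case #1, smaller first for case #2).

// Hypothetical sketch; not the HBase compaction selection code.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CompactionOrderSketch {

    // Stand-in for a candidate store file; only its size matters here.
    static final class CandidateFile {
        final String name;
        final long sizeBytes;
        CandidateFile(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }
    }

    // Case #1 (write amplification): consider larger files first.
    static List<CandidateFile> largestFirst(List<CandidateFile> files) {
        List<CandidateFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong((CandidateFile f) -> f.sizeBytes).reversed());
        return sorted;
    }

    // Case #2 (read performance): consider smaller files first.
    static List<CandidateFile> smallestFirst(List<CandidateFile> files) {
        List<CandidateFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong(f -> f.sizeBytes));
        return sorted;
    }

    public static void main(String[] args) {
        List<CandidateFile> files = List.of(
            new CandidateFile("bulkload-a", 2_000_000_000L),
            new CandidateFile("flush-b", 64_000_000L),
            new CandidateFile("flush-c", 8_000_000L));
        largestFirst(files).forEach(f -> System.out.println("write-amp order: " + f.name));
        smallestFirst(files).forEach(f -> System.out.println("read-perf order: " + f.name));
    }
}

A pluggable policy would essentially let the operator choose which of these comparators (or a ratio-based variant like the current algorithm) drives selection.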

The actual behavior in a production setting is very hard to predict, and as I 
said above, the folks from Facebook did a lot of research and measuring to come 
up with the current compaction selection algorithm. I'm surprised they are 
quiet on this.

                
> Compactions not sorting based on size anymore.
> ----------------------------------------------
>
>                 Key: HBASE-7763
>                 URL: https://issues.apache.org/jira/browse/HBASE-7763
>             Project: HBase
>          Issue Type: Bug
>          Components: Compaction
>    Affects Versions: 0.96.0, 0.94.4
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>            Priority: Critical
>             Fix For: 0.96.0, 0.94.6
>
>         Attachments: HBASE-7763-trunk-TESTING.patch, 
> HBASE-7763-trunk-TESTING.patch, HBASE-7763-trunk-TESTING.patch
>
>
> Currently compaction selection is not sorting based on size.  This causes 
> selection to re-write larger files than necessary when bulk loads are 
> involved.

