[ https://issues.apache.org/jira/browse/HBASE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573072#comment-13573072 ]
Lars Hofhansl commented on HBASE-7763:
--------------------------------------

So just to state the obvious: the selection depends on which metric we're trying to optimize. We can (1) optimize for write amplification (i.e. minimize it), or (2) optimize for read performance (reducing the number of scanners participating in the merge scan). For case #1 we'd pick larger files first, and for #2 we'd pick smaller files first. Making this configurable or pluggable thus makes a lot of sense. The actual behavior in a production setting is very hard to predict, and as I said above: the folks from Facebook did a lot of research and measurement to come up with the current compaction selection algorithm. I'm surprised they are quiet on this.

> Compactions not sorting based on size anymore.
> ----------------------------------------------
>
>                 Key: HBASE-7763
>                 URL: https://issues.apache.org/jira/browse/HBASE-7763
>             Project: HBase
>          Issue Type: Bug
>          Components: Compaction
>    Affects Versions: 0.96.0, 0.94.4
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>            Priority: Critical
>             Fix For: 0.96.0, 0.94.6
>
>         Attachments: HBASE-7763-trunk-TESTING.patch, HBASE-7763-trunk-TESTING.patch, HBASE-7763-trunk-TESTING.patch
>
>
> Currently compaction selection is not sorting based on size. This causes selection to choose larger files to re-write than are needed when bulk loads are involved.
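The two selection strategies Lars describes could be sketched roughly as follows. This is a hypothetical, simplified illustration, not HBase's actual CompactionPolicy API: the class name, `select` method, and its parameters are all invented for the example; real selection also applies ratio checks, min/max file counts, and other constraints.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch (not actual HBase code): order compaction
// candidates by file size, with the sort direction chosen by the
// metric being optimized.
public class CompactionSelectionSketch {

    // Pick up to maxFiles candidates. smallestFirst=true favors read
    // performance (cheaply merge away many small files, reducing the
    // scanner count); smallestFirst=false picks the largest files
    // first, the case #1 (write-amplification) ordering.
    static List<Long> select(List<Long> fileSizes, int maxFiles,
                             boolean smallestFirst) {
        List<Long> sorted = new ArrayList<>(fileSizes);
        sorted.sort(smallestFirst ? Comparator.naturalOrder()
                                  : Comparator.reverseOrder());
        return sorted.subList(0, Math.min(maxFiles, sorted.size()));
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(100L, 5L, 42L, 7L);
        // Read-optimized: compact the smallest files first.
        System.out.println(select(sizes, 2, true));   // [5, 7]
        // Write-amplification-optimized: largest files first.
        System.out.println(select(sizes, 2, false));  // [100, 42]
    }
}
```

A pluggable policy would make the comparator (and the rest of the selection heuristic) the extension point, which is essentially what making this configurable amounts to.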