[ 
https://issues.apache.org/jira/browse/HBASE-3450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982895#action_12982895
 ] 

Nicolas Spiegelberg commented on HBASE-3450:
--------------------------------------------

Some interesting stats.  We did some rough calculations internally to see what
effect an uneven distribution of data across column families was having on our
network IO.  Our data distribution across 3 column families was 1:1:20.  When
we looked at the flush:minor-compaction ratio for each CF's store files, the
large column family had a 1:2 ratio, but the two small CFs both had a 1:20
ratio!  We are looking at roughly a 10% network IO decrease if we can bring
those 2 small CFs down to a 1:2 ratio as well.
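
To make the mechanism concrete, here is a rough back-of-the-envelope sketch in
Python (not HBase code).  It assumes a hypothetical 64 MB flush threshold and a
simplified model: with region-level flushing every CF flushes whenever the
combined memstore reaches the threshold, while with per-CF flushing each CF
flushes only when its own memstore does.  The function names and the amount of
data written are made up for illustration, and the sketch does not try to
reproduce the 10% figure above.

```python
from fractions import Fraction


def region_flush_file_sizes(weights, threshold_mb):
    """MB each CF writes out when the whole region flushes at the threshold.

    Simplified model: the combined memstore is split across CFs in
    proportion to `weights` (e.g. [1, 1, 20]).
    """
    total = sum(weights)
    return [Fraction(threshold_mb * w, total) for w in weights]


def flush_counts(weights, threshold_mb, total_written_mb, per_cf):
    """Number of flushes per CF after `total_written_mb` of writes.

    per_cf=False: every CF flushes each time the combined memstore fills.
    per_cf=True:  a CF flushes only when its own memstore fills
                  (assumption: the same threshold is applied per CF).
    """
    total = sum(weights)
    if per_cf:
        return [Fraction(total_written_mb * w, total) // threshold_mb
                for w in weights]
    return [total_written_mb // threshold_mb] * len(weights)


if __name__ == "__main__":
    weights = [1, 1, 20]      # data distribution from the comment above
    threshold = 64            # hypothetical flush threshold in MB
    written = 14080           # hypothetical total writes in MB

    sizes = region_flush_file_sizes(weights, threshold)
    print([round(float(s), 2) for s in sizes])
    print(flush_counts(weights, threshold, written, per_cf=False))
    print(flush_counts(weights, threshold, written, per_cf=True))
```

Under these assumptions the two small CFs each write files of about 2.9 MB
per region flush versus about 58 MB for the large CF, and they flush 220 times
instead of 10 for the same amount of data.  Many tiny files are what push the
small CFs into extra minor compactions.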

> Per-CF Flushes
> --------------
>
>                 Key: HBASE-3450
>                 URL: https://issues.apache.org/jira/browse/HBASE-3450
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver, wal
>    Affects Versions: 0.90.1, 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>
> In cases where the load to all column families in a region is not evenly
> distributed, having per-column family flushes will reduce network IO by
> helping the compaction algorithm minimize its need for unconditional
> selection.  This issue is about refactoring the flush algorithm to move from
> HRegion granularity to Store granularity.
