[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-10201:
-----------------------------
    Attachment: HBASE-10201-addendum_1.patch

I found a typo while debugging HBASE-12405. The fix is a simple put -> putIfAbsent change, so I am adding it here as an addendum patch. It needs to be committed to master and branch-1. Thanks.

> Port 'Make flush decisions per column family' to trunk
> ------------------------------------------------------
>
>                 Key: HBASE-10201
>                 URL: https://issues.apache.org/jira/browse/HBASE-10201
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>            Reporter: Ted Yu
>            Assignee: zhangduo
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: 10201-addendum.txt, 3149-trunk-v1.txt,
> HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch,
> HBASE-10201-0.99.patch, HBASE-10201-addendum_1.patch, HBASE-10201.patch,
> HBASE-10201_1.patch, HBASE-10201_10.patch, HBASE-10201_11.patch,
> HBASE-10201_12.patch, HBASE-10201_13.patch, HBASE-10201_13.patch,
> HBASE-10201_14.patch, HBASE-10201_15.patch, HBASE-10201_16.patch,
> HBASE-10201_17.patch, HBASE-10201_18.patch, HBASE-10201_19.patch,
> HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch,
> HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch,
> HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png,
> memstore.png
>
>
> Currently the flush decision is made using the aggregate size of all column
> families. When large and small column families co-exist, this causes many
> small flushes of the smaller CF. We need to make per-CF flush decisions.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
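The addendum above swaps put for putIfAbsent. The message does not say which map is affected, so the following is only a minimal sketch of the general difference between the two calls on a java.util.concurrent.ConcurrentMap. The class name, map contents, and keys are hypothetical and are not taken from the patch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only: this class, the map, and the keys are hypothetical,
// not the code touched by the addendum patch.
class PutIfAbsentSketch {

    // put() always stores the new value, replacing any existing entry.
    static long storeWithPut(ConcurrentMap<String, Long> map, String key, long value) {
        map.put(key, value);
        return map.get(key);
    }

    // putIfAbsent() stores the value only when the key has no entry yet,
    // so the first writer wins.
    static long storeWithPutIfAbsent(ConcurrentMap<String, Long> map, String key, long value) {
        map.putIfAbsent(key, value);
        return map.get(key);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Long> overwritten = new ConcurrentHashMap<>();
        storeWithPut(overwritten, "cf1", 10L);
        System.out.println(storeWithPut(overwritten, "cf1", 20L));       // prints 20

        ConcurrentMap<String, Long> firstWins = new ConcurrentHashMap<>();
        storeWithPutIfAbsent(firstWins, "cf1", 10L);
        System.out.println(storeWithPutIfAbsent(firstWins, "cf1", 20L)); // prints 10
    }
}
```

If a map is meant to remember the first value recorded for a key, put silently loses it on the second write, which is the kind of bug a one-word change like this would address.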
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-10201:
--------------------------
    Fix Version/s: 1.1.0
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-10201:
---------------------------
    Attachment: 10201-addendum.txt

From https://builds.apache.org/job/HBase-TRUNK/5931/testReport/org.apache.hadoop.hbase.regionserver/TestPerColumnFamilyFlush/testLogReplayWithDistributedReplay/ :
{code}
java.lang.IllegalStateException: A mini-cluster is already running
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:865)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:799)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:770)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:757)
	at org.apache.hadoop.hbase.regionserver.TestPerColumnFamilyFlush.testLogReplay(TestPerColumnFamilyFlush.java:349)
	at org.apache.hadoop.hbase.regionserver.TestPerColumnFamilyFlush.testLogReplayWithDistributedReplay(TestPerColumnFamilyFlush.java:430)
{code}
The attached addendum changes TestPerColumnFamilyFlush to a large test so that the collision shown above does not occur in the Jenkins build.
TestPerColumnFamilyFlush passes with the addendum.
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-10201:
--------------------------
    Resolution: Fixed
    Fix Version/s: (was: 1.0.0)
    Release Note: Adds a new flushing policy mechanism. The default, org.apache.hadoop.hbase.regionserver.FlushLargeStoresPolicy, will try to avoid flushing out the small column families in a region, that is, those whose memstores are smaller than hbase.hregion.percolumnfamilyflush.size.lower.bound. To restore the old behavior of flushes writing out all column families, set hbase.regionserver.flush.policy to org.apache.hadoop.hbase.regionserver.FlushAllStoresPolicy, either in hbase-default.xml or on a per-table basis by setting the policy to use with HTableDescriptor.setFlushPolicyClassName().
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Pushed to master branch.
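The release note above describes two behaviors: flush every column family, or skip the families whose memstores are below a lower bound. The following is a minimal sketch of that selection logic as a toy model, not the HBase implementation. The class and method names are hypothetical, sizes are plain longs keyed by family name, and the fallback to flushing everything when no family reaches the bound is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: names and types are hypothetical, not the HBase classes.
class FlushSelectionSketch {

    // Old behavior: every column family is flushed.
    static List<String> selectAll(Map<String, Long> memstoreSizes) {
        return new ArrayList<>(memstoreSizes.keySet());
    }

    // Per-family behavior: keep only families at or above the lower bound.
    static List<String> selectLarge(Map<String, Long> memstoreSizes, long lowerBound) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Long> e : memstoreSizes.entrySet()) {
            if (e.getValue() >= lowerBound) {
                selected.add(e.getKey());
            }
        }
        // Assumption: if no family is large enough, fall back to flushing all,
        // so that a requested flush always frees some memory.
        return selected.isEmpty() ? selectAll(memstoreSizes) : selected;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("big", 100 * mb);
        sizes.put("small", 1 * mb);
        System.out.println(selectLarge(sizes, 16 * mb)); // prints [big]
        System.out.println(selectAll(sizes));            // prints [big, small]
    }
}
```

In this model the small family stays in memory until it grows past the bound, which is how the change avoids producing many tiny flush files for it.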
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-10201:
-----------------------------
    Attachment: HBASE-10201_19.patch
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-10201:
-----------------------------
    Attachment: (was: HBASE-10201_19.patch)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-10201:
-----------------------------
    Attachment: HBASE-10201_19.patch

Addressed the issue raised on rb.
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-10201:
---------------------------
    Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-10201:
-----------------------------
    Attachment: HBASE-10201_18.patch

Renamed lastFlushSeqId to maxFlushedSeqId in HRegion. A flushSeqId is now always generated by incrementing sequenceId, so maxFlushedSeqId is no longer necessarily equal to flushSeqId.

Renamed FlushLargeStorePolicy to FlushLargeAndOldStorePolicy, which also flushes the stores that cause HRegion.shouldFlush to return true, and changed the forceFlushAllStores flag to false in PeriodicMemstoreFlusher.
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-10201:
--------------------------
    Priority: Major (was: Critical)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-10201:
--------------------------
    Fix Version/s: (was: 0.98.10)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-10201:
--------------------------
    Priority: Critical (was: Major)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-10201:
-----------------------------------
    Priority: Major (was: Critical)

I also don't think this is a Critical priority issue, since the behavior discussed is long-standing and the shape of the changes is not yet settled. Changing the priority, but feel free to change it back if you disagree.
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-10201:
-----------------------------------
    Fix Version/s: (was: 0.98.9)
                   0.98.10
    Status: Open (was: Patch Available)

Canceling stale patch. Moving out of 0.98.9.
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-10201:
-----------------------------
    Attachment: HBASE-10201_17.patch

Fixed line length.
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-10201:
-----------------------------
    Attachment: HBASE-10201_16.patch

Trivial change to remove javadoc warnings.
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-10201:
-----------------------------
    Attachment: HBASE-10201_15.patch

Fixed a javadoc issue.
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-10201:
-----------------------------
    Attachment: HBASE-10201_14.patch

It seems the previous Hadoop QA run exited abnormally. Rebased and retrying.
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-10201:
-----------------------------
    Attachment: (was: HBASE-10201_14.patch)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-10201:
-----------------------------
    Attachment: HBASE-10201_14.patch

Applied Ted Yu's suggestions from rb: fixed a typo and added FlushPolicyFactory.
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10201: -- Attachment: compactions.png io.png count.png memstore.png Ran some loadings. Small cluster with one regionserver hosting one region. Used the test packaged in this patch modifying it so could run ten clients in parallel rather than a single client. The included test has a table schema of three column families and it fills them unevenly so it is 'ideal' for demonstrating benefit. I ran with patch turned off twice and then turned on twice. Set flushes at 64M. I see less compactions and less hfiles (so less i/o), memstores carrying more (its hard to see but you should be able to make out memstore sizes do not go to zero or near zero when the patch is enabled) Looks good. Let me review again to recheck sequenceid accounting and run some MTTR tests. > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Assignee: zhangduo >Priority: Critical > Fix For: 1.0.0, 2.0.0, 0.98.9 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, > HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, > HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, > HBASE-10201_13.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, > HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, > HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch, > compactions.png, count.png, io.png, memstore.png > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. 
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10201: -- Attachment: HBASE-10201_13.patch Retry. Yes, hadoopqa was broken today. I think it is back now. Let's see by retrying. Still working on getting a few numbers to show with and without the patch. Sorry it is taking so long. > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Assignee: zhangduo >Priority: Critical > Fix For: 1.0.0, 2.0.0, 0.98.9 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, > HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, > HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, > HBASE-10201_13.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, > HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, > HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-10201: - Attachment: HBASE-10201_13.patch Use a FlushPolicy instead of the hbase.hregion.memstore.percolumnfamilyflush.enabled config. I'm not good at naming things, and I may have broken some rules when adding the policy, so please point out if you have a better name for the policy or see anything wrong in this patch. I will fix it as soon as possible. Thanks. > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Assignee: zhangduo >Priority: Critical > Fix For: 1.0.0, 2.0.0, 0.98.9 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, > HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, > HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, > HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, > HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, > HBASE-10201_8.patch, HBASE-10201_9.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
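As a rough illustration of the idea in the comment above, the per-column-family decision can sit behind a pluggable policy rather than a boolean switch. This is a minimal sketch, not the code in HBASE-10201_13.patch: the FlushPolicySketch interface, the StoreInfo class and both policy classes are hypothetical names, and the per-store size threshold and the fall-back to flushing everything are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one column family's store: a name and its memstore size.
final class StoreInfo {
    final String family;
    final long memstoreBytes;

    StoreInfo(String family, long memstoreBytes) {
        this.family = family;
        this.memstoreBytes = memstoreBytes;
    }
}

// Hypothetical policy interface: given a region's stores, pick the ones to flush.
interface FlushPolicySketch {
    List<StoreInfo> selectStoresToFlush(List<StoreInfo> stores);
}

// Region-wide behavior: every flush takes all stores.
final class FlushAllStoresSketch implements FlushPolicySketch {
    @Override
    public List<StoreInfo> selectStoresToFlush(List<StoreInfo> stores) {
        return new ArrayList<>(stores);
    }
}

// Per-column-family behavior: flush only stores at or above a size threshold.
// Falling back to all stores when none qualifies is an assumption of this sketch.
final class FlushLargeStoresSketch implements FlushPolicySketch {
    private final long perStoreThresholdBytes;

    FlushLargeStoresSketch(long perStoreThresholdBytes) {
        this.perStoreThresholdBytes = perStoreThresholdBytes;
    }

    @Override
    public List<StoreInfo> selectStoresToFlush(List<StoreInfo> stores) {
        List<StoreInfo> selected = new ArrayList<>();
        for (StoreInfo store : stores) {
            if (store.memstoreBytes >= perStoreThresholdBytes) {
                selected.add(store);
            }
        }
        return selected.isEmpty() ? new ArrayList<>(stores) : selected;
    }
}
```

With a policy object, adding another selection strategy later means adding a class rather than another config flag, which is presumably the motivation for the change.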
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-10201: -- Fix Version/s: (was: 0.99.2) 1.0.0 > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Assignee: zhangduo >Priority: Critical > Fix For: 1.0.0, 2.0.0, 0.98.9 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, > HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, > HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_2.patch, > HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, > HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, > HBASE-10201_9.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-10201: - Attachment: HBASE-10201_12.patch [~tedyu]'s comments on rb > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Assignee: zhangduo >Priority: Critical > Fix For: 2.0.0, 0.98.9, 0.99.2 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, > HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, > HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_2.patch, > HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, > HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, > HBASE-10201_9.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-10201: - Attachment: HBASE-10201_11.patch rebase and some simple change > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Assignee: zhangduo >Priority: Critical > Fix For: 2.0.0, 0.98.9, 0.99.2 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, > HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, > HBASE-10201_11.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, > HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, > HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-10201: - Attachment: HBASE-10201_10.patch [~busbey]'s comment on rb > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Assignee: zhangduo >Priority: Critical > Fix For: 2.0.0, 0.98.9, 0.99.2 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, > HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, > HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, > HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, > HBASE-10201_8.patch, HBASE-10201_9.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-10201: - Attachment: HBASE-10201_9.patch rebase for WAL refactoring > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Assignee: zhangduo >Priority: Critical > Fix For: 2.0.0, 0.98.9, 0.99.2 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, > HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, > HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, > HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, > HBASE-10201_9.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-10201: - Attachment: HBASE-10201_8.patch The pressure on log truncation is already handled by LogRoller, and we already have a testcase for it in TestPerColumnFamilyFlush, sorry... Also, I see that the shouldFlush method in HRegion only checks flushSeqId and lastFlushTime, and returning true means some data has remained in the memstore for a long time, so I changed PeriodicMemstoreFlusher to always flush all stores instead of doing a selective flush. Added a comment to testCompareStoreFileCount. ReviewBoard: https://reviews.apache.org/r/28151/diff/# > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Assignee: zhangduo >Priority: Critical > Fix For: 2.0.0, 0.98.9, 0.99.2 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, > HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, > HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, > HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-10201: - Attachment: HBASE-10201_7.patch It seems HBASE-12405 must be evaluated together with HBASE-10201, so I combined the changes and posted them here. Maybe HBASE-12405 can be marked as a duplicate of HBASE-10201? Some comments: I did not change the way flushedSequenceIdByRegion is stored in ServerManager. Doing so would require changing the protobuf definition (changing CompleteSequenceId in RegionLoad from a long to a kv list), which may break wire compatibility; at the least, a rolling upgrade or downgrade would be difficult. For the same reason, I did not change the RecoveringRegionLastFlushedSequenceId stored on zk during distributed log replay. > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Assignee: zhangduo >Priority: Critical > Fix For: 2.0.0, 0.98.9, 0.99.2 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, > HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, > HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, > HBASE-10201_6.patch, HBASE-10201_7.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-10201: - Attachment: HBASE-10201_6.patch I plan to start working on HBASE-12405, so I rebased this patch onto the latest master code. > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Assignee: zhangduo >Priority: Critical > Fix For: 2.0.0, 0.98.9, 0.99.2 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, > HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, > HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, > HBASE-10201_6.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10201: --- Fix Version/s: 0.98.9 > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Assignee: zhangduo >Priority: Critical > Fix For: 2.0.0, 0.98.9, 0.99.2 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, > HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, > HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-10201: - Attachment: HBASE-10201_5.patch rebase since master's HEAD has been moved. > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Assignee: zhangduo >Priority: Critical > Fix For: 2.0.0, 0.99.2 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, > HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, > HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-10201: - Attachment: HBASE-10201_4.patch > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Assignee: zhangduo >Priority: Critical > Fix For: 2.0.0, 0.99.2 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, > HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, > HBASE-10201_3.patch, HBASE-10201_4.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-10201: - Attachment: HBASE-10201-0.99.patch Backport to branch-1. [~enis] I tried but failed to make dev-support/test-patch.sh work properly, so I only ran the unit tests, sorry... > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Assignee: zhangduo >Priority: Critical > Fix For: 2.0.0, 0.99.2 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, > HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, > HBASE-10201_3.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-10201: - Attachment: HBASE-10201_3.patch Wrap lines. > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Assignee: zhangduo >Priority: Critical > Fix For: 2.0.0, 0.99.2 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201.patch, > HBASE-10201_1.patch, HBASE-10201_2.patch, HBASE-10201_3.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-10201: - Attachment: HBASE-10201_2.patch Fix some typos; rename some methods and fields; use AtomicUtils.updateMin; change DEFAULT_HREGION_MEMSTORE_PER_COLUMN_FAMILY_FLUSH to true so that the option is enabled by default. > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Assignee: zhangduo >Priority: Critical > Fix For: 2.0.0, 0.99.2 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201.patch, > HBASE-10201_1.patch, HBASE-10201_2.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
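For readers unfamiliar with the "update min" pattern that the comment above refers to, here is a minimal sketch of a lock-free minimum update built on a compare-and-set loop. It is an illustration of the general technique only; the class name AtomicMinSketch is hypothetical, and this is not a copy of the AtomicUtils code in HBase.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper showing a lock-free "keep the smallest value seen" update.
final class AtomicMinSketch {
    private AtomicMinSketch() {}

    // Lowers target to candidate if candidate is smaller; safe with concurrent callers.
    static void updateMin(AtomicLong target, long candidate) {
        while (true) {
            long current = target.get();
            if (candidate >= current) {
                return; // the stored value is already small enough
            }
            if (target.compareAndSet(current, candidate)) {
                return; // we installed the new minimum
            }
            // Another thread changed the value in between; re-read and retry.
        }
    }
}
```

This fits tracking something like an oldest sequence id: many writers can race, and the smallest value wins without taking a lock. On Java 8 and later, `target.accumulateAndGet(candidate, Math::min)` has the same effect.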
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10201: -- Assignee: zhangduo > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Assignee: zhangduo >Priority: Critical > Fix For: 2.0.0, 0.99.2 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201.patch, > HBASE-10201_1.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10201: -- Priority: Critical (was: Major) > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Priority: Critical > Fix For: 2.0.0, 0.99.2 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201.patch, > HBASE-10201_1.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10201: -- Component/s: wal > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Priority: Critical > Fix For: 2.0.0, 0.99.2 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201.patch, > HBASE-10201_1.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10201: -- Fix Version/s: 0.99.2 2.0.0 > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Ted Yu >Priority: Critical > Fix For: 2.0.0, 0.99.2 > > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201.patch, > HBASE-10201_1.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-10201: - Attachment: HBASE-10201_1.patch Modified according to [~anoop.hbase]'s suggestion. Added a lowestPossibleSeqId check to avoid waiting for the sequence id to be set every time. Moved oldestSeqId to HRegion, so Store and MemStore are not modified. oldestSeqId is only recorded when perColumnFamilyFlushEnabled is true. > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201.patch, > HBASE-10201_1.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10201: --- Status: Patch Available (was: Open) > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-10201: - Attachment: HBASE-10201.patch Sorry, I was wrong: getSequenceId in HLogKey blocks until logSeqNum is set, so I can get the sequence id before syncing the WAL. But I do not want to change the order of operations in a mutation, so I added a method to record oldestSequenceId, which is called after the WAL appendNoSync and before releasing updateLock. The patch passed all unit tests on my machine. > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-10201: - Attachment: HBASE-10201-0.98_2.patch Ran TestPerColumnFamilyFlush with 3 CFs (16B values for CF1, 256B values for CF2 and 4K values for CF3), 1M rows, a 128M memstore flush size and a 16M CF flush size. Result without per-CF flush: NumStoreFiles: 7, StoreFileSize: 4336644762, NumCompactionsCompleted: 46, NumFilesCompacted: 146, NumBytesCompacted: 11132103132, write amplification: 2.57. Result with per-CF flush: NumStoreFiles: 10, StoreFileSize: 4482510274, NumCompactionsCompleted: 27, NumFilesCompacted: 89, NumBytesCompacted: 10353603767, write amplification: 2.31. Next I will run this benchmark on a real cluster instead of the minicluster. > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
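The write-amplification figures in the comment above are consistent with dividing NumBytesCompacted by StoreFileSize. The comment does not state the formula, so treat that as an inference from the numbers rather than the author's definition. A small sketch that reproduces the arithmetic (the class and method names are hypothetical):

```java
import java.util.Locale;

// Hypothetical helper: reproduces the reported ratios from the raw counters.
final class WriteAmplificationSketch {
    private WriteAmplificationSketch() {}

    // Assumed definition: bytes rewritten by compaction per byte of final store file data.
    static double ratio(long numBytesCompacted, long storeFileSize) {
        return (double) numBytesCompacted / (double) storeFileSize;
    }

    public static void main(String[] args) {
        // Counters copied from the benchmark comment.
        double without = ratio(11132103132L, 4336644762L);
        double with = ratio(10353603767L, 4482510274L);
        System.out.println(String.format(Locale.ROOT, "without per-CF flush: %.2f", without)); // 2.57
        System.out.println(String.format(Locale.ROOT, "with per-CF flush:    %.2f", with));    // 2.31
    }
}
```

Under this reading, the patch reduces bytes compacted by about 7% even though the final store file size is slightly larger in the per-CF run.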
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-10201: - Attachment: HBASE-10201-0.98_1.patch Following Ted Yu's suggestion, added a testcase called testCompareStoreFileCount to TestPerColumnFamilyFlush to confirm that this patch really reduces the number of store files. > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, > HBASE-10201-0.98_1.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-10201: - Attachment: HBASE-10201-0.98.patch I ported the 3149-trunk-v1.txt patch to branch 0.98 (a "just make it work" version, not the final version). Porting to master is more difficult because of the rewrite of HLog. Flush per CF means we need to record the oldest sequence id per store instead of per region, so the patch adds a seqNum parameter when adding a kv to a store, which means we need to know the seqNum before we add the kv to the store. This is easy on branch 0.98: we just need to change the order of the WAL appendNoSync and the write back to the memstore (am I right?). But on master, HLog seems to use an event-driven framework, and I am not sure when the seqNum is determined. The second problem is the flushSeqId. On 0.98 it is just a simple incAndGet, but on master it uses a method in HLog. So on 0.98, if we only flush some of the stores, we can set the flushSeqId to the oldest seqNum stored in the stores that are not being flushed and not increment sequenceId. But on master, I do not know the side effects of the method. Is it OK to remove the method call, or do we still need to log something? > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu > Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
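The bookkeeping described in the comment above can be sketched as follows: keep the oldest unflushed sequence id per store, and when only some stores are flushed, derive the flush sequence id from the oldest sequence id among the stores left unflushed. Everything here (the class name, the method names, the use of a plain map) is hypothetical and far simpler than HRegion. It follows the 0.98 description in the comment as written; whether the real patch adjusts the value (for example, by subtracting one) is not stated there, and the case where every store is flushed is left to the caller.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;
import java.util.Set;

// Hypothetical, single-threaded model of per-store sequence id tracking.
final class PerStoreSeqIdSketch {
    // family name -> sequence id of the oldest edit still held in that store's memstore
    private final Map<String, Long> oldestUnflushedSeqId = new HashMap<>();

    // Record an edit for a store; only the smallest (oldest) sequence id is kept.
    void recordEdit(String family, long seqNum) {
        oldestUnflushedSeqId.merge(family, seqNum, Long::min);
    }

    // Flush the given stores, then return the oldest sequence id among the stores
    // that were NOT flushed. Empty means nothing unflushed remains.
    OptionalLong flush(Set<String> familiesToFlush) {
        oldestUnflushedSeqId.keySet().removeAll(familiesToFlush);
        return oldestUnflushedSeqId.values().stream()
            .mapToLong(Long::longValue)
            .min();
    }
}
```

The point of bounding the flush sequence id this way is that edits at or after it may still exist only in the WAL and an unflushed memstore, so they must remain replayable.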
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10201: --- Attachment: 3149-trunk-v1.txt Work in progress > Port 'Make flush decisions per column family' to trunk > -- > > Key: HBASE-10201 > URL: https://issues.apache.org/jira/browse/HBASE-10201 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu > Attachments: 3149-trunk-v1.txt > > > Currently the flush decision is made using the aggregate size of all column > families. When large and small column families co-exist, this causes many > small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.1.4#6159)