[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253526#comment-13253526 ] Hadoop QA commented on HBASE-4071: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12491647/MinVersions-v9.diff against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1513//console This message is automatically generated. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: MinVersions-v9.diff, MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090449#comment-13090449 ] Ted Yu commented on HBASE-4071: --- Here is the addendum I committed: {code} Index: src/main/java/org/apache/hadoop/hbase/regionserver/Store.java === --- src/main/java/org/apache/hadoop/hbase/regionserver/Store.java (revision 1161190) +++ src/main/java/org/apache/hadoop/hbase/regionserver/Store.java (working copy) @@ -1734,7 +1734,7 @@ public static final long FIXED_OVERHEAD = ClassSize.align( ClassSize.OBJECT + (15 * ClassSize.REFERENCE) + (8 * Bytes.SIZEOF_LONG) + (1 * Bytes.SIZEOF_DOUBLE) + - (4 * Bytes.SIZEOF_INT) + (3 * Bytes.SIZEOF_BOOLEAN)); + (5 * Bytes.SIZEOF_INT) + (3 * Bytes.SIZEOF_BOOLEAN)); public static final long DEEP_OVERHEAD = ClassSize.align(FIXED_OVERHEAD + ClassSize.OBJECT + ClassSize.REENTRANT_LOCK + {code} Forgetting to update FIXED_OVERHEAD is very common. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090459#comment-13090459 ] Hbase Build Acct commented on HBASE-4071: - thank you Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090465#comment-13090465 ] Lars Hofhansl commented on HBASE-4071: -- Thanks Ted! Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090475#comment-13090475 ] Lars Hofhansl commented on HBASE-4071: -- Still strange that the test passes locally. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090619#comment-13090619 ] Hudson commented on HBASE-4071: --- Integrated in HBase-TRUNK #2141 (See [https://builds.apache.org/job/HBase-TRUNK/2141/]) HBASE-4071 update FIXED_OVERHEAD for minVersions tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089885#comment-13089885 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/#review1602 --- Ship it! One thing only in below... patch looks good to commit... just one question below (I can fix the other thing myself on commit -- the javadoc). http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java https://reviews.apache.org/r/1582/#comment3622 good http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java https://reviews.apache.org/r/1582/#comment3623 javadoc doesn't match param name -- I can fix on commit. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java https://reviews.apache.org/r/1582/#comment3624 I defer to Jon's review here (he knows this stuff...) http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java https://reviews.apache.org/r/1582/#comment3625 Why this change? - Michael On 2011-08-23 05:13:38, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1582/ bq. --- bq. bq. (Updated 2011-08-23 05:13:38) bq. bq. bq. Review request for hbase, Todd Lipcon, Michael Stack, and Jonathan Gray. bq. bq. bq. Summary bq. --- bq. bq. Allow enforcing a minimum number of versions when TTL is enable for a store. bq. The GC logic for both versions and TTL is unified inside the ColumnTrackers. bq. bq. bq. This addresses bug HBASE-4071. bq. https://issues.apache.org/jira/browse/HBASE-4071 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1160440 bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1160440 bq. bq. Diff: https://reviews.apache.org/r/1582/diff bq. bq. bq. Testing bq. --- bq. bq. Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes. bq. New tests: TestMinVersions. bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Lars Hofhansl Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089940#comment-13089940 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- bq. On 2011-08-24 00:03:43, Michael Stack wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java, line 127 bq. https://reviews.apache.org/r/1582/diff/7/?file=34607#file34607line127 bq. bq. I defer to Jon's review here (he knows this stuff...) This basically just counts versions up instead of down (see next comment below as to why). bq. On 2011-08-24 00:03:43, Michael Stack wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java, line 205 bq. https://reviews.apache.org/r/1582/diff/7/?file=34607#file34607line205 bq. bq. Why this change? ExplicitColumnTracker used to count versions down and ScanWildcardColumnTracker is counting version up. This change goes to together with the previous change (increment vs decrement). I did that for two reasons: 1. When both column trackers follow similar logic it will be easier to change them and to simplify them further in the future (such as extracting a GC policy). 2. Now the comparison for versions can be: if(count = maxVersions || (count = minVersions isExpired(timestamp)) // Done with versions for this column whereas otherwise it would need to be: if(count == 0) || ((maxVersion-count) minVersions isExpired(timestamp)) // Done with versions for this column Which is harder to read and less intuitive, I think. bq. On 2011-08-24 00:03:43, Michael Stack wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java, line 47 bq. https://reviews.apache.org/r/1582/diff/7/?file=34606#file34606line47 bq. bq. javadoc doesn't match param name -- I can fix on commit. Argh... One more that I missed. Thanks Stack! - Lars --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/#review1602 --- On 2011-08-23 05:13:38, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1582/ bq. --- bq. bq. (Updated 2011-08-23 05:13:38) bq. bq. bq. Review request for hbase, Todd Lipcon, Michael Stack, and Jonathan Gray. bq. bq. bq. Summary bq. --- bq. bq. Allow enforcing a minimum number of versions when TTL is enable for a store. bq. The GC logic for both versions and TTL is unified inside the ColumnTrackers. bq. bq. bq. This addresses bug HBASE-4071. bq. https://issues.apache.org/jira/browse/HBASE-4071 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1160440 bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1160440 bq. bq. Diff: https://reviews.apache.org/r/1582/diff bq. bq. bq. Testing bq. --- bq. bq. Ran all tests. I get error (not failures)
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090009#comment-13090009 ] Lars Hofhansl commented on HBASE-4071: -- Yeah! :) Thanks Stack, and all the other folks who reviewed the code. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089087#comment-13089087 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/#review1591 --- Nice work! Some formatting things (didn't comment on some lines 80 chars) and need to update some of the javadocs for the new arguments. Will look through the patch one more time before I ask questions. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java https://reviews.apache.org/r/1582/#comment3577 line 80 chars http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java https://reviews.apache.org/r/1582/#comment3578 javadoc http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java https://reviews.apache.org/r/1582/#comment3582 seems like this change actually changes what the functionality / lines are between ColumnTracker vs. QueryMatcher. should you update the javadoc here to describe what this is not responsible for? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java https://reviews.apache.org/r/1582/#comment3579 update the javadoc here... maybe explain why we pass ttl? seems odd that this removed timestamp from the API http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java https://reviews.apache.org/r/1582/#comment3581 update javadoc http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java https://reviews.apache.org/r/1582/#comment3580 long line http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java https://reviews.apache.org/r/1582/#comment3583 spaces between arguments: (minVersions, maxVersions, ttl) http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java https://reviews.apache.org/r/1582/#comment3584 update javadoc http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java https://reviews.apache.org/r/1582/#comment3585 i'm a little confused about this. this doesn't need to look at the number of included versions? you are only ever done if minVersions is explicitly set to 0 (disabled)? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java https://reviews.apache.org/r/1582/#comment3587 this is a nice comment, but is this true? can't i safely delete anything that is expired once i have hit my min for this file? we may not evict _enough_ but we can at least evict something? unfortunately we don't use the query matcher here, but should be easy to add it here (someone else has added it here as part of some other changes, but they aren't committed yet and won't be for some time) http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java https://reviews.apache.org/r/1582/#comment3588 nice explanation - Jonathan On 2011-08-21 00:32:00, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1582/ bq. --- bq. bq. (Updated 2011-08-21 00:32:00) bq. bq. bq. Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley. bq. bq. bq. Summary bq. --- bq. bq. Allow enforcing a minimum number of versions when TTL is enable for a store. bq. The GC logic for both versions and TTL is unified inside the ColumnTrackers. bq. bq. bq. This addresses bug HBASE-4071. bq. https://issues.apache.org/jira/browse/HBASE-4071 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159836 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159836 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159836 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159836 bq.
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089103#comment-13089103 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- bq. On 2011-08-22 22:24:47, Jonathan Gray wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java, line 205 bq. https://reviews.apache.org/r/1582/diff/4/?file=34041#file34041line205 bq. bq. i'm a little confused about this. this doesn't need to look at the number of included versions? you are only ever done if minVersions is explicitly set to 0 (disabled)? Yep... This is just to keep the current optimization where if we do not have minVersions to worry about (the default) we can bail early if the KV is expired. bq. On 2011-08-22 22:24:47, Jonathan Gray wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java, lines 24-41 bq. https://reviews.apache.org/r/1582/diff/4/?file=34037#file34037line24 bq. bq. seems like this change actually changes what the functionality / lines are between ColumnTracker vs. QueryMatcher. should you update the javadoc here to describe what this is not responsible for? Will fix this and all other Javadocs. bq. On 2011-08-22 22:24:47, Jonathan Gray wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 499 bq. https://reviews.apache.org/r/1582/diff/4/?file=34042#file34042line499 bq. bq. this is a nice comment, but is this true? bq. bq. can't i safely delete anything that is expired once i have hit my min for this file? we may not evict _enough_ but we can at least evict something? bq. bq. unfortunately we don't use the query matcher here, but should be easy to add it here (someone else has added it here as part of some other changes, but they aren't committed yet and won't be for some time) You are right this can be optimized better as we discussed. Stack actually had the same comment on the issue. - Lars --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/#review1591 --- On 2011-08-21 00:32:00, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1582/ bq. --- bq. bq. (Updated 2011-08-21 00:32:00) bq. bq. bq. Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley. bq. bq. bq. Summary bq. --- bq. bq. Allow enforcing a minimum number of versions when TTL is enable for a store. bq. The GC logic for both versions and TTL is unified inside the ColumnTrackers. bq. bq. bq. This addresses bug HBASE-4071. bq. https://issues.apache.org/jira/browse/HBASE-4071 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159836 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159836 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159836 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159836 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159836 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159836 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159836 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159836 bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159836 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159836 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159836 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159836 bq. bq. Diff:
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089145#comment-13089145 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/ --- (Updated 2011-08-22 23:24:57.784160) Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley. Changes --- Fixed comments, Javadocs, and linebreaks. Summary --- Allow enforcing a minimum number of versions when TTL is enable for a store. The GC logic for both versions and TTL is unified inside the ColumnTrackers. This addresses bug HBASE-4071. https://issues.apache.org/jira/browse/HBASE-4071 Diffs (updated) - http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1160440 Diff: https://reviews.apache.org/r/1582/diff Testing --- Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes. New tests: TestMinVersions. Thanks, Lars Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089160#comment-13089160 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/#review1595 --- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java https://reviews.apache.org/r/1582/#comment3601 these are flipped in order from the signature http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java https://reviews.apache.org/r/1582/#comment3602 @return javadoc doesn't include the variable name (on the @param does) - Jonathan On 2011-08-22 23:24:57, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1582/ bq. --- bq. bq. (Updated 2011-08-22 23:24:57) bq. bq. bq. Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley. bq. bq. bq. Summary bq. --- bq. bq. Allow enforcing a minimum number of versions when TTL is enable for a store. bq. The GC logic for both versions and TTL is unified inside the ColumnTrackers. bq. bq. bq. This addresses bug HBASE-4071. bq. https://issues.apache.org/jira/browse/HBASE-4071 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1160440 bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1160440 bq. bq. Diff: https://reviews.apache.org/r/1582/diff bq. bq. bq. Testing bq. --- bq. bq. Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes. bq. New tests: TestMinVersions. bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089266#comment-13089266 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/ --- (Updated 2011-08-23 04:53:14.501780) Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley. Changes --- More line length and comment fixes. Summary --- Allow enforcing a minimum number of versions when TTL is enable for a store. The GC logic for both versions and TTL is unified inside the ColumnTrackers. This addresses bug HBASE-4071. https://issues.apache.org/jira/browse/HBASE-4071 Diffs (updated) - http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1160440 Diff: https://reviews.apache.org/r/1582/diff Testing --- Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes. New tests: TestMinVersions. Thanks, Lars Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089268#comment-13089268 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/ --- (Updated 2011-08-23 04:55:59.908653) Review request for hbase, Todd Lipcon, Michael Stack, and Jonathan Gray. Summary --- Allow enforcing a minimum number of versions when TTL is enable for a store. The GC logic for both versions and TTL is unified inside the ColumnTrackers. This addresses bug HBASE-4071. https://issues.apache.org/jira/browse/HBASE-4071 Diffs - http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1160440 Diff: https://reviews.apache.org/r/1582/diff Testing --- Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes. New tests: TestMinVersions. Thanks, Lars Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089273#comment-13089273 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/#review1596 --- http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java https://reviews.apache.org/r/1582/#comment3603 Javadoc is also needed for this new test. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java https://reviews.apache.org/r/1582/#comment3604 Did you mean 'getClosestRowBefore() can get' - Ted On 2011-08-23 04:55:59, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1582/ bq. --- bq. bq. (Updated 2011-08-23 04:55:59) bq. bq. bq. Review request for hbase, Todd Lipcon, Michael Stack, and Jonathan Gray. bq. bq. bq. Summary bq. --- bq. bq. Allow enforcing a minimum number of versions when TTL is enable for a store. bq. The GC logic for both versions and TTL is unified inside the ColumnTrackers. bq. bq. bq. This addresses bug HBASE-4071. bq. https://issues.apache.org/jira/browse/HBASE-4071 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1160440 bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1160440 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1160440 bq. bq. Diff: https://reviews.apache.org/r/1582/diff bq. bq. bq. Testing bq. --- bq. bq. Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes. bq. New tests: TestMinVersions. bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089276#comment-13089276 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/ --- (Updated 2011-08-23 05:13:38.495607) Review request for hbase, Todd Lipcon, Michael Stack, and Jonathan Gray. Changes --- Added Javadoc to TestMinVersions. Summary --- Allow enforcing a minimum number of versions when TTL is enable for a store. The GC logic for both versions and TTL is unified inside the ColumnTrackers. This addresses bug HBASE-4071. https://issues.apache.org/jira/browse/HBASE-4071 Diffs (updated) - http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1160440 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1160440 Diff: https://reviews.apache.org/r/1582/diff Testing --- Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes. New tests: TestMinVersions. Thanks, Lars Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087813#comment-13087813 ] Lars Hofhansl commented on HBASE-4071: -- Any objections if I assigned this jira to me? Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088074#comment-13088074 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/#review1578 --- this looks pretty good to me, but I don't know this area of the code that well. Hopefully we can get several eyes on it before commit - the FB guys know this code well, as does Ryan, iirc. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java https://reviews.apache.org/r/1582/#comment3562 nit: just delete these since they're obvious http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java https://reviews.apache.org/r/1582/#comment3563 maybe worth adding a good comment here explaining this logic - I think it makes sense but I can imagine being confused in a few months :) http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java https://reviews.apache.org/r/1582/#comment3564 add apache license - Todd On 2011-08-18 23:01:12, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1582/ bq. --- bq. bq. (Updated 2011-08-18 23:01:12) bq. bq. bq. Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley. bq. bq. bq. Summary bq. --- bq. bq. Allow enforcing a minimum number of versions when TTL is enable for a store. bq. The GC logic for both versions and TTL is unified inside the ColumnTrackers. bq. bq. bq. This addresses bug HBASE-4071. bq. https://issues.apache.org/jira/browse/HBASE-4071 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159317 bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159317 bq. bq. Diff: https://reviews.apache.org/r/1582/diff bq. bq. bq. Testing bq. --- bq. bq. Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes. bq. New tests: TestMinVersions. bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088133#comment-13088133 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/ --- (Updated 2011-08-20 05:05:27.147665) Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley. Changes --- o Added more comments. o Added more tests. o Added Apache license to test file. o Remove some unused variable and imports from Store.java (while I'm changing it anyway). o Removed some more extraneous whitespace. Summary --- Allow enforcing a minimum number of versions when TTL is enable for a store. The GC logic for both versions and TTL is unified inside the ColumnTrackers. This addresses bug HBASE-4071. https://issues.apache.org/jira/browse/HBASE-4071 Diffs (updated) - http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159836 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159836 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159836 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159836 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159836 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159836 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159836 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159836 http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159836 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159836 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159836 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159836 Diff: https://reviews.apache.org/r/1582/diff Testing --- Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes. New tests: TestMinVersions. Thanks, Lars Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087156#comment-13087156 ] Lars Hofhansl commented on HBASE-4071: -- Again sorry for he review churn. Updates were posted even after I removed the jira from the review :( Quick update... Since there are so many actors involved in this (Store, Region, ScanQueryMatcher, ColumnTrackers), all with slightly different intricate logic, I think abstracting this out into an interface will either not make it nicer, or require me to rewrite the entire logic. Instead I unified both the TTL and Versioning logic inside the ColumnTrackers, while still giving the trackers a chance to bail out early. That made it simpler, and will hopefully make it easier in the future to abstract this further (I think that needs to be coordinated with the Compaction Coprocessor work). One problem I encountered is Store.getRowKeyAtOrBefore. That currently honors TTL but not MaxVersions (which is strange). I'm thinking I'll either leave that alone, or have it also not honor TTL when the store has minversions. Fixing this correctly in all cases would mean to scan all relevant KVs in the Memstore (i.e. ignoring TTL and version restrictions), then use those candidates to scan the storefiles (now honoring TTL and doing the version counting). Added a new test that validates the basic behavior. Running tests now. (I seems to have a hard time to get a full test run through locally - with or without my patch). Will attach a new patch soon that should still be considered a sketch but should hold up to a bit more scrutiny. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087375#comment-13087375 ] Lars Hofhansl commented on HBASE-4071: -- Please have a look at the review now (the same one), and let me know what you think. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087381#comment-13087381 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/ --- (Updated 2011-08-18 23:01:12.689967) Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley. Summary (updated) --- Allow enforcing a minimum number of versions when TTL is enable for a store. The GC logic for both versions and TTL is unified inside the ColumnTrackers. This addresses bug HBASE-4071. https://issues.apache.org/jira/browse/HBASE-4071 Diffs - http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159317 Diff: https://reviews.apache.org/r/1582/diff Testing --- Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes. New tests: TestMinVersions. Thanks, Lars Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087379#comment-13087379 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/ --- (Updated 2011-08-18 22:58:25.647922) Review request for hbase and Ian Varley. Summary (updated) --- Allow enforcing a minimum number of versions when TTL is enable for a store. The GC logic for both versions and TTL is unified inside the ColumnTrackers. This addresses bug HBASE-4071. https://issues.apache.org/jira/browse/HBASE-4071 Diffs (updated) - http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159317 Diff: https://reviews.apache.org/r/1582/diff Testing (updated) --- Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes. New tests: TestMinVersions. Thanks, Lars Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087380#comment-13087380 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/ --- (Updated 2011-08-18 22:59:37.165069) Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley. Summary --- Allow enforcing a minimum number of versions when TTL is enable for a store. The GC logic for both versions and TTL is unified inside the ColumnTrackers. This addresses bug HBASE-4071. https://issues.apache.org/jira/browse/HBASE-4071 Diffs - http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159317 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159317 Diff: https://reviews.apache.org/r/1582/diff Testing --- Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes. New tests: TestMinVersions. Thanks, Lars Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087384#comment-13087384 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/#review1536 --- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java https://reviews.apache.org/r/1582/#comment3495 Will fix the white space http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java https://reviews.apache.org/r/1582/#comment3496 This means that expired rows are not removed on flush when minVersions is set. That is because at this point we do not have enough information. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java https://reviews.apache.org/r/1582/#comment3498 Note the interface change here. KeyValue, could have a timestamp, in which case we'd look for that particular version. No caller used this, and passing byte[] avoids that problem completely. - Lars On 2011-08-18 23:01:12, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1582/ bq. --- bq. bq. (Updated 2011-08-18 23:01:12) bq. bq. bq. Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley. bq. bq. bq. Summary bq. --- bq. bq. Allow enforcing a minimum number of versions when TTL is enable for a store. bq. The GC logic for both versions and TTL is unified inside the ColumnTrackers. bq. bq. bq. This addresses bug HBASE-4071. bq. https://issues.apache.org/jira/browse/HBASE-4071 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159317 bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159317 bq. bq. Diff: https://reviews.apache.org/r/1582/diff bq. bq. bq. Testing bq. --- bq. bq. Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes. bq. New tests: TestMinVersions. bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087388#comment-13087388 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/#review1537 --- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java https://reviews.apache.org/r/1582/#comment3503 HRegion.getClosestRowBefore() currently calls this method. - Ted On 2011-08-18 23:01:12, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1582/ bq. --- bq. bq. (Updated 2011-08-18 23:01:12) bq. bq. bq. Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley. bq. bq. bq. Summary bq. --- bq. bq. Allow enforcing a minimum number of versions when TTL is enable for a store. bq. The GC logic for both versions and TTL is unified inside the ColumnTrackers. bq. bq. bq. This addresses bug HBASE-4071. bq. https://issues.apache.org/jira/browse/HBASE-4071 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159317 bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159317 bq. bq. Diff: https://reviews.apache.org/r/1582/diff bq. bq. bq. Testing bq. --- bq. bq. Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes. bq. New tests: TestMinVersions. bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087400#comment-13087400 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- bq. On 2011-08-18 23:15:00, Ted Yu wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 1277 bq. https://reviews.apache.org/r/1582/diff/2/?file=33571#file33571line1277 bq. bq. HRegion.getClosestRowBefore() currently calls this method. Sorry what I meant to say is that no caller makes use of passing a backdated KeyValue to this method. Changing the signature here makes sure that nobody will. The caller in HRegion is also changed as part of this patch. - Lars --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/#review1537 --- On 2011-08-18 23:01:12, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1582/ bq. --- bq. bq. (Updated 2011-08-18 23:01:12) bq. bq. bq. Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley. bq. bq. bq. Summary bq. --- bq. bq. Allow enforcing a minimum number of versions when TTL is enable for a store. bq. The GC logic for both versions and TTL is unified inside the ColumnTrackers. bq. bq. bq. This addresses bug HBASE-4071. bq. https://issues.apache.org/jira/browse/HBASE-4071 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159317 bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159317 bq. bq. Diff: https://reviews.apache.org/r/1582/diff bq. bq. bq. Testing bq. --- bq. bq. Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes. bq. New tests: TestMinVersions. bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see:
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087454#comment-13087454 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/#review1542 --- http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java https://reviews.apache.org/r/1582/#comment3513 Still using HBaseTestCase here, because it gave me all the lowlevel stuff that I needed. If folks feel strongly I'll add what I need to HBaseTestingUtility and write jUnit4 tests instead. - Lars On 2011-08-18 23:01:12, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1582/ bq. --- bq. bq. (Updated 2011-08-18 23:01:12) bq. bq. bq. Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley. bq. bq. bq. Summary bq. --- bq. bq. Allow enforcing a minimum number of versions when TTL is enable for a store. bq. The GC logic for both versions and TTL is unified inside the ColumnTrackers. bq. bq. bq. This addresses bug HBASE-4071. bq. https://issues.apache.org/jira/browse/HBASE-4071 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1159317 bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1159317 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1159317 bq. bq. Diff: https://reviews.apache.org/r/1582/diff bq. bq. bq. Testing bq. --- bq. bq. Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool. Both fail with or without my changes. bq. New tests: TestMinVersions. bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086799#comment-13086799 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/ --- Review request for Ian Varley. Summary --- A min versions coupled with TTL. Note that the I unified the GC logic inside the ColumnTrackers. Previously they did the versioning and TTL was handled outside. What is still open what to do with Store.getKeyAtOrBefore(...) This addresses bug HBASE-4071. https://issues.apache.org/jira/browse/HBASE-4071 Diffs - http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1158860 Diff: https://reviews.apache.org/r/1582/diff Testing --- See TestMinVersions. Thanks, Lars Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086804#comment-13086804 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/ --- (Updated 2011-08-18 05:01:59.054304) Review request for Ian Varley. Summary --- A min versions coupled with TTL. Note that the I unified the GC logic inside the ColumnTrackers. Previously they did the versioning and TTL was handled outside. What is still open what to do with Store.getKeyAtOrBefore(...) Diffs - http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1158860 Diff: https://reviews.apache.org/r/1582/diff Testing --- See TestMinVersions. Thanks, Lars Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086808#comment-13086808 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/ --- (Updated 2011-08-18 05:07:35.742598) Review request for Ian Varley. Summary --- A min versions coupled with TTL. Note that the I unified the GC logic inside the ColumnTrackers. Previously they did the versioning and TTL was handled outside. What is still open what to do with Store.getKeyAtOrBefore(...) Diffs - http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1158860 Diff: https://reviews.apache.org/r/1582/diff Testing --- See TestMinVersions. Thanks, Lars Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085853#comment-13085853 ] Lars Hofhansl commented on HBASE-4071: -- Thanks Stack. re if (currentCount = minVersions): in this case the count was already incremented by the maxversions check (which is just outside of the context provided by the diff). re expunging expired rows on flush: Currently it seems this is done as an optimization. The reasons why believe this cannot be done if minversions0 are: (1) The cache might not have all version, so I cannot count the versions to determine the cut-off point. (2) Even if we have minversions=1, there is no guarantee that the versions in the cache include the latest one (puts could have been backdated). In both cases I think only at compaction time do we have enough information to remove expired cells (if minversions is 0). re comments: Of course. That's why there is version control :) I just left these in as comments to have a place where I can put a comment as to why I believe these are not necessary. So you think test coverage of the existing functionality is sufficient? That is very good to know. I'll add tests for the new functionality. What's the general feeling? Should I aim for minimal intrusion or attempt to do a bit refactoring to abstract these policies into an interface? Leaning towards the latter, but on the other hand the change would be more risky. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085889#comment-13085889 ] stack commented on HBASE-4071: -- bq. In both cases I think only at compaction time do we have enough information to remove expired cells (if minversions is 0). Fair enough. bq. So you think test coverage of the existing functionality is sufficient? That is very good to know. IIRC, there are some decent tests for this stuff. Also, the operation is so fundamental that if broke, it'd bubble up as a broken test somewhere (the connection back down into this code might be an indirection on top of an indirection, but a test will break I'd say). bq. What's the general feeling? Should I aim for minimal intrusion or attempt to do a bit refactoring to abstract these policies into an interface? Leaning towards the latter, but on the other hand the change would be more risky. I'm for the latter. We're trying to push out next major revision of hbase. It'll get some scrutiny and testing so if you've broken something it should show (or if not, our coverage needs improving). Good stuff LarsH. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085175#comment-13085175 ] Todd Lipcon commented on HBASE-4071: It seems OK to hardcode this, but if it's easy to refactor this even to an internal (non-coprocessor non-loadable) interface, that would be cool. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085518#comment-13085518 ] Lars Hofhansl commented on HBASE-4071: -- We could have a CompactionPolicy (or something similarly named) interface implemented in each Store. I need to spend some time looking at all the various cases to see how to abstract this. In the test-patch above, did I miss any part that would need to be touched by this? Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085541#comment-13085541 ] stack commented on HBASE-4071: -- @LarsH Patch looks great. Minor intrusion. I'd say that best way to build confidence that all works as it did is to run all unit tests -- IIRC we should have coverage enough to notice if you've broken the default behavior -- and then for your additions, work up a few unit tests that exercise the new functionality (This will also uncover any other code your patch should have touched -- I don't think you've missed anything since same Scan code is used compacting this you might want to verify, that compacting we do right thing, especially major vs minor around this new minimum versions config). Is this right? + if (currentCount = minVersions) The other two instances preincrement currentCount. This instance doesn't. Is there a reason (If so, probably deserves a comment). Looks too like a bit of repeated code here that perhaps could be factored out to a method of its own? This is sketch code but FYI, around these parts folks remove code rather than comment it out... On 1. above, 'Expired rows can no longer be expunged...', why not? Are we doing this now? (I don't remember) If so, why can't we do this still? On 2., it don't look too bad to me On 7., yes. Good on you LarsH. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084792#comment-13084792 ] Andrew Purtell commented on HBASE-4071: --- TTLs were introduced to conveniently expire transient data and garbage collect it. Setting a TTL on a column family defines that column family as transient. Whatever changes are contemplated to be hard coded, this use case must be maintained. I want to be able to set a TTL and versions to 1 or 3 or 10 or whatever and have ALL of the versions discarded when their timestamps indicate expiry. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084800#comment-13084800 ] Lars Hofhansl commented on HBASE-4071: -- Of course you're right Andrew. We shouldn't mix usecases together. What do you think of the minversions idea? As long as minversions for a CF is 0 the behavior would be identical to the current behavior. Also maybe what we have in mind is not a common scenario after all, and we shouldn't complicate the core codebase with that. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084872#comment-13084872 ] Andrew Purtell commented on HBASE-4071: --- bq. What do you think of the minversions idea? I think a new minversions in core is a fine idea. We can document how minversions, maxversions, and TTL settings interact with each other simply enough. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084778#comment-13084778 ] Ian Varley commented on HBASE-4071: --- I like the idea of making this pluggable (via the coprocessor framework, or otherwise). But I also think this is a fundamental enough policy option that making it hard-coded might be a good idea. When I was talking to someone the other day about the current TTL policy, he was like, WTF, who would want that, it eats your data?. There's no such thing as a keep 0 versions option, and thus no way to accidentally lose your most current data using that approach. But with the TTL version there is, which is (IMO) counter-intuitive for those coming from an RDBMS background. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084782#comment-13084782 ] Lars Hofhansl commented on HBASE-4071: -- I also think a simple notion like keep everything more recent than T, but at least N versions seems to be a frequent enough option to be hard-coded. Maybe that could be indicated simply by setting both # of versions AND TTL on a column family. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061102#comment-13061102 ] Jonathan Gray commented on HBASE-4071: -- I like this idea. It's somewhat related to an idea for a TTKAV (TimeToKeepAllValues) parameter that would allow a point-in-time SnapshotScanner. See HBASE-2376 Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061107#comment-13061107 ] Todd Lipcon commented on HBASE-4071: Rather than hard-code these various policies, it would be very nice if the compaction dropping policy were pluggable. I've often had people request the ability to do something like check the value of a column, and change TTL based on that value. Or, given that we have an entire row available when doing a major compaction, it would be possible to implement a feature like putting an expiration time as the value of one column, and having it expire the entire row's worth at that timestamp. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061402#comment-13061402 ] Todd Lipcon commented on HBASE-4071: Sure, could be another type of coprocessor. Certainly would be nice to dynamically load them and share the other CP infrastructure. Data GC: Remove all versions TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point of time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead, let go of all versions EXCEPT the last one written. So, its like versions==1 when TTL one week. We want to allow that if an error is caught within a week of its happening -- user mistakenly removes a critical table -- then we'll be able to restore up the the moment just before catastrophe hit otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira