[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2012-04-13 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253526#comment-13253526
 ] 

Hadoop QA commented on HBASE-4071:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12491647/MinVersions-v9.diff
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1513//console

This message is automatically generated.

 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: MinVersions-v9.diff, MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090449#comment-13090449
 ] 

Ted Yu commented on HBASE-4071:
---

Here is the addendum I committed:
{code}
Index: src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
===
--- src/main/java/org/apache/hadoop/hbase/regionserver/Store.java   
(revision 1161190)
+++ src/main/java/org/apache/hadoop/hbase/regionserver/Store.java   
(working copy)
@@ -1734,7 +1734,7 @@
   public static final long FIXED_OVERHEAD = ClassSize.align(
   ClassSize.OBJECT + (15 * ClassSize.REFERENCE) +
   (8 * Bytes.SIZEOF_LONG) + (1 * Bytes.SIZEOF_DOUBLE) +
-  (4 * Bytes.SIZEOF_INT) + (3 * Bytes.SIZEOF_BOOLEAN));
+  (5 * Bytes.SIZEOF_INT) + (3 * Bytes.SIZEOF_BOOLEAN));
 
   public static final long DEEP_OVERHEAD = ClassSize.align(FIXED_OVERHEAD +
   ClassSize.OBJECT + ClassSize.REENTRANT_LOCK +
{code}
Forgetting to update FIXED_OVERHEAD is very common.

 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-24 Thread Hbase Build Acct (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090459#comment-13090459
 ] 

Hbase Build Acct commented on HBASE-4071:
-

thank you



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-24 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090465#comment-13090465
 ] 

Lars Hofhansl commented on HBASE-4071:
--

Thanks Ted!

 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-24 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090475#comment-13090475
 ] 

Lars Hofhansl commented on HBASE-4071:
--

Still strange that the test passes locally.





 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090619#comment-13090619
 ] 

Hudson commented on HBASE-4071:
---

Integrated in HBase-TRUNK #2141 (See 
[https://builds.apache.org/job/HBase-TRUNK/2141/])
HBASE-4071 update FIXED_OVERHEAD for minVersions

tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java


 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-23 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089885#comment-13089885
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1602
---

Ship it!


One thing only in below... patch looks good to commit... just one question 
below (I can fix the other thing myself on commit -- the javadoc).


http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
https://reviews.apache.org/r/1582/#comment3622

good



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
https://reviews.apache.org/r/1582/#comment3623

javadoc doesn't match param name -- I can fix on commit.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
https://reviews.apache.org/r/1582/#comment3624

I defer to Jon's review here (he knows this stuff...)



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
https://reviews.apache.org/r/1582/#comment3625

Why this change?


- Michael


On 2011-08-23 05:13:38, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  ---
bq.  
bq.  (Updated 2011-08-23 05:13:38)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a 
store.
bq.  The GC logic for both versions and TTL is unified inside the 
ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.  https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1160440 
bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1160440 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran all tests. I get error (not failures) from two: 
TestDistributedLogSplitting and TestHTablePool. Both fail with or without my 
changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Lars Hofhansl
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all 

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-23 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089940#comment-13089940
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--



bq.  On 2011-08-24 00:03:43, Michael Stack wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java,
 line 127
bq.   https://reviews.apache.org/r/1582/diff/7/?file=34607#file34607line127
bq.  
bq.   I defer to Jon's review here (he knows this stuff...)

This basically just counts versions up instead of down (see next comment below 
as to why).


bq.  On 2011-08-24 00:03:43, Michael Stack wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java,
 line 205
bq.   https://reviews.apache.org/r/1582/diff/7/?file=34607#file34607line205
bq.  
bq.   Why this change?

ExplicitColumnTracker used to count versions down and ScanWildcardColumnTracker 
is counting version up.
This change goes to together with the previous change (increment vs decrement).

I did that for two reasons:

1. When both column trackers follow similar logic it will be easier to change 
them and to simplify them further in the future (such as extracting a GC 
policy).

2. Now the comparison for versions can be:
 if(count = maxVersions || (count = minVersions  isExpired(timestamp))
// Done with versions for this column

whereas otherwise it would need to be:
 if(count == 0) || ((maxVersion-count)  minVersions  isExpired(timestamp))
// Done with versions for this column

Which is harder to read and less intuitive, I think.


bq.  On 2011-08-24 00:03:43, Michael Stack wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java,
 line 47
bq.   https://reviews.apache.org/r/1582/diff/7/?file=34606#file34606line47
bq.  
bq.   javadoc doesn't match param name -- I can fix on commit.

Argh... One more that I missed. Thanks Stack!


- Lars


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1602
---


On 2011-08-23 05:13:38, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  ---
bq.  
bq.  (Updated 2011-08-23 05:13:38)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a 
store.
bq.  The GC logic for both versions and TTL is unified inside the 
ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.  https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1160440 
bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1160440 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran all tests. I get error (not failures) 

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-23 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090009#comment-13090009
 ] 

Lars Hofhansl commented on HBASE-4071:
--

Yeah! :)
Thanks Stack, and all the other folks who reviewed the code.

 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-22 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089087#comment-13089087
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1591
---


Nice work!  Some formatting things (didn't comment on some lines  80 chars) 
and need to update some of the javadocs for the new arguments.  Will look 
through the patch one more time before I ask questions.


http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
https://reviews.apache.org/r/1582/#comment3577

line  80 chars



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
https://reviews.apache.org/r/1582/#comment3578

javadoc



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
https://reviews.apache.org/r/1582/#comment3582

seems like this change actually changes what the functionality / lines are 
between ColumnTracker vs. QueryMatcher.  should you update the javadoc here to 
describe what this is not responsible for?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
https://reviews.apache.org/r/1582/#comment3579

update the javadoc here... maybe explain why we pass ttl?  seems odd that 
this removed timestamp from the API



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
https://reviews.apache.org/r/1582/#comment3581

update javadoc



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
https://reviews.apache.org/r/1582/#comment3580

long line



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
https://reviews.apache.org/r/1582/#comment3583

spaces between arguments:  (minVersions, maxVersions, ttl)



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
https://reviews.apache.org/r/1582/#comment3584

update javadoc



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
https://reviews.apache.org/r/1582/#comment3585

i'm a little confused about this.  this doesn't need to look at the number 
of included versions?  you are only ever done if minVersions is explicitly set 
to 0 (disabled)?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
https://reviews.apache.org/r/1582/#comment3587

this is a nice comment, but is this true?

can't i safely delete anything that is expired once i have hit my min for 
this file?  we may not evict _enough_ but we can at least evict something?

unfortunately we don't use the query matcher here, but should be easy to 
add it here (someone else has added it here as part of some other changes, but 
they aren't committed yet and won't be for some time)



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
https://reviews.apache.org/r/1582/#comment3588

nice explanation


- Jonathan


On 2011-08-21 00:32:00, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  ---
bq.  
bq.  (Updated 2011-08-21 00:32:00)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a 
store.
bq.  The GC logic for both versions and TTL is unified inside the 
ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.  https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1159836 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1159836 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1159836 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1159836 
bq.

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-22 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089103#comment-13089103
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--



bq.  On 2011-08-22 22:24:47, Jonathan Gray wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java,
 line 205
bq.   https://reviews.apache.org/r/1582/diff/4/?file=34041#file34041line205
bq.  
bq.   i'm a little confused about this.  this doesn't need to look at the 
number of included versions?  you are only ever done if minVersions is 
explicitly set to 0 (disabled)?

Yep... This is just to keep the current optimization where if we do not have 
minVersions to worry about (the default) we can bail early if the KV is expired.


bq.  On 2011-08-22 22:24:47, Jonathan Gray wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java,
 lines 24-41
bq.   https://reviews.apache.org/r/1582/diff/4/?file=34037#file34037line24
bq.  
bq.   seems like this change actually changes what the functionality / 
lines are between ColumnTracker vs. QueryMatcher.  should you update the 
javadoc here to describe what this is not responsible for?

Will fix this and all other Javadocs.


bq.  On 2011-08-22 22:24:47, Jonathan Gray wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java,
 line 499
bq.   https://reviews.apache.org/r/1582/diff/4/?file=34042#file34042line499
bq.  
bq.   this is a nice comment, but is this true?
bq.   
bq.   can't i safely delete anything that is expired once i have hit my 
min for this file?  we may not evict _enough_ but we can at least evict 
something?
bq.   
bq.   unfortunately we don't use the query matcher here, but should be 
easy to add it here (someone else has added it here as part of some other 
changes, but they aren't committed yet and won't be for some time)

You are right this can be optimized better as we discussed. Stack actually had 
the same comment on the issue.


- Lars


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1591
---


On 2011-08-21 00:32:00, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  ---
bq.  
bq.  (Updated 2011-08-21 00:32:00)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a 
store.
bq.  The GC logic for both versions and TTL is unified inside the 
ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.  https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1159836 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1159836 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1159836 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1159836 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1159836 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1159836 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1159836 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1159836 
bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1159836 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1159836 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1159836 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1159836 
bq.  
bq.  Diff: 

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-22 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089145#comment-13089145
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
---

(Updated 2011-08-22 23:24:57.784160)


Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.


Changes
---

Fixed comments, Javadocs, and linebreaks.


Summary
---

Allow enforcing a minimum number of versions when TTL is enable for a store.
The GC logic for both versions and TTL is unified inside the ColumnTrackers.


This addresses bug HBASE-4071.
https://issues.apache.org/jira/browse/HBASE-4071


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1160440 

Diff: https://reviews.apache.org/r/1582/diff


Testing
---

Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting 
and TestHTablePool. Both fail with or without my changes.
New tests: TestMinVersions.


Thanks,

Lars



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-22 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089160#comment-13089160
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1595
---



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
https://reviews.apache.org/r/1582/#comment3601

these are flipped in order from the signature



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
https://reviews.apache.org/r/1582/#comment3602

@return javadoc doesn't include the variable name (on the @param does)


- Jonathan


On 2011-08-22 23:24:57, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  ---
bq.  
bq.  (Updated 2011-08-22 23:24:57)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a 
store.
bq.  The GC logic for both versions and TTL is unified inside the 
ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.  https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1160440 
bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1160440 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran all tests. I get error (not failures) from two: 
TestDistributedLogSplitting and TestHTablePool. Both fail with or without my 
changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-22 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089266#comment-13089266
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
---

(Updated 2011-08-23 04:53:14.501780)


Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.


Changes
---

More line length and comment fixes.


Summary
---

Allow enforcing a minimum number of versions when TTL is enable for a store.
The GC logic for both versions and TTL is unified inside the ColumnTrackers.


This addresses bug HBASE-4071.
https://issues.apache.org/jira/browse/HBASE-4071


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1160440 

Diff: https://reviews.apache.org/r/1582/diff


Testing
---

Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting 
and TestHTablePool. Both fail with or without my changes.
New tests: TestMinVersions.


Thanks,

Lars



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-22 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089268#comment-13089268
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
---

(Updated 2011-08-23 04:55:59.908653)


Review request for hbase, Todd Lipcon, Michael Stack, and Jonathan Gray.


Summary
---

Allow enforcing a minimum number of versions when TTL is enable for a store.
The GC logic for both versions and TTL is unified inside the ColumnTrackers.


This addresses bug HBASE-4071.
https://issues.apache.org/jira/browse/HBASE-4071


Diffs
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1160440 

Diff: https://reviews.apache.org/r/1582/diff


Testing
---

Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting 
and TestHTablePool. Both fail with or without my changes.
New tests: TestMinVersions.


Thanks,

Lars



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-22 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089273#comment-13089273
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1596
---



http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
https://reviews.apache.org/r/1582/#comment3603

Javadoc is also needed for this new test.



http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
https://reviews.apache.org/r/1582/#comment3604

Did you mean 'getClosestRowBefore() can get'


- Ted


On 2011-08-23 04:55:59, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  ---
bq.  
bq.  (Updated 2011-08-23 04:55:59)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a 
store.
bq.  The GC logic for both versions and TTL is unified inside the 
ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.  https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1160440 
bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1160440 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1160440 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran all tests. I get error (not failures) from two: 
TestDistributedLogSplitting and TestHTablePool. Both fail with or without my 
changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-22 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089276#comment-13089276
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
---

(Updated 2011-08-23 05:13:38.495607)


Review request for hbase, Todd Lipcon, Michael Stack, and Jonathan Gray.


Changes
---

Added Javadoc to TestMinVersions.


Summary
---

Allow enforcing a minimum number of versions when TTL is enable for a store.
The GC logic for both versions and TTL is unified inside the ColumnTrackers.


This addresses bug HBASE-4071.
https://issues.apache.org/jira/browse/HBASE-4071


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1160440 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1160440 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1160440 

Diff: https://reviews.apache.org/r/1582/diff


Testing
---

Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting 
and TestHTablePool. Both fail with or without my changes.
New tests: TestMinVersions.


Thanks,

Lars



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087813#comment-13087813
 ] 

Lars Hofhansl commented on HBASE-4071:
--

Any objections if I assigned this jira to me?

 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-19 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088074#comment-13088074
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1578
---


this looks pretty good to me, but I don't know this area of the code that well. 
Hopefully we can get several eyes on it before commit - the FB guys know this 
code well, as does Ryan, iirc.


http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
https://reviews.apache.org/r/1582/#comment3562

nit: just delete these since they're obvious



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
https://reviews.apache.org/r/1582/#comment3563

maybe worth adding a good comment here explaining this logic - I think it 
makes sense but I can imagine being confused in a few months :)



http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
https://reviews.apache.org/r/1582/#comment3564

add apache license


- Todd


On 2011-08-18 23:01:12, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  ---
bq.  
bq.  (Updated 2011-08-18 23:01:12)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a 
store.
bq.  The GC logic for both versions and TTL is unified inside the 
ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.  https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1159317 
bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1159317 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran all tests. I get error (not failures) from two: 
TestDistributedLogSplitting and TestHTablePool. Both fail with or without my 
changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when 

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-19 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088133#comment-13088133
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
---

(Updated 2011-08-20 05:05:27.147665)


Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.


Changes
---

o Added more comments.
o Added more tests.
o Added Apache license to test file.
o Remove some unused variable and imports from Store.java (while I'm changing 
it anyway).
o Removed some more extraneous whitespace.


Summary
---

Allow enforcing a minimum number of versions when TTL is enable for a store.
The GC logic for both versions and TTL is unified inside the ColumnTrackers.


This addresses bug HBASE-4071.
https://issues.apache.org/jira/browse/HBASE-4071


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1159836 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1159836 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1159836 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1159836 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1159836 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1159836 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1159836 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1159836 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1159836 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1159836 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1159836 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1159836 

Diff: https://reviews.apache.org/r/1582/diff


Testing
---

Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting 
and TestHTablePool. Both fail with or without my changes.
New tests: TestMinVersions.


Thanks,

Lars



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-18 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087156#comment-13087156
 ] 

Lars Hofhansl commented on HBASE-4071:
--

Again sorry for he review churn. Updates were posted even after I removed the 
jira from the review :(

Quick update...
Since there are so many actors involved in this (Store, Region, 
ScanQueryMatcher, ColumnTrackers), all with slightly different intricate logic, 
I think abstracting this out into an interface will either not make it nicer, 
or require me to rewrite the entire logic.
Instead I unified both the TTL and Versioning logic inside the ColumnTrackers, 
while still giving the trackers a chance to bail out early. That made it 
simpler, and will hopefully make it easier in the future to abstract this 
further (I think that needs to be coordinated with the Compaction Coprocessor 
work).

One problem I encountered is Store.getRowKeyAtOrBefore. That currently honors 
TTL but not MaxVersions (which is strange). I'm thinking I'll either leave that 
alone, or have it also not honor TTL when the store has minversions.

Fixing this correctly in all cases would mean to scan all relevant KVs in the 
Memstore (i.e. ignoring TTL and version restrictions), then use those 
candidates to scan the storefiles (now honoring TTL and doing the version 
counting).

Added a new test that validates the basic behavior.
Running tests now. (I seems to have a hard time to get a full test run through 
locally - with or without my patch).

Will attach a new patch soon that should still be considered a sketch but 
should hold up to a bit more scrutiny.


 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-18 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087375#comment-13087375
 ] 

Lars Hofhansl commented on HBASE-4071:
--

Please have a look at the review now (the same one), and let me know what you 
think.

 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-18 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087381#comment-13087381
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
---

(Updated 2011-08-18 23:01:12.689967)


Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.


Summary (updated)
---

Allow enforcing a minimum number of versions when TTL is enable for a store.
The GC logic for both versions and TTL is unified inside the ColumnTrackers.


This addresses bug HBASE-4071.
https://issues.apache.org/jira/browse/HBASE-4071


Diffs
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1159317 

Diff: https://reviews.apache.org/r/1582/diff


Testing
---

Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting 
and TestHTablePool. Both fail with or without my changes.
New tests: TestMinVersions.


Thanks,

Lars



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-18 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087379#comment-13087379
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
---

(Updated 2011-08-18 22:58:25.647922)


Review request for hbase and Ian Varley.


Summary (updated)
---

Allow enforcing a minimum number of versions when TTL is enable for a store.
The GC logic for both versions and TTL is unified inside the ColumnTrackers.


This addresses bug HBASE-4071.
https://issues.apache.org/jira/browse/HBASE-4071


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1159317 

Diff: https://reviews.apache.org/r/1582/diff


Testing (updated)
---

Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting 
and TestHTablePool. Both fail with or without my changes.
New tests: TestMinVersions.


Thanks,

Lars



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-18 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087380#comment-13087380
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
---

(Updated 2011-08-18 22:59:37.165069)


Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.


Summary
---

Allow enforcing a minimum number of versions when TTL is enable for a store.
The GC logic for both versions and TTL is unified inside the ColumnTrackers.


This addresses bug HBASE-4071.
https://issues.apache.org/jira/browse/HBASE-4071


Diffs
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1159317 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1159317 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1159317 

Diff: https://reviews.apache.org/r/1582/diff


Testing
---

Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting 
and TestHTablePool. Both fail with or without my changes.
New tests: TestMinVersions.


Thanks,

Lars



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-18 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087384#comment-13087384
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1536
---



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
https://reviews.apache.org/r/1582/#comment3495

Will fix the white space



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
https://reviews.apache.org/r/1582/#comment3496

This means that expired rows are not removed on flush when minVersions is 
set.
That is because at this point we do not have enough information.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
https://reviews.apache.org/r/1582/#comment3498

Note the interface change here.
KeyValue, could have a timestamp, in which case we'd look for that 
particular version.
No caller used this, and passing byte[] avoids that problem completely.


- Lars


On 2011-08-18 23:01:12, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  ---
bq.  
bq.  (Updated 2011-08-18 23:01:12)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a 
store.
bq.  The GC logic for both versions and TTL is unified inside the 
ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.  https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1159317 
bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1159317 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran all tests. I get error (not failures) from two: 
TestDistributedLogSplitting and TestHTablePool. Both fail with or without my 
changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if 

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-18 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087388#comment-13087388
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1537
---



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
https://reviews.apache.org/r/1582/#comment3503

HRegion.getClosestRowBefore() currently calls this method.


- Ted


On 2011-08-18 23:01:12, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  ---
bq.  
bq.  (Updated 2011-08-18 23:01:12)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a 
store.
bq.  The GC logic for both versions and TTL is unified inside the 
ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.  https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1159317 
bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1159317 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran all tests. I get error (not failures) from two: 
TestDistributedLogSplitting and TestHTablePool. Both fail with or without my 
changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-18 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087400#comment-13087400
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--



bq.  On 2011-08-18 23:15:00, Ted Yu wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java,
 line 1277
bq.   https://reviews.apache.org/r/1582/diff/2/?file=33571#file33571line1277
bq.  
bq.   HRegion.getClosestRowBefore() currently calls this method.

Sorry what I meant to say is that no caller makes use of passing a backdated 
KeyValue to this method. Changing the signature here makes sure that nobody 
will. The caller in HRegion is also changed as part of this patch.


- Lars


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1537
---


On 2011-08-18 23:01:12, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  ---
bq.  
bq.  (Updated 2011-08-18 23:01:12)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a 
store.
bq.  The GC logic for both versions and TTL is unified inside the 
ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.  https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1159317 
bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1159317 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran all tests. I get error (not failures) from two: 
TestDistributedLogSplitting and TestHTablePool. Both fail with or without my 
changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: 

[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-18 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087454#comment-13087454
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1542
---



http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
https://reviews.apache.org/r/1582/#comment3513

Still using HBaseTestCase here, because it gave me all the lowlevel stuff 
that I needed.
If folks feel strongly I'll add what I need to HBaseTestingUtility and 
write jUnit4 tests instead.


- Lars


On 2011-08-18 23:01:12, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  ---
bq.  
bq.  (Updated 2011-08-18 23:01:12)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Ian Varley.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a 
store.
bq.  The GC logic for both versions and TTL is unified inside the 
ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.  https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1159317 
bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1159317 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1159317 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran all tests. I get error (not failures) from two: 
TestDistributedLogSplitting and TestHTablePool. Both fail with or without my 
changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-17 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086799#comment-13086799
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
---

Review request for Ian Varley.


Summary
---

A min versions coupled with TTL.

Note that the I unified the GC logic inside the ColumnTrackers. Previously they 
did the versioning and TTL was handled outside.

What is still open what to do with Store.getKeyAtOrBefore(...)


This addresses bug HBASE-4071.
https://issues.apache.org/jira/browse/HBASE-4071


Diffs
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1158860 

Diff: https://reviews.apache.org/r/1582/diff


Testing
---

See TestMinVersions.


Thanks,

Lars



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-17 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086804#comment-13086804
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
---

(Updated 2011-08-18 05:01:59.054304)


Review request for Ian Varley.


Summary
---

A min versions coupled with TTL.

Note that the I unified the GC logic inside the ColumnTrackers. Previously they 
did the versioning and TTL was handled outside.

What is still open what to do with Store.getKeyAtOrBefore(...)


Diffs
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1158860 

Diff: https://reviews.apache.org/r/1582/diff


Testing
---

See TestMinVersions.


Thanks,

Lars



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-17 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086808#comment-13086808
 ] 

jirapos...@reviews.apache.org commented on HBASE-4071:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/
---

(Updated 2011-08-18 05:07:35.742598)


Review request for Ian Varley.


Summary
---

A min versions coupled with TTL.

Note that the I unified the GC logic inside the ColumnTrackers. Previously they 
did the versioning and TTL was handled outside.

What is still open what to do with Store.getKeyAtOrBefore(...)


Diffs
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1158860 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1158860 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1158860 

Diff: https://reviews.apache.org/r/1582/diff


Testing
---

See TestMinVersions.


Thanks,

Lars



 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085853#comment-13085853
 ] 

Lars Hofhansl commented on HBASE-4071:
--

Thanks Stack.

re if (currentCount = minVersions): in this case the count was already 
incremented by the maxversions check (which is just outside of the context 
provided by the diff).

re expunging expired rows on flush: Currently it seems this is done as an 
optimization.
The reasons why believe this cannot be done if minversions0 are: (1) The cache 
might not have all version, so I cannot count the versions to determine the 
cut-off point. (2) Even if we have minversions=1, there is no guarantee that 
the versions in the cache include the latest one (puts could have been 
backdated).
In both cases I think only at compaction time do we have enough information to 
remove expired cells (if minversions is 0).

re comments: Of course. That's why there is version control :)
I just left these in as comments to have a place where I can put a comment as 
to why I believe these are not necessary.

So you think test coverage of the existing functionality is sufficient? That is 
very good to know.
I'll add tests for the new functionality.

What's the general feeling? Should I aim for minimal intrusion or attempt to do 
a bit refactoring to abstract these policies into an interface? Leaning towards 
the latter, but on the other hand the change would be more risky.

 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085889#comment-13085889
 ] 

stack commented on HBASE-4071:
--

bq. In both cases I think only at compaction time do we have enough information 
to remove expired cells (if minversions is 0).

Fair enough.

bq. So you think test coverage of the existing functionality is sufficient? 
That is very good to know.

IIRC, there are some decent tests for this stuff.  Also, the operation is so 
fundamental that if broke, it'd bubble up as a broken test somewhere (the 
connection back down into this code might be an indirection on top of an 
indirection, but a test will break I'd say).

bq. What's the general feeling? Should I aim for minimal intrusion or attempt 
to do a bit refactoring to abstract these policies into an interface? Leaning 
towards the latter, but on the other hand the change would be more risky.

I'm for the latter.  We're trying to push out next major revision of hbase.  
It'll get some scrutiny and testing so if you've broken something it should 
show (or if not, our coverage needs improving).

Good stuff LarsH.

 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085175#comment-13085175
 ] 

Todd Lipcon commented on HBASE-4071:


It seems OK to hardcode this, but if it's easy to refactor this even to an 
internal (non-coprocessor non-loadable) interface, that would be cool.

 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-15 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085518#comment-13085518
 ] 

Lars Hofhansl commented on HBASE-4071:
--

We could have a CompactionPolicy (or something similarly named) interface 
implemented in each Store. I need to spend some time looking at all the various 
cases to see how to abstract this.

In the test-patch above, did I miss any part that would need to be touched by 
this?


 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-15 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085541#comment-13085541
 ] 

stack commented on HBASE-4071:
--

@LarsH

Patch looks great.  Minor intrusion.  I'd say that best way to build confidence 
that all works as it did is to run all unit tests -- IIRC we should have 
coverage enough to notice if you've broken the default behavior -- and then for 
your additions, work up a few unit tests that exercise the new functionality 
(This will also uncover any other code your patch should have touched -- I 
don't think you've missed anything since same Scan code is used compacting 
this you might want to verify, that compacting we do right thing, especially 
major vs minor around this new minimum versions config).

Is this right?

+  if (currentCount = minVersions)

The other two instances preincrement currentCount.  This instance doesn't.  Is 
there a reason (If so, probably deserves a comment).  Looks too like a bit of 
repeated code here that perhaps could be factored out to a method of its own?

This is sketch code but FYI, around these parts folks remove code rather than 
comment it out...

On 1. above, 'Expired rows can no longer be expunged...', why not?  Are we 
doing this now?  (I don't remember) If so, why can't we do this still?

On 2., it don't look too bad to me

On 7., yes.

Good on you LarsH.

 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack
 Attachments: MinVersions.diff


 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-14 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084792#comment-13084792
 ] 

Andrew Purtell commented on HBASE-4071:
---

TTLs were introduced to conveniently expire transient data and garbage collect 
it. Setting a TTL on a column family defines that column family as transient. 
Whatever changes are contemplated to be hard coded, this use case must be 
maintained. I want to be able to set a TTL and versions to 1 or 3 or 10 or 
whatever and have ALL of the versions discarded when their timestamps indicate 
expiry.

 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack

 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084800#comment-13084800
 ] 

Lars Hofhansl commented on HBASE-4071:
--

Of course you're right Andrew. We shouldn't mix usecases together.
What do you think of the minversions idea?
As long as minversions for a CF is 0 the behavior would be identical to the 
current 
behavior.
Also maybe what we have in mind is not a common scenario after all, and we 
shouldn't
complicate the core codebase with that.

 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack

 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-14 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084872#comment-13084872
 ] 

Andrew Purtell commented on HBASE-4071:
---

bq. What do you think of the minversions idea?

I think a new minversions in core is a fine idea. We can document how 
minversions, maxversions, and TTL settings interact with each other simply 
enough.

 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack

 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-13 Thread Ian Varley (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084778#comment-13084778
 ] 

Ian Varley commented on HBASE-4071:
---

I like the idea of making this pluggable (via the coprocessor framework, or 
otherwise). But I also think this is a fundamental enough policy option that 
making it hard-coded might be a good idea. When I was talking to someone the 
other day about the current TTL policy, he was like, WTF, who would want that, 
it eats your data?. There's no such thing as a keep 0 versions option, and 
thus no way to accidentally lose your most current data using that approach. 
But with the TTL version there is, which is (IMO) counter-intuitive for those 
coming from an RDBMS background.

 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack

 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-08-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084782#comment-13084782
 ] 

Lars Hofhansl commented on HBASE-4071:
--

I also think a simple notion like keep everything more recent than T, but at 
least N versions seems to be a frequent enough option to be hard-coded.
Maybe that could be indicated simply by setting both # of versions AND TTL on a 
column family.


 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack

 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-07-07 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061102#comment-13061102
 ] 

Jonathan Gray commented on HBASE-4071:
--

I like this idea.  It's somewhat related to an idea for a TTKAV 
(TimeToKeepAllValues) parameter that would allow a point-in-time 
SnapshotScanner.  See HBASE-2376

 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack

 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-07-07 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061107#comment-13061107
 ] 

Todd Lipcon commented on HBASE-4071:


Rather than hard-code these various policies, it would be very nice if the 
compaction dropping policy were pluggable. I've often had people request the 
ability to do something like check the value of a column, and change TTL based 
on that value. Or, given that we have an entire row available when doing a 
major compaction, it would be possible to implement a feature like putting an 
expiration time as the value of one column, and having it expire the entire 
row's worth at that timestamp.

 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack

 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-07-07 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061402#comment-13061402
 ] 

Todd Lipcon commented on HBASE-4071:


Sure, could be another type of coprocessor. Certainly would be nice to 
dynamically load them and share the other CP infrastructure.

 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack

 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 its like versions==1 when TTL  one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up the the moment just before 
 catastrophe hit otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira