[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087156#comment-13087156
 ] 

Lars Hofhansl commented on HBASE-4071:
--------------------------------------

Again, sorry for the review churn. Updates were posted even after I removed the 
JIRA from the review :(

Quick update...
Since there are so many actors involved in this (Store, Region, 
ScanQueryMatcher, ColumnTrackers), each with its own slightly different, intricate 
logic, I think abstracting this out into an interface would either not make it 
any nicer or require me to rewrite the entire logic.
Instead I unified both the TTL and Versioning logic inside the ColumnTrackers, 
while still giving the trackers a chance to bail out early. That made it 
simpler, and will hopefully make it easier in the future to abstract this 
further (I think that needs to be coordinated with the Compaction Coprocessor 
work).
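To make the unified check concrete, here is a minimal Java sketch, assuming a tracker that sees one column's versions newest first. The class `VersionTtlTracker`, its `Decision` enum, and the `retained` helper are hypothetical illustrations, not HBase's actual `ColumnTracker` interface. A version is kept while fewer than `maxVersions` have been kept; an expired version is kept only while fewer than `minVersions` have been kept; and once either limit rejects the current version, the tracker bails out to the next column, because every older version of that column would be rejected too.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a tracker combining TTL and version limits.
// Not HBase's real ColumnTracker API.
final class VersionTtlTracker {
    enum Decision { INCLUDE, SEEK_NEXT_COLUMN }

    private final int maxVersions;
    private final int minVersions;
    private final long oldestUnexpiredTs; // now - ttl
    private int kept;                     // versions of the current column kept so far

    VersionTtlTracker(int maxVersions, int minVersions, long now, long ttlMillis) {
        this.maxVersions = maxVersions;
        this.minVersions = minVersions;
        this.oldestUnexpiredTs = now - ttlMillis;
    }

    void resetForNewColumn() { kept = 0; }

    // Versions of one column must be passed newest first.
    Decision check(long timestamp) {
        if (kept >= maxVersions) {
            // Version limit reached: older versions cannot be kept either.
            return Decision.SEEK_NEXT_COLUMN;
        }
        boolean expired = timestamp < oldestUnexpiredTs;
        if (expired && kept >= minVersions) {
            // Older versions are also expired, so bail out early.
            return Decision.SEEK_NEXT_COLUMN;
        }
        kept++;
        return Decision.INCLUDE;
    }

    // Convenience helper: which timestamps of one column survive.
    static List<Long> retained(long[] newestFirst, int maxVersions, int minVersions,
                               long now, long ttlMillis) {
        VersionTtlTracker tracker = new VersionTtlTracker(maxVersions, minVersions, now, ttlMillis);
        List<Long> result = new ArrayList<>();
        for (long ts : newestFirst) {
            if (tracker.check(ts) != Decision.INCLUDE) {
                break; // SEEK_NEXT_COLUMN: skip the rest of this column
            }
            result.add(ts);
        }
        return result;
    }
}
```

With `minVersions = 1` this models the behavior requested in the issue: versions inside the TTL window are kept, and once all of them have expired only the newest one survives.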

One problem I encountered is Store.getRowKeyAtOrBefore. That currently honors 
TTL but not MaxVersions (which is strange). I'm thinking I'll either leave that 
alone, or have it also not honor TTL when the store has minversions.

Fixing this correctly in all cases would mean scanning all relevant KVs in the 
Memstore (i.e. ignoring TTL and version restrictions), then using those 
candidates to scan the storefiles (now honoring TTL and doing the version 
counting).
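A minimal Java sketch of that two-phase idea, under simplifying assumptions: a cell is just a (row, timestamp) pair held in plain lists, there are no deletes, and rows compare as Strings. `RowAtOrBefore`, `Cell`, `find`, and `retainedCount` are hypothetical names, not HBase's `Store` code. Candidate rows are collected without any TTL or version filtering (store-file rows are added as well, since a row may exist only on disk); TTL and version counting are applied only after a row's versions from the memstore and the store files have been merged, because the version limits apply to the combined set, not to each source separately.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.TreeSet;

// Hypothetical, simplified model of getRowKeyAtOrBefore with TTL + min/max versions.
final class RowAtOrBefore {
    static final class Cell {
        final String row;
        final long ts;
        Cell(String row, long ts) { this.row = row; this.ts = ts; }
    }

    // Newest-first versions: keep up to maxVersions; stop at an expired version
    // once at least minVersions have been kept.
    static int retainedCount(List<Long> newestFirst, int maxVersions, int minVersions,
                             long oldestUnexpiredTs) {
        int kept = 0;
        for (long ts : newestFirst) {
            if (kept >= maxVersions) break;
            if (ts < oldestUnexpiredTs && kept >= minVersions) break;
            kept++;
        }
        return kept;
    }

    static Optional<String> find(String target, List<Cell> memstore, List<Cell> storeFiles,
                                 int maxVersions, int minVersions, long now, long ttl) {
        // Phase 1: gather candidate rows <= target with no TTL/version filtering.
        TreeSet<String> candidates = new TreeSet<>();
        for (Cell c : memstore) if (c.row.compareTo(target) <= 0) candidates.add(c.row);
        for (Cell c : storeFiles) if (c.row.compareTo(target) <= 0) candidates.add(c.row);

        // Phase 2: closest row first; merge versions from both sources, then filter.
        for (String row : candidates.descendingSet()) {
            List<Long> versions = new ArrayList<>();
            for (Cell c : memstore) if (c.row.equals(row)) versions.add(c.ts);
            for (Cell c : storeFiles) if (c.row.equals(row)) versions.add(c.ts);
            versions.sort(Comparator.reverseOrder()); // newest first
            if (retainedCount(versions, maxVersions, minVersions, now - ttl) > 0) {
                return Optional.of(row);
            }
        }
        return Optional.empty();
    }
}
```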

Added a new test that validates the basic behavior.
Running tests now. (I seem to have a hard time getting a full test run through 
locally, with or without my patch.)

Will attach a new patch soon that should still be considered a sketch but 
should hold up to a bit more scrutiny.


> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able 
> to restore the dataset from any point of time but only within a limited 
> timeframe -- say one week.  Thereafter, if the versions are older than one 
> week, rather than as we do with TTL where we let go of all versions older 
> than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
> it's like versions==1 when TTL > one week.  We want to allow that if an error 
> is caught within a week of its happening -- user mistakenly removes a 
> critical table -- then we'll be able to restore up to the moment just before 
> catastrophe hit; otherwise, we keep one version only.
