The ambiguity seems to lie at the intersection of TTL and version "garbage 
collection" during compactions. 

Major compactions can lead to nondeterministic results when multiple versions 
are involved (touched on briefly in the book at
http://hbase.apache.org/book.html#versions and in
http://www.ngdata.com/bending-time-in-hbase/ ).
TTL expirations don't result in deletes (at least not in the classical sense, 
with a tombstone).
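A toy Python model of the two orderings from my original mail shows the dependence more concretely. It is not HBase code. `Cell`, `major_compact` and `survivors` are hypothetical names, timestamps are plain seconds, and the model makes three assumptions: the column family keeps a single version (VERSIONS=1, which the scenarios imply but don't state), a cell expires once its timestamp plus its TTL has passed, and compaction drops expired cells before it applies the version limit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    """One version of (r1, f1, q1); times are plain seconds in this toy model."""
    ts: int
    value: str
    ttl: Optional[int] = None  # per-cell TTL in seconds; None means "never expires"

    def expired(self, now: int) -> bool:
        # Assumption: a cell expires once `now` reaches its timestamp plus its TTL.
        return self.ttl is not None and now >= self.ts + self.ttl


def major_compact(cells: List[Cell], now: int, max_versions: int) -> List[Cell]:
    """Toy major compaction: drop cells expired at `now`, then keep the newest `max_versions`."""
    live = [c for c in cells if not c.expired(now)]
    live.sort(key=lambda c: c.ts, reverse=True)
    return live[:max_versions]


def survivors(compact_at: int, max_versions: int = 1, check_at: int = 90) -> List[int]:
    """Timestamps still visible at `check_at` after a single compaction at `compact_at`."""
    cells = [
        Cell(ts=1, value="v1"),          # Write (r1, f1, q1, v1, 1)
        Cell(ts=2, value="v1", ttl=60),  # Write (r1, f1, q1, v1, 2) - TTL=1 minute
    ]
    after = major_compact(cells, now=compact_at, max_versions=max_versions)
    return [c.ts for c in after if not c.expired(check_at)]


# Scenario 1: compaction within the minute, then the TTL fires -> nothing left.
print(survivors(compact_at=30))  # []
# Scenario 2: the TTL fires first, then compaction runs -> version 1 remains.
print(survivors(compact_at=90))  # [1]
```

In this model, calling `survivors(..., max_versions=2)` returns `[1]` for both orderings, because the early compaction no longer evicts version 1. Whether keeping an extra version is an acceptable answer in real HBase is a separate question.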

Cosmin 

_______________________________________
From: Michael Segel <michael_se...@hotmail.com>
Sent: Friday, April 10, 2015 8:35 AM
To: dev@hbase.apache.org
Subject: Re: Nondeterministic outcome based on cell TTL and major compaction 
event order

Interesting.
There seems to be some ambiguity in how a TTL relates to a deletion.

Is the TTL a delete or is it a separate type of function?

That is to say: when you inserted version 2 of the cell, did you intend 
version 2 to exist only for a while and then fall back to version 1? Or did 
you mean that inserting version 2 should delete everything prior to version 2, 
so that when version 2 expires, nothing is left?

The documentation isn’t clear on this point.

Here is an example where you wouldn’t want the TTL on a cell to also delete 
prior versions.

Suppose you’re storing map data in HBase. You have an attribute (speed) 
associated with a road link.

If the road is a 65 mph highway, then the base (default) speed is 65 mph. 
However, if there’s construction planned for the road, then you need to reset 
the speed to 45 mph for the duration of the construction. You know that the 
construction is supposed to last X months, so you reset the speed limit to 
45 mph with a TTL on that cell version only.

Another example: you’re storing the price of a given SKU in a given region of 
your retail chain, and you want to reduce the price by 20% for a 2-week period.
Again, you set the discounted price to live for 2 weeks with a TTL, after which 
it reverts to the original price.
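Both examples assume a TTL that expires only the override and lets the previous version show through again. A small Python sketch of that intended read behavior follows. `Version` and `effective_value` are hypothetical names and the time unit is days. It models the semantics being asked for, not what HBase guarantees: in Cosmin's scenarios, compaction can remove the fallback version before the override expires.

```python
from typing import List, NamedTuple, Optional


class Version(NamedTuple):
    ts: int                    # day the version was written
    value: int
    ttl: Optional[int] = None  # lifetime in days; None means the version never expires


def effective_value(versions: List[Version], now: int) -> Optional[int]:
    """Return the newest version written at or before `now` that has not expired.

    Older versions act as fallbacks: once a TTL'd override expires, the
    previous value shows through again (the "revert" semantics).
    """
    for v in sorted(versions, key=lambda v: v.ts, reverse=True):
        if v.ts > now:
            continue  # not written yet at `now`
        if v.ttl is not None and now >= v.ts + v.ttl:
            continue  # this override has expired; fall back to an older version
        return v.value
    return None


# Hypothetical road link: 65 mph base speed, 45 mph during construction
# that starts on day 100 and lasts 90 days.
speed = [Version(ts=0, value=65), Version(ts=100, value=45, ttl=90)]
print(effective_value(speed, now=50))   # 65
print(effective_value(speed, now=120))  # 45
print(effective_value(speed, now=200))  # 65
```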

So I guess what the TTL is intended to do should be clarified.

Does that make sense?





> On Apr 10, 2015, at 9:26 AM, Cosmin Lehene <cleh...@adobe.com> wrote:
>
> I was initially puzzled by this, although I realize it's likely working as 
> designed.
>
>
> Depending on the order of cell TTL expiration and compaction events, a 
> particular (row, family, qualifier, ts) coordinate can end up with either 
> some (the older) data left or no data at all.
>
>
>
> Write (r1, f1, q1, v1, 1)
>
> Write (r1, f1, q1, v1, 2) - TTL=1 minute
>
>
> Scenario 1:
>
>
> If a major compaction happens within a minute
>
>
> it will remove (r1, f1, q1, v1, 1)
>
> then after a minute (r1, f1, q1, v1, 2) will expire
>
> no data left
>
>
> Scenario 2:
>
>
> A minute passes
>
> (r1, f1, q1, v1, 2) expires
>
> Compaction runs
>
> (r1, f1, q1, v1, 1) remains
>
>
>
> This seems, by and large, to be expected behavior, but it still feels 
> "uncomfortable" that the (overall) outcome is decided not by me but by the 
> chance ordering of events.
>
>
> I wonder whether we'd want this to behave differently (perhaps it has been 
> discussed already); if not, it's worth documenting in more detail in the book.
>
>
> What do you think?
>
>
> Cosmin
>
>
>
>

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental.
Use at your own risk.
Michael Segel
michael_segel (AT) hotmail.com