Yes, how you handle versioning should be the same regardless of the number of 
cell versions. (Pop from the top, drop from the bottom.)  
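
Conceptually that's just a bounded stack (a sketch only, not the actual HBase 
internals): the newest version goes on top, and once you're past the configured 
max versions the oldest falls off the bottom:

import java.util.ArrayDeque;
import java.util.Deque;

// Conceptual sketch only -- not HBase code. The newest version is kept at the
// head ("pop from the top"), and anything beyond maxVersions is dropped from
// the tail ("drop from the bottom").
final class BoundedVersions<V> {
  private final int maxVersions;
  private final Deque<V> versions = new ArrayDeque<>();

  BoundedVersions(int maxVersions) { this.maxVersions = maxVersions; }

  void put(V value) {
    versions.addFirst(value);            // newest on top
    while (versions.size() > maxVersions) {
      versions.removeLast();             // oldest dropped from the bottom
    }
  }

  V newest() { return versions.peekFirst(); }
}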

What I think would help is to define the expected outcome or contract. 

The code should be deterministic. 

That is to say, writing a cell with a TTL that isn’t set to Long.MAX_VALUE 
should behave the same way each time. 

Then the effect of compaction should also be deterministic. 

There should be a small, finite number of permutations based on the conditional 
statements within the code, and a single entry and exit point to the method 
within the class. 
(This gets into the coding practice that unless you’re throwing an exception, you 
only have one return from the code.) [Note: It’s been 20+ years since I read 
Kernighan and Plauger … ] 
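
Something like this (a rough sketch of that style using a hypothetical 
visibility check, not actual HBase internals):

// Hypothetical helper, single entry / single exit: the branches enumerate
// the permutations, but there is exactly one return.
static boolean isCellVisible(long cellTs, long ttlMillis, long nowMillis) {
  boolean visible;
  if (ttlMillis == Long.MAX_VALUE) {
    visible = true;                            // no per-cell TTL was set
  } else if (nowMillis >= cellTs + ttlMillis) {
    visible = false;                           // TTL elapsed: the cell has decayed
  } else {
    visible = true;
  }
  return visible;                              // single exit point
}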

The idea is that we should define the outcomes and then code to them. 

So you can define a set of scenarios, each with a defined outcome. 
Then take those scenarios and outcomes and consider compactions occurring at 
various points during the ingestion process, and then decay. 
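
For example, one of those scenario/outcome pairs could be sketched roughly like 
this. (A sketch only: it assumes the per-cell TTL setter on Put/Mutation 
(setTTL) and a pre-created table 't1'; majorCompact() is asynchronous, so an 
actual test would need to sleep past the TTL and wait for the compaction to 
finish before checking the result.)

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CellTtlScenario {
  public static void main(String[] args) throws Exception {
    TableName tn = TableName.valueOf("t1");          // assumed to already exist
    byte[] r1 = Bytes.toBytes("r1");
    byte[] f1 = Bytes.toBytes("f1");
    byte[] q1 = Bytes.toBytes("q1");

    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(tn);
         Admin admin = conn.getAdmin()) {

      // Older version, no per-cell TTL.
      table.put(new Put(r1).addColumn(f1, q1, 1L, Bytes.toBytes("v1")));

      // Newer version with a one-minute cell TTL.
      Put p2 = new Put(r1);
      p2.setTTL(60_000L);
      p2.addColumn(f1, q1, 2L, Bytes.toBytes("v2"));
      table.put(p2);

      // The question in this thread: what does the Get below return once the
      // TTL has elapsed, and does the answer change if the major compaction
      // runs before rather than after that minute?
      admin.majorCompact(tn);   // asynchronous request

      Result result = table.get(new Get(r1).addColumn(f1, q1));
      System.out.println("returned value = " + Bytes.toString(result.getValue(f1, q1)));
    }
  }
}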
Again, it’s my understanding that the desired outcome is that if a cell decays, 
only that cell is affected, and it will be inert (not returned) for as long as it 
exists, until a major compaction removes it altogether? 

Is this not the case? 

-Mike



> On Apr 19, 2015, at 10:10 PM, Anoop John <anoop.hb...@gmail.com> wrote:
> 
> Interesting example for cell-level TTL, Michael.
> But one thing I want to say: in the above example, the versions for the
> corresponding CF should have been >1.  In such a case there won't be an issue
> with major compaction, right?
> When versions = 1, yes, it will give non-deterministic results.
> 
> -Anoop-
> 
> 
> On Sun, Apr 19, 2015 at 6:59 PM, Michael Segel <michael_se...@hotmail.com>
> wrote:
> 
>> Actually I just thought of a better example…
>> 
>> Credit Card Fraud detection.
>> Imagine you’re being sent to work on a project out of the country.
>> So suppose I head across the pond and invade Europe. ;-P
>> 
>> I would want the credit card companies to not weigh a foreign transaction
>> heavily when determining fraud, so that if they know my location is in
>> London, then spending $$ on a dinner in London is not fraud.
>> 
>> So I call ahead and tell my bank I’m going to be in Europe for XXX months..
>> 
>> 
>>> 
>>> As to why you would want a TTL on a column that doesn’t always use a
>>> TTL?
>>> 
>>> I used this example in a different post…
>>> 
>>> Imagine you have a road link which has an attribute of speed.
>>> 
>>> You could have construction, or variable speed limits.
>>> So you would want to change the speed limit with a TTL.
>>> 
>>> Or you’re a retailer and you’re offering a 20% discount on a product for
>>> a limited time only?
>>> 
>>> Sure, these are bad examples because in reality the database is a sync
>>> and the application would manage these types of issues.
>>> 
>>> 
>>>> On Apr 18, 2015, at 12:23 AM, lars hofhansl <la...@apache.org> wrote:
>>>> 
>>>> The formatting did not come out right. Lemme try again...
>>>> 
>>>> 
>>>> Just came here to say that. From our (maybe not clearly enough) defined
>>>> semantics, this is how it should behave.
>>>> 
>>>> It _is_ confusing, though, since compactions are - in a sense - just
>>>> optimizations that run in the background to keep the number of HFiles
>>>> bounded.
>>>> In this case the schedule of the compactions influences the outcome.
>>>> 
>>>> Note that even tombstone markers can be confusing. Here's another
>>>> confusing example:
>>>> 1. delete (r1, f1, q1, T2)
>>>> 2. put (r1, f1, q1, v1, T1)
>>>> 
>>>> If a compaction happens after #1 but before #2 the put will remain:
>>>> delete
>>>> compaction
>>>> put (remains visible)
>>>> 
>>>> If the compaction happens after #2 the put will be affected by the
>>>> delete and hence removed:
>>>> delete
>>>> put
>>>> compaction (will remove the put)
>>>> 
>>>> Notice though that both of these examples _are_ a bit weird.
>>>> Why would only a newer version of the cell have a TTL?
>>>> Why would you date a delete into the future?
>>>> 
>>>> -- Lars
>>>> 
>>>> 
>>>> 
>>>> 
>>>> From: Sean Busbey <bus...@cloudera.com>
>>>> To: dev <dev@hbase.apache.org>
>>>> Sent: Friday, April 17, 2015 4:52 PM
>>>> Subject: Re: Nondeterministic outcome based on cell TTL and major
>>>> compaction event order
>>>> 
>>>> If you have max versions set to 1 (the default), then c1 should be removed
>>>> at compaction time if c2 still exists then.
>>>> 
>>>> --
>>>> Sean
>>>> 
>>>> 
>>>> On Apr 17, 2015 6:41 PM, "Michael Segel" <michael_se...@hotmail.com>
>>>> wrote:
>>>> 
>>>>> Ok,
>>>>> So then if you have a previous cell (c1) and you insert a new cell c2 that
>>>>> has a TTL of, let’s say, 5 mins, then c1 should always exist?
>>>>> That is my understanding, but from Cosmin’s post, he’s saying it’s
>>>>> different.  And that’s why I don’t understand.  You couldn’t lose the cell
>>>>> c1 at all.
>>>>> Compaction or no compaction.
>>>>> 
>>>>> That’s why I’m confused.  Current behavior doesn’t match the expected
>>>>> contract.
>>>>> 
>>>>> -Mike
>>>>> 
>>>>>> On Apr 17, 2015, at 4:37 PM, Andrew Purtell <apurt...@apache.org>
>>>>>> wrote:
>>>>>> 
>>>>>> The way TTLs work today is they define the interval of time a cell
>>>>>> exists - exactly as that. There is no tombstone laid like a normal
>>>>>> delete. Once the TTL elapses the cell just ceases to exist to normal
>>>>>> scanners. The interaction of expired cells, multiple versions, minimum
>>>>>> versions, raw scanners, etc. can be confusing. We can absolutely
>>>>>> revisit this.
>>>>>> 
>>>>>> A cell with an expired TTL could be treated as the combination of a
>>>>>> tombstone and the most recent value it lays over. This is not how the
>>>>>> implementation works today, but could be changed for an upcoming major
>>>>>> version like 2.0 if there's consensus to do it.
>>>>>> 
>>>>>> 
>>>>>>> On Apr 10, 2015, at 7:26 AM, Cosmin Lehene <cleh...@adobe.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>> I was initially puzzled by this, although I realize it's likely
>>>>>>> as designed.
>>>>>>> 
>>>>>>> 
>>>>>>> The cell TTL expiration and compaction events can leave either some
>>>>>>> (the older) data or no data at all for a particular (row, family,
>>>>>>> qualifier, ts) coordinate.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Write (r1, f1, q1, v1, 1)
>>>>>>> 
>>>>>>> Write (r1, f1, q1, v1, 2) - TTL=1 minute
>>>>>>> 
>>>>>>> 
>>>>>>> Scenario 1:
>>>>>>> 
>>>>>>> 
>>>>>>> If a major compaction happens within a minute
>>>>>>> 
>>>>>>> 
>>>>>>> it will remove (r1, f1, q1, v1, 1)
>>>>>>> 
>>>>>>> then after a minute (r1, f1, q1, v1, 2) will expire
>>>>>>> 
>>>>>>> no data left
>>>>>>> 
>>>>>>> 
>>>>>>> Scenario 2:
>>>>>>> 
>>>>>>> 
>>>>>>> A minute passes
>>>>>>> 
>>>>>>> (r1, f1, q1, v1, 2) expires
>>>>>>> 
>>>>>>> Compaction runs..
>>>>>>> 
>>>>>>> (r1, f1, q1, v1, 1) remains
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> This seems, by and large, expected behavior, but it still feels
>>>>>>> "uncomfortable" that the (overall) outcome is not decided by me, but by
>>>>>>> a chance of event ordering.
>>>>>>> 
>>>>>>> 
>>>>>>> I wonder if we'd want this to behave differently (perhaps it has been
>>>>>>> discussed already), but if not, it's worth more detailed documentation in
>>>>>>> the book.
>>>>>>> 
>>>>>>> 
>>>>>>> What do you think?
>>>>>>> 
>>>>>>> 
>>>>>>> Cosmin
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Best regards,
>>>>>> 
>>>>>> - Andy
>>>>>> 
>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>>>>> Hein (via Tom White)
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
>> 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com




