Re: Nondeterministic outcome based on cell TTL and major compaction event order

Michael Segel Sun, 19 Apr 2015 06:30:45 -0700

Actually I just thought of a better example… 

Credit Card Fraud detection. 
Imagine you’re being sent to work on a project out of the country. 
So suppose I head across the pond and invade Europe. ;-P


I would want the credit card companies not to weigh a foreign transaction 
heavily when determining fraud, so that if they know I’m in London, 
spending $$ on a dinner in London is not flagged as fraud. 

So I call ahead and tell my bank I’m going to be in Europe for XXX months.


> 
> As to why you would want a TTL on a column that doesn’t always use a TTL? 
> 
> I used this example in a different post…
> 
> Imagine you have a road link which has an attribute of speed. 
> 
> You could have construction, or variable speed limits. 
> So you would want to change the speed limit with a TTL.  
> 
> Or you’re a retailer and you’re offering a 20% discount on a product for a 
> limited time only? 
> 
> Sure, these are bad examples because in reality the database is a sink and 
> the application would manage these types of issues.
> 
> 
>> On Apr 18, 2015, at 12:23 AM, lars hofhansl <[email protected]> wrote:
>> 
>> The formatting did not come out right. Lemme try again...
>> 
>> 
>> Just came here to say that. From our (maybe not clearly enough) defined 
>> semantics this is how it should behave.
>> 
>> It _is_ confusing, though, since compactions are - in a sense - just 
>> optimizations that run in the background to keep the number of HFiles from 
>> growing without bound.
>> In this case the schedule of the compactions influences the outcome.
>> 
>> Note that even tombstone markers can be confusing. Here's another confusing 
>> example:
>> 1. delete (r1, f1, q1, T2)
>> 2. put (r1, f1, q1, v1, T1)
>> 
>> If a compaction happens after #1 but before #2 the put will remain:
>> delete
>> compaction
>> put (remains visible)
>> 
>> If the compaction happens after #2 the put will be affected by the delete 
>> and hence removed:
>> delete
>> put
>> compaction (will remove the put)
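
The delete/put ordering described above can be reproduced with a small toy model. This is only a sketch under stated assumptions, not HBase code: `ToyColumn` and its methods are hypothetical names, a tombstone is assumed to mask every put whose timestamp is at or below its own, and a major compaction is assumed to drop both the masked puts and the tombstones themselves. T1 and T2 are taken to be 1 and 2.

```python
from dataclasses import dataclass, field


@dataclass
class ToyColumn:
    """Hypothetical in-memory stand-in for one (row, family, qualifier)."""
    puts: list = field(default_factory=list)        # (timestamp, value) pairs
    tombstones: list = field(default_factory=list)  # delete-marker timestamps

    def put(self, ts, value):
        self.puts.append((ts, value))

    def delete(self, ts):
        self.tombstones.append(ts)

    def _live(self):
        # Assumption: a tombstone at T masks every put with timestamp <= T.
        return [(ts, v) for ts, v in self.puts
                if not any(ts <= t for t in self.tombstones)]

    def major_compact(self):
        # Assumption: masked puts and the tombstones are both discarded.
        self.puts = self._live()
        self.tombstones = []

    def get(self):
        live = self._live()
        return max(live, key=lambda cell: cell[0])[1] if live else None


def compact_between_writes():
    col = ToyColumn()
    col.delete(ts=2)          # 1. delete (r1, f1, q1, T2)
    col.major_compact()       # the tombstone is gone before the put arrives
    col.put(ts=1, value="v1") # 2. put (r1, f1, q1, v1, T1)
    return col.get()


def compact_after_both():
    col = ToyColumn()
    col.delete(ts=2)
    col.put(ts=1, value="v1")
    col.major_compact()       # the tombstone still covers the put
    return col.get()


print(compact_between_writes())  # v1
print(compact_after_both())      # None
```

The same two operations give different results depending only on when the compaction runs, which is the point of the example. Settings that retain deleted cells are deliberately left out of the model.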
>> 
>> Notice though that both of these examples _are_ a bit weird.
>> Why would only a newer version of the cell have a TTL?
>> Why would you date a delete into the future?
>> 
>> -- Lars
>> 
>> 
>> 
>> 
>> ________________________________
>> From: lars hofhansl <[email protected]>
>> To: "[email protected]" <[email protected]> 
>> Sent: Friday, April 17, 2015 10:18 PM
>> Subject: Re: Nondeterministic outcome based on cell TTL and major compaction 
>> event order
>> 
>> 
>> 
>> From: Sean Busbey <[email protected]>
>> To: dev <[email protected]> 
>> Sent: Friday, April 17, 2015 4:52 PM
>> Subject: Re: Nondeterministic outcome based on cell TTL and major compaction 
>> event order
>> 
>> If you have max versions set to 1 (the default), then c1 should be removed
>> at compaction time if c2 still exists then.
>> 
>> -- 
>> Sean
>> 
>> 
>> On Apr 17, 2015 6:41 PM, "Michael Segel" <[email protected]> wrote:
>> 
>>> Ok,
>>> So then if you have a previous cell (c1) and you insert a new cell c2 that
>>> has a TTL of, let’s say, 5 mins, then c1 should always exist?
>>> That is my understanding, but from Cosmin’s post, he’s saying it’s
>>> different.  And that’s why I don’t understand.  You couldn’t lose the cell
>>> c1 at all.
>>> Compaction or no compaction.
>>> 
>>> That’s why I’m confused.  Current behavior doesn’t match the expected
>>> contract.
>>> 
>>> -Mike
>>> 
>>>> On Apr 17, 2015, at 4:37 PM, Andrew Purtell <[email protected]> wrote:
>>>> 
>>>> The way TTLs work today is they define the interval of time a cell
>>>> exists - exactly as that. There is no tombstone laid like a normal
>>>> delete. Once the TTL elapses the cell just ceases to exist to normal
>>>> scanners. The interaction of expired cells, multiple versions, minimum
>>>> versions, raw scanners, etc. can be confusing. We can absolutely
>>>> revisit this.
>>>> 
>>>> A cell with an expired TTL could be treated as the combination of
>>>> tombstone and the most recent value it lays over. This is not how the
>>>> implementation works today, but could be changed for an upcoming major
>>>> version like 2.0 if there's consensus to do it.
>>>> 
>>>> 
>>>>> On Apr 10, 2015, at 7:26 AM, Cosmin Lehene <[email protected]> wrote:
>>>>> 
>>>>> I was initially puzzled by this, although I realize it's likely as
>>>>> designed.
>>>>> 
>>>>> 
>>>>> Cell TTL expiration and compaction events can lead to either some (the
>>>>> older) data being left or no data at all for a particular (row, family,
>>>>> qualifier, ts) coordinate.
>>>>> 
>>>>> 
>>>>> 
>>>>> Write (r1, f1, q1, v1, 1)
>>>>> Write (r1, f1, q1, v1, 2) - TTL=1 minute
>>>>> 
>>>>> Scenario 1:
>>>>> 
>>>>> If a major compaction happens within a minute
>>>>> it will remove (r1, f1, q1, v1, 1)
>>>>> then after a minute (r1, f1, q1, v1, 2) will expire
>>>>> no data left
>>>>> 
>>>>> Scenario 2:
>>>>> 
>>>>> A minute passes
>>>>> (r1, f1, q1, v1, 2) expires
>>>>> Compaction runs
>>>>> (r1, f1, q1, v1, 1) remains
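
The two scenarios above can be walked through with a toy simulation. It is a sketch under stated assumptions rather than HBase code: `Cell` and `VersionedColumn` are hypothetical names, cell timestamps are treated as seconds on a simulated clock, max versions is assumed to be 1, and a major compaction is assumed to drop expired cells first and then keep only the newest remaining version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    ts: int                       # version timestamp, in simulated seconds
    value: str
    ttl: Optional[float] = None   # seconds; None means the cell never expires

    def expired(self, now):
        return self.ttl is not None and now - self.ts >= self.ttl


class VersionedColumn:
    """Hypothetical stand-in for one (row, family, qualifier)."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self.cells = []

    def put(self, ts, value, ttl=None):
        self.cells.append(Cell(ts, value, ttl))

    def _visible(self, now):
        # Assumption: expired cells are ignored before versions are counted.
        live = [c for c in self.cells if not c.expired(now)]
        live.sort(key=lambda c: c.ts, reverse=True)
        return live[: self.max_versions]

    def major_compact(self, now):
        # Whatever is not visible at compaction time is physically dropped.
        self.cells = self._visible(now)

    def get(self, now):
        visible = self._visible(now)
        return visible[0].value if visible else None


def compaction_before_expiry():
    col = VersionedColumn()
    col.put(ts=1, value="v1")
    col.put(ts=2, value="v1", ttl=60)
    col.major_compact(now=30)   # the older version is dropped as excess
    return col.get(now=120)     # the newer version has expired by now


def expiry_before_compaction():
    col = VersionedColumn()
    col.put(ts=1, value="v1")
    col.put(ts=2, value="v1", ttl=60)
    col.major_compact(now=120)  # the newer version is already expired
    return col.get(now=120)


print(compaction_before_expiry())  # None
print(expiry_before_compaction())  # v1
```

Both runs issue the same writes; only the time of the compaction differs, and that alone decides whether the older value survives.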
>>>>> 
>>>>> 
>>>>> 
>>>>> This seems, by and large, expected behavior, but it still seems
>>>>> "uncomfortable" that the (overall) outcome is decided not by me but by
>>>>> the chance ordering of events.
>>>>> 
>>>>> 
>>>>> I wonder whether we'd want this to behave differently (perhaps it has
>>>>> been discussed already), but if not, it's worth more detailed
>>>>> documentation in the book.
>>>>> 
>>>>> 
>>>>> What do you think?
>>>>> 
>>>>> 
>>>>> Cosmin
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> --
>>>> Best regards,
>>>> 
>>>> - Andy
>>>> 
>>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>>> Hein (via Tom White)
>>>> 
>>> 
>>> The opinions expressed here are mine, while they may reflect a cognitive
>>> thought, that is purely accidental.
>>> Use at your own risk.
>>> Michael Segel
>>> michael_segel (AT) hotmail.com
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
> 

