On Jun 17, 2014, at 10:53, Galder Zamarreño <gal...@redhat.com> wrote:
> On 16 Jun 2014, at 16:57, Dan Berindei <dan.berin...@gmail.com> wrote:
>
>> On Thu, Jun 12, 2014 at 4:54 PM, Galder Zamarreño <gal...@redhat.com> wrote:
>> Hi all,
>>
>> I’m working on the implementation of this, and the solution noted in
>> the JIRA does not work for situations where you have to return a
>> previous value that might have been overridden due to partial operation
>> application. Example (assuming 1 owner only):
>>
>> 1. remote client does a put(“key”, 1) in Node-A
>> 2. remote client does a replace(“key”, 2) on Node-A but the operation
>> fails to replicate and gets partially applied in Node-A only.
>> 3. remote client, seeing the replace failed, retries the replace(“key”, 2)
>> in Node-B. replace underneath finds that the previous value of “key” is
>> 2 (since it got partially applied in Node-A), so it succeeds, but the
>> previous value returned is 2, which is wrong. The previous value should
>> have been 1, but this value is gone…
>>
>> In my view, there are two ways to solve this issue:
>>
>> 1. Make Hot Rod caches transactional. By doing so, operations are not
>> partially applied: they’re done fully cluster-wide or they’re rolled
>> back. I’ve verified this and the test passes once you make the cache
>> transactional. The downside is of course performance. IOW, anyone using
>> conditional operations, or relying on previous values, would need
>> transactional caches. This should work well with the retry mechanism
>> being implemented for ISPN-2956, which is still needed. A big question
>> here is whether Hot Rod caches should be transactional by default or
>> vice versa. If they’re transactional, our performance will go down for
>> sure, but we won’t have this type of issue (with retry in place). If
>> you’re not transactional, you are faster but you’re exposed to these
>> edge cases, and you need to consider them when deploying your app,
>> something people might miss, although we could provide WARN messages
>> when conditional operations or Flag.FORCE_RETURN_VALUE are used with
>> non-transactional caches.
>>
>> The same problem [1] can appear in embedded caches. So if there's an
>> argument to make HotRod caches transactional by default, it should
>> apply to embedded caches as well.
>>
>> I like the idea of logging a warning when FORCE_RETURN_VALUE or the
>> non-versioned conditional operations are used from HotRod. But we have
>> to be careful with the wording, because inconsistencies can appear in
>> transactional mode as well [2].
>
> I think this (default non-transactional, and WARN if using such methods
> with a non-transactional cache) might be the best stop-gap solution.

+1. Would be good to have [2] fixed as well, in case people want a
workable workaround.

>
>> [1] https://issues.jboss.org/browse/ISPN-4286
>> [2] https://issues.jboss.org/browse/ISPN-3421
>
> Have you considered other fixes for [2]?
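For reference, this is roughly what option 1 amounts to configuration-wise.
A minimal sketch using the embedded Java API; the DIST_SYNC mode, the class
name and the idea that a Hot Rod endpoint sits on top of such a cache are
illustrative, not a tested recipe:

    import org.infinispan.Cache;
    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.Configuration;
    import org.infinispan.configuration.cache.ConfigurationBuilder;
    import org.infinispan.configuration.global.GlobalConfigurationBuilder;
    import org.infinispan.manager.DefaultCacheManager;
    import org.infinispan.transaction.TransactionMode;

    public class TransactionalCacheSketch {
       public static void main(String[] args) {
          Configuration cfg = new ConfigurationBuilder()
                .clustering().cacheMode(CacheMode.DIST_SYNC)
                // The switch that matters here: in a transactional cache a
                // write that fails to replicate is rolled back everywhere
                // instead of staying partially applied on one owner.
                .transaction().transactionMode(TransactionMode.TRANSACTIONAL)
                .build();
          DefaultCacheManager manager = new DefaultCacheManager(
                GlobalConfigurationBuilder.defaultClusteredBuilder().build(), cfg);
          try {
             Cache<String, Integer> cache = manager.getCache();
             cache.put("key", 1);
             // With rollback instead of partial application, a retried
             // replace("key", 2) sees the pre-replace value, 1.
             Integer previous = cache.replace("key", 2);
          } finally {
             manager.stop();
          }
       }
    }

The trade-off is exactly the one described above: every write pays the
transactional coordination cost.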
>
>>
>> 2. Get rid of returning the previous value in the Hot Rod protocol for
>> modifying operations. For conditional operations, returning true/false
>> is at least enough to see if the condition was applied. So,
>> replaceIfUnmodified/replace/remove(conditional) would only return
>> true/false. This would be complicated due to reliance on
>> Map/ConcurrentMap APIs. Maybe something to consider for when we stop
>> relying on JDK APIs.
>>
>> I'm not sure we can completely get rid of the return values; even
>> though JCache doesn't extend Map, it still has a getAndPut method.
>
> It does have such a method, but we could potentially make it throw an
> exception saying that it is not supported.
>
>>
>> I also considered applying corrective actions but that’s very messy and
>> prone to concurrency issues, so I quickly discarded that.
>>
>> Yeah, rolling back partial modifications is definitely not an option.
>>
>> Any other options? Thoughts on the options above?
>>
>> I was waiting for Sanne to say this, but how about keeping a version
>> history?
>>
>> If we had the chain of values/versions for each key, we could look up
>> the version of the current operation in the chain when retrying. If
>> it's there, we could infer that the operation was already applied, and
>> return the value before it in the chain as the operation’s previous
>> value.
>>
>> Of course, there are the usual downsides:
>> 1. Maintaining the history might be tricky, especially around state
>> transfers.
>> 2. Performance will also be affected, maybe getting closer to the tx
>> performance.
>
> Yeah, keeping a version history could help, but it’d be quite a beast
> to implement for 7.0, including garbage collection of old versions, etc.

Yes, also option 1 seems to make things work and is almost ready. Also,
with versioning, I think the performance is worse, as you always need to
communicate when you no longer need a version (an RPC), so that it can
be garbage collected.
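To make the version-history idea a bit more concrete, here is a rough
sketch of the per-key chain it would need. Nothing like this exists in
Infinispan today; all names are invented, and garbage collection and state
transfer (downside 1 above) are ignored entirely:

    import java.util.Iterator;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentLinkedDeque;

    final class VersionedValue<V> {
       final long version; // would have to be originator-generated, see below
       final V value;
       VersionedValue(long version, V value) {
          this.version = version;
          this.value = value;
       }
    }

    final class HistorySketch<K, V> {
       private final ConcurrentHashMap<K, ConcurrentLinkedDeque<VersionedValue<V>>>
             history = new ConcurrentHashMap<>();

       // Record a write; the newest entry sits at the head of the chain.
       void record(K key, long version, V value) {
          history.computeIfAbsent(key, k -> new ConcurrentLinkedDeque<>())
                 .addFirst(new VersionedValue<>(version, value));
       }

       // On retry: if 'version' is already in the chain, the operation was
       // applied before, and the entry behind it is the correct previous value.
       V previousValueFor(K key, long version) {
          ConcurrentLinkedDeque<VersionedValue<V>> chain = history.get(key);
          if (chain == null) return null;
          Iterator<VersionedValue<V>> it = chain.iterator();
          while (it.hasNext()) {
             if (it.next().version == version) {
                return it.hasNext() ? it.next().value : null;
             }
          }
          return null; // version never seen: the write was not applied here
       }
    }

Even this toy version shows where the cost goes: every write allocates and
keeps an extra entry until something (the extra RPC mentioned above) says
it can be collected.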
>
>>
>> Cheers,
>>
>> On 26 May 2014, at 18:11, Galder Zamarreño <gal...@redhat.com> wrote:
>>
>>> Hi all,
>>>
>>> I’ve been looking into ISPN-2956 over the last week and I think we
>>> have a solution for it which requires a protocol change [1].
>>>
>>> Since we’re in the middle of the Hot Rod 2.0 development, this is a
>>> good opportunity to implement it.
>>>
>>> Cheers,
>>>
>>> [1] https://issues.jboss.org/browse/ISPN-2956?focusedCommentId=12970541&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12970541
>>>
>>> On 14 May 2014, at 09:36, Dan Berindei <dan.berin...@gmail.com> wrote:
>>>
>>>> On Tue, May 13, 2014 at 6:40 PM, Radim Vansa <rva...@redhat.com> wrote:
>>>> On 05/13/2014 03:58 PM, Dan Berindei wrote:
>>>>>
>>>>> On Mon, May 12, 2014 at 1:54 PM, Radim Vansa <rva...@redhat.com> wrote:
>>>>> @Dan: It's absolutely correct to do the further writes in order to
>>>>> make the cache consistent, I am not arguing against that. You've
>>>>> fixed the outcome (state of the cache) well. My point was that we
>>>>> should let the user know that the value he gets is not 100% correct
>>>>> when we already know that - and given the API, the only option to do
>>>>> that seems to be throwing an exception.
>>>>>
>>>>> The problem, as I see it, is that users also expect methods that
>>>>> throw an exception to *not* modify the cache. So we would break some
>>>>> of the users' expectations anyway.
>>>>
>>>> When the response from the primary owner does not arrive in time, we
>>>> throw a timeout exception and the cache is modified anyway, isn't it?
>>>> If we throw a ~ReturnValueUnreliableException, the user has at least
>>>> some chance to react. Currently, for code requiring a 100% reliable
>>>> value, you can't do anything but ignore the return value, even for
>>>> CAS operations.
>>>>
>>>> Yes, but we don't expect the user to handle a TimeoutException in any
>>>> meaningful way. Instead, we expect the user to choose his hardware
>>>> and configuration to avoid timeouts, if he cares about consistency.
>>>> How could you handle an exception that tells you "I may have written
>>>> the value you asked me to in the cache, or maybe not. Either way, you
>>>> will never know what the previous value was. Muahahaha!" in an
>>>> application that cares about consistency?
>>>>
>>>> But the proposed ReturnValueUnreliableException can't be avoided by
>>>> the user; it has to be handled every time the cluster membership
>>>> changes. So it would be more like WriteSkewException than
>>>> TimeoutException. And when we throw a WriteSkewException, we don't
>>>> write anything to the cache.
>>>>
>>>> Remember, most users do not care about the previous value at all -
>>>> that's the reason why JCache and our HotRod client don't return the
>>>> previous value by default. Those that do care about the previous
>>>> value use the conditional write operations, and those already work
>>>> (well, except for the scenario below). So you would force everyone to
>>>> handle an exception that they don't care about.
>>>>
>>>> It would make sense to throw an exception if we didn't return the
>>>> previous value by default, and the user requested the return value
>>>> explicitly. But we do return the value by default, so I don't think
>>>> it would be a good idea for us.
>>>>>
>>>>> @Sanne: I was not suggesting that for now - sure, value versioning
>>>>> is (I hope) on the roadmap. But that's more complicated; I thought
>>>>> just about making an adjustment to the current implementation.
>>>>>
>>>>> Actually, just keeping a history of values would not fix the return
>>>>> value in all cases.
>>>>>
>>>>> When retrying a put on the new primary owner, the primary owner
>>>>> would still have to compare our value with the latest value, and
>>>>> return the previous value if they are equal. So we could have
>>>>> something like this:
>>>>>
>>>>> A is the originator, B is the primary owner, k = v0
>>>>> A -> B: put(k, v1)
>>>>> B dies before writing v1, C is now primary owner
>>>>> D -> C: put(k, v1) // another put operation from D, with the same value
>>>>> C -> D: v0
>>>>> A -> C: retry_put(k, v1)
>>>>> C -> A: v0 // C assumes A is overwriting its own value, so it's
>>>>> returning the previous one
>>>>>
>>>>> To fix that, we'd need a unique version generated by the originator -
>>>>> kind of like a transaction id ;)
>>>>
>>>> Is it such a problem to associate a unique ID with each write? The
>>>> history implementation seems to me like the more complicated part.
>>>>
>>>> I also think maintaining a version history would be quite
>>>> complicated, and it would also make it harder for users to estimate
>>>> their cache's memory usage. That's why I was trying to show that it's
>>>> not a panacea.
>>>>
>>>>> And to fix the HotRod use case, the HotRod client would have to be
>>>>> the one generating the version.
>>>>
>>>> I agree.
>>>>
>>>> Radim
>>>>>
>>>>> Cheers
>>>>> Dan
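Dan's scenario is easier to see with the version made explicit. A rough
sketch of the originator-generated ID idea, with all names invented (in
reality this would live in the Hot Rod protocol and the write commands,
and this toy class is not thread-safe across its two maps):

    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;

    // A write carries an ID chosen once by the originator, reused on retry.
    final class Put<K, V> {
       final K key;
       final V value;
       final UUID opId = UUID.randomUUID();
       Put(K key, V value) { this.key = key; this.value = value; }
    }

    final class PrimarySketch<K, V> {
       private final Map<K, V> data = new ConcurrentHashMap<>();
       private final Map<K, UUID> lastOp = new ConcurrentHashMap<>();

       V put(Put<K, V> op) {
          if (op.opId.equals(lastOp.get(op.key))) {
             // A retry of a write we already applied. Without the ID, A's
             // retried put(k, v1) and D's independent put(k, v1) look the
             // same; with it we at least know which case we are in, and a
             // history (previous sketch) could supply the right old value.
             return null;
          }
          // First application: the Map return value is the true previous
          // value (v1 for A's retry on C in Dan's example, not v0).
          lastOp.put(op.key, op.opId);
          return data.put(op.key, op.value);
       }
    }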
>>>>> Radim
>>>>>
>>>>> On 05/12/2014 12:02 PM, Sanne Grinovero wrote:
>>>>>> I don't think we are in a position to decide what is a reasonable
>>>>>> compromise; we can do better. For example - as Radim suggested - it
>>>>>> might seem reasonable to have the older value around for a little
>>>>>> while. We'll need a little bit of history of values and tombstones
>>>>>> anyway for many other reasons.
>>>>>>
>>>>>> Sanne
>>>>>>
>>>>>> On 12 May 2014 09:37, Dan Berindei <dan.berin...@gmail.com> wrote:
>>>>>>> Radim, I would contend that the first and foremost guarantee that
>>>>>>> put() makes is to leave the cache in a consistent state. So we
>>>>>>> can't just throw an exception and give up, leaving k=v on one
>>>>>>> owner and k=null on another.
>>>>>>>
>>>>>>> Secondly, put(k, v) being atomic means that either it succeeds,
>>>>>>> writes k=v in the cache, and returns the previous value, or it
>>>>>>> doesn't succeed and doesn't write k=v in the cache. Returning the
>>>>>>> wrong previous value is bad, but leaving k=v in the cache is just
>>>>>>> as bad, even if all the owners have the same value.
>>>>>>>
>>>>>>> And last, we can't have one node seeing k=null, then k=v, then
>>>>>>> k=null again, when the only write we did on the cache was a
>>>>>>> put(k, v). So trying to undo the write would not help.
>>>>>>>
>>>>>>> In the end, we have to make a compromise, and I think returning
>>>>>>> the wrong value in some of the cases is a reasonable compromise.
>>>>>>> Of course, we should document that :)
>>>>>>>
>>>>>>> I also believe ISPN-2956 could be fixed so that HotRod behaves
>>>>>>> just like embedded mode after the ISPN-3422 fix, by adding a RETRY
>>>>>>> flag to the HotRod protocol and to the cache itself.
>>>>>>>
>>>>>>> Incidentally, transactional caches have a similar problem when the
>>>>>>> originator leaves the cluster: ISPN-3421 [1]
>>>>>>> And we can't handle transactional caches any better than
>>>>>>> non-transactional caches until we expose transactions to the
>>>>>>> HotRod client.
>>>>>>>
>>>>>>> [1] https://issues.jboss.org/browse/ISPN-3421
>>>>>>>
>>>>>>> Cheers
>>>>>>> Dan
>>>>>>>
>>>>>>> On Mon, May 12, 2014 at 10:21 AM, Radim Vansa <rva...@redhat.com> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> recently I've stumbled upon an already expected behaviour (one
>>>>>>>> instance is [1]) which did not get much attention.
>>>>>>>>
>>>>>>>> In a non-tx cache, when the primary owner fails after the request
>>>>>>>> has been replicated to the backup owner, the request is retried
>>>>>>>> in the new topology. Then, the operation is executed on the new
>>>>>>>> primary (the previous backup). The outcome has already been fixed
>>>>>>>> in [2], but the return value may be wrong. For example, when we
>>>>>>>> do a put, the return value for the second attempt will be the
>>>>>>>> currently inserted value (although the entry was just created).
>>>>>>>> The same situation may happen for other operations.
>>>>>>>>
>>>>>>>> Currently, it's not possible to return the correct value (because
>>>>>>>> it has already been overwritten and we don't keep a history of
>>>>>>>> values), but shouldn't we rather throw an exception if we were
>>>>>>>> not able to fulfil the API contract?
>>>>>>>>
>>>>>>>> Radim
>>>>>>>>
>>>>>>>> [1] https://issues.jboss.org/browse/ISPN-2956
>>>>>>>> [2] https://issues.jboss.org/browse/ISPN-3422
>>>>>>>>
>>>>>>>> --
>>>>>>>> Radim Vansa <rva...@redhat.com>
>>>>>>>> JBoss DataGrid QA
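As a footnote to Radim's example, this is what the caller observes with
the embedded API. The code itself triggers nothing; the comments describe
the failover case handled by [2]'s retry, and the cache setup is assumed:

    import org.infinispan.Cache;

    public class ReturnValueAnomaly {
       // Assumes a distributed cache with two owners and an absent key "k".
       static void demo(Cache<String, String> cache) {
          String previous = cache.put("k", "v");
          // Map contract: previous == null, since the entry was just created.
          // But if the primary owner dies after the backup has applied the
          // write, the put is retried on the new primary (the old backup),
          // which already holds "v" -- and previous == "v" instead.
          assert previous == null; // can fail under the failover above
       }
    }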
>
> --
> Galder Zamarreño
> gal...@redhat.com
> twitter.com/galderz

Cheers,

--
Mircea Markus
Infinispan lead (www.infinispan.org)

_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev