On Jun 17, 2014, at 10:53, Galder Zamarreño <gal...@redhat.com> wrote:
> On 16 Jun 2014, at 16:57, Dan Berindei <dan.berin...@gmail.com> wrote:
>
>> On Thu, Jun 12, 2014 at 4:54 PM, Galder Zamarreño <gal...@redhat.com> wrote:
>> Hi all,
>>
>> I’m working on the implementation of this, and the solution noted in
>> the JIRA does not work for situations where you have to return a
>> previous value that might have been overridden due to partial operation
>> application. Example (assuming 1 owner only):
>>
>> 1. remote client does a put(“key”, 1) in Node-A
>> 2. remote client does a replace(“key”, 2) on Node-A but the operation
>> fails to replicate and gets partially applied in Node-A only.
>> 3. remote client, seeing the replace failed, retries the replace(“key”, 2)
>> in Node-B. replace underneath finds that the previous value of “key” is
>> 2 (since it got partially applied in Node-A), so it succeeds, but the
>> previous value returned is 2, which is wrong. The previous value should
>> have been 1, but this value is gone…
>>
>> In my view, there are two ways to solve this issue:
>>
>> 1. Make Hot Rod caches transactional. By doing so, operations are not
>> partially applied: they’re done fully cluster-wide or they’re rolled
>> back. I’ve verified this and the test passes once you make the cache
>> transactional. The downside is of course performance. IOW, anyone using
>> conditional operations, or relying on previous values, would need
>> transactional caches. This should work well with the retry mechanism
>> being implemented for ISPN-2956, which is still needed. A big question
>> here is whether Hot Rod caches should be transactional by default or
>> vice versa. If they’re transactional, our performance will go down for
>> sure, but we won’t have this type of issue (with retry in place). If
>> you’re not transactional, you are faster but you’re exposed to these
>> edge cases, and you need to consider them when deploying your app,
>> something people might miss, although we could provide WARN messages
>> when conditional operations or Flag.FORCE_RETURN_VALUE are used with
>> non-transactional caches.
>>
>> The same problem [1] can appear in embedded caches. So if there's an
>> argument to make HotRod caches transactional by default, it should
>> apply to embedded caches as well.
>>
>> I like the idea of logging a warning when FORCE_RETURN_VALUE or the
>> non-versioned conditional operations are used from HotRod. But we have
>> to be careful with the wording, because inconsistencies can appear in
>> transactional mode as well [2].
>
> I think this (default non-transactional, and WARN if using such methods
> with a non-transactional cache) might be the best stop-gap solution.

+1. Would be good to have [2] fixed as well, in case people want a
workable workaround.

>
>> [1] https://issues.jboss.org/browse/ISPN-4286
>> [2] https://issues.jboss.org/browse/ISPN-3421
>
> Have you considered other fixes for [2]?
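For reference, this is roughly what option 1 amounts to configuration-wise.
A minimal sketch using the embedded Java API; the DIST_SYNC mode, the class
name and the idea that a Hot Rod endpoint sits on top of such a cache are
illustrative, not a tested recipe:

    import org.infinispan.Cache;
    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.Configuration;
    import org.infinispan.configuration.cache.ConfigurationBuilder;
    import org.infinispan.configuration.global.GlobalConfigurationBuilder;
    import org.infinispan.manager.DefaultCacheManager;
    import org.infinispan.transaction.TransactionMode;

    public class TransactionalCacheSketch {
       public static void main(String[] args) {
          Configuration cfg = new ConfigurationBuilder()
                .clustering().cacheMode(CacheMode.DIST_SYNC)
                // The switch that matters here: in a transactional cache a
                // write that fails to replicate is rolled back everywhere
                // instead of staying partially applied on one owner.
                .transaction().transactionMode(TransactionMode.TRANSACTIONAL)
                .build();
          DefaultCacheManager manager = new DefaultCacheManager(
                GlobalConfigurationBuilder.defaultClusteredBuilder().build(), cfg);
          try {
             Cache<String, Integer> cache = manager.getCache();
             cache.put("key", 1);
             // With rollback instead of partial application, a retried
             // replace("key", 2) sees the pre-replace value, 1.
             Integer previous = cache.replace("key", 2);
          } finally {
             manager.stop();
          }
       }
    }

The trade-off is exactly the one described above: every write pays the
transactional coordination cost.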
>
>>
>> 2. Get rid of returning the previous value in the Hot Rod protocol for
>> modifying operations. For conditional operations, returning true/false
>> is at least enough to see if the condition was applied. So,
>> replaceIfUnmodified/replace/remove(conditional) would only return
>> true/false. This would be complicated due to reliance on
>> Map/ConcurrentMap APIs. Maybe something to consider for when we stop
>> relying on JDK APIs.
>>
>> I'm not sure we can completely get rid of the return values; even
>> though JCache doesn't extend Map, it still has a getAndPut method.
>
> It does have such a method, but we could potentially make it throw an
> exception saying that it is not supported.
>
>>
>> I also considered applying corrective actions but that’s very messy and
>> prone to concurrency issues, so I quickly discarded that.
>>
>> Yeah, rolling back partial modifications is definitely not an option.
>>
>> Any other options? Thoughts on the options above?
>>
>> I was waiting for Sanne to say this, but how about keeping a version
>> history?
>>
>> If we had the chain of values/versions for each key, we could look up
>> the version of the current operation in the chain when retrying. If
>> it's there, we could infer that the operation was already applied, and
>> return the value before it in the chain as the operation’s previous
>> value.
>>
>> Of course, there are the usual downsides:
>> 1. Maintaining the history might be tricky, especially around state
>> transfers.
>> 2. Performance will also be affected, maybe getting closer to the tx
>> performance.
>
> Yeah, keeping a version history could help, but it’d be quite a beast
> to implement for 7.0, including garbage collection of old versions, etc.

Yes, also option 1 seems to make things work and is almost ready. Also,
with versioning, I think the performance is worse, as you always need to
communicate when you no longer need a version (an RPC), so that it can
be garbage collected.
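To make the version-history idea a bit more concrete, here is a rough
sketch of the per-key chain it would need. Nothing like this exists in
Infinispan today; all names are invented, and garbage collection and state
transfer (downside 1 above) are ignored entirely:

    import java.util.Iterator;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentLinkedDeque;

    final class VersionedValue<V> {
       final long version; // would have to be originator-generated, see below
       final V value;
       VersionedValue(long version, V value) {
          this.version = version;
          this.value = value;
       }
    }

    final class HistorySketch<K, V> {
       private final ConcurrentHashMap<K, ConcurrentLinkedDeque<VersionedValue<V>>>
             history = new ConcurrentHashMap<>();

       // Record a write; the newest entry sits at the head of the chain.
       void record(K key, long version, V value) {
          history.computeIfAbsent(key, k -> new ConcurrentLinkedDeque<>())
                 .addFirst(new VersionedValue<>(version, value));
       }

       // On retry: if 'version' is already in the chain, the operation was
       // applied before, and the entry behind it is the correct previous value.
       V previousValueFor(K key, long version) {
          ConcurrentLinkedDeque<VersionedValue<V>> chain = history.get(key);
          if (chain == null) return null;
          Iterator<VersionedValue<V>> it = chain.iterator();
          while (it.hasNext()) {
             if (it.next().version == version) {
                return it.hasNext() ? it.next().value : null;
             }
          }
          return null; // version never seen: the write was not applied here
       }
    }

Even this toy version shows where the cost goes: every write allocates and
keeps an extra entry until something (the extra RPC mentioned above) says
it can be collected.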
>
>>
>> Cheers,
>>
>> On 26 May 2014, at 18:11, Galder Zamarreño <gal...@redhat.com> wrote:
>>
>>> Hi all,
>>>
>>> I’ve been looking into ISPN-2956 over the last week and I think we
>>> have a solution for it which requires a protocol change [1].
>>>
>>> Since we’re in the middle of the Hot Rod 2.0 development, this is a
>>> good opportunity to implement it.
>>>
>>> Cheers,
>>>
>>> [1] https://issues.jboss.org/browse/ISPN-2956?focusedCommentId=12970541&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12970541
>>>
>>> On 14 May 2014, at 09:36, Dan Berindei <dan.berin...@gmail.com> wrote:
>>>
>>>> On Tue, May 13, 2014 at 6:40 PM, Radim Vansa <rva...@redhat.com> wrote:
>>>> On 05/13/2014 03:58 PM, Dan Berindei wrote:
>>>>>
>>>>> On Mon, May 12, 2014 at 1:54 PM, Radim Vansa <rva...@redhat.com> wrote:
>>>>> @Dan: It's absolutely correct to do the further writes in order to
>>>>> make the cache consistent, I am not arguing against that. You've
>>>>> fixed the outcome (state of the cache) well. My point was that we
>>>>> should let the user know that the value he gets is not 100% correct
>>>>> when we already know that - and given the API, the only option to do
>>>>> that seems to be throwing an exception.
>>>>>
>>>>> The problem, as I see it, is that users also expect methods that
>>>>> throw an exception to *not* modify the cache. So we would break some
>>>>> of the users' expectations anyway.
>>>>
>>>> When the response from the primary owner does not arrive in time, we
>>>> throw a timeout exception and the cache is modified anyway, isn't it?
>>>> If we throw a ~ReturnValueUnreliableException, the user has at least
>>>> some chance to react. Currently, for code requiring a 100% reliable
>>>> value, you can't do anything but ignore the return value, even for
>>>> CAS operations.
>>>>
>>>> Yes, but we don't expect the user to handle a TimeoutException in any
>>>> meaningful way. Instead, we expect the user to choose his hardware
>>>> and configuration to avoid timeouts, if he cares about consistency.
>>>> How could you handle an exception that tells you "I may have written
>>>> the value you asked me to in the cache, or maybe not. Either way, you
>>>> will never know what the previous value was. Muahahaha!" in an
>>>> application that cares about consistency?
>>>>
>>>> But the proposed ReturnValueUnreliableException can't be avoided by
>>>> the user; it has to be handled every time the cluster membership
>>>> changes. So it would be more like WriteSkewException than
>>>> TimeoutException. And when we throw a WriteSkewException, we don't
>>>> write anything to the cache.
>>>>
>>>> Remember, most users do not care about the previous value at all -
>>>> that's the reason why JCache and our HotRod client don't return the
>>>> previous value by default. Those that do care about the previous
>>>> value use the conditional write operations, and those already work
>>>> (well, except for the scenario below). So you would force everyone to
>>>> handle an exception that they don't care about.
>>>>
>>>> It would make sense to throw an exception if we didn't return the
>>>> previous value by default, and the user requested the return value
>>>> explicitly. But we do return the value by default, so I don't think
>>>> it would be a good idea for us.
>>>>>
>>>>> @Sanne: I was not suggesting that for now - sure, value versioning
>>>>> is (I hope) on the roadmap. But that's more complicated; I thought
>>>>> just about making an adjustment to the current implementation.
>>>>>
>>>>> Actually, just keeping a history of values would not fix the return
>>>>> value in all cases.
>>>>>
>>>>> When retrying a put on the new primary owner, the primary owner
>>>>> would still have to compare our value with the latest value, and
>>>>> return the previous value if they are equal. So we could have
>>>>> something like this:
>>>>>
>>>>> A is the originator, B is the primary owner, k = v0
>>>>> A -> B: put(k, v1)
>>>>> B dies before writing v1, C is now primary owner
>>>>> D -> C: put(k, v1) // another put operation from D, with the same value
>>>>> C -> D: v0
>>>>> A -> C: retry_put(k, v1)
>>>>> C -> A: v0 // C assumes A is overwriting its own value, so it's
>>>>> returning the previous one
>>>>>
>>>>> To fix that, we'd need a unique version generated by the originator -
>>>>> kind of like a transaction id ;)
>>>>
>>>> Is it such a problem to associate a unique ID with each write? The
>>>> history implementation seems to me like the more complicated part.
>>>>
>>>> I also think maintaining a version history would be quite
>>>> complicated, and it would also make it harder for users to estimate
>>>> their cache's memory usage. That's why I was trying to show that it's
>>>> not a panacea.
>>>>
>>>>> And to fix the HotRod use case, the HotRod client would have to be
>>>>> the one generating the version.
>>>>
>>>> I agree.
>>>>
>>>> Radim
>>>>>
>>>>> Cheers
>>>>> Dan
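Dan's scenario is easier to see with the version made explicit. A rough
sketch of the originator-generated ID idea, with all names invented (in
reality this would live in the Hot Rod protocol and the write commands,
and this toy class is not thread-safe across its two maps):

    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;

    // A write carries an ID chosen once by the originator, reused on retry.
    final class Put<K, V> {
       final K key;
       final V value;
       final UUID opId = UUID.randomUUID();
       Put(K key, V value) { this.key = key; this.value = value; }
    }

    final class PrimarySketch<K, V> {
       private final Map<K, V> data = new ConcurrentHashMap<>();
       private final Map<K, UUID> lastOp = new ConcurrentHashMap<>();

       V put(Put<K, V> op) {
          if (op.opId.equals(lastOp.get(op.key))) {
             // A retry of a write we already applied. Without the ID, A's
             // retried put(k, v1) and D's independent put(k, v1) look the
             // same; with it we at least know which case we are in, and a
             // history (previous sketch) could supply the right old value.
             return null;
          }
          // First application: the Map return value is the true previous
          // value (v1 for A's retry on C in Dan's example, not v0).
          lastOp.put(op.key, op.opId);
          return data.put(op.key, op.value);
       }
    }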
>>>>> Radim
>>>>>
>>>>> On 05/12/2014 12:02 PM, Sanne Grinovero wrote:
>>>>>> I don't think we are in a position to decide what is a reasonable
>>>>>> compromise; we can do better. For example - as Radim suggested - it
>>>>>> might seem reasonable to have the older value around for a little
>>>>>> while. We'll need a little bit of history of values and tombstones
>>>>>> anyway for many other reasons.
>>>>>>
>>>>>> Sanne
>>>>>>
>>>>>> On 12 May 2014 09:37, Dan Berindei <dan.berin...@gmail.com> wrote:
>>>>>>> Radim, I would contend that the first and foremost guarantee that
>>>>>>> put() makes is to leave the cache in a consistent state. So we
>>>>>>> can't just throw an exception and give up, leaving k=v on one
>>>>>>> owner and k=null on another.
>>>>>>>
>>>>>>> Secondly, put(k, v) being atomic means that either it succeeds,
>>>>>>> writes k=v in the cache, and returns the previous value, or it
>>>>>>> doesn't succeed and doesn't write k=v in the cache. Returning the
>>>>>>> wrong previous value is bad, but leaving k=v in the cache is just
>>>>>>> as bad, even if all the owners have the same value.
>>>>>>>
>>>>>>> And last, we can't have one node seeing k=null, then k=v, then
>>>>>>> k=null again, when the only write we did on the cache was a
>>>>>>> put(k, v). So trying to undo the write would not help.
>>>>>>>
>>>>>>> In the end, we have to make a compromise, and I think returning
>>>>>>> the wrong value in some of the cases is a reasonable compromise.
>>>>>>> Of course, we should document that :)
>>>>>>>
>>>>>>> I also believe ISPN-2956 could be fixed so that HotRod behaves
>>>>>>> just like embedded mode after the ISPN-3422 fix, by adding a RETRY
>>>>>>> flag to the HotRod protocol and to the cache itself.
>>>>>>>
>>>>>>> Incidentally, transactional caches have a similar problem when the
>>>>>>> originator leaves the cluster: ISPN-3421 [1]
>>>>>>> And we can't handle transactional caches any better than
>>>>>>> non-transactional caches until we expose transactions to the
>>>>>>> HotRod client.
>>>>>>>
>>>>>>> [1] https://issues.jboss.org/browse/ISPN-3421
>>>>>>>
>>>>>>> Cheers
>>>>>>> Dan
>>>>>>>
>>>>>>> On Mon, May 12, 2014 at 10:21 AM, Radim Vansa <rva...@redhat.com> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> recently I've stumbled upon an already expected behaviour (one
>>>>>>>> instance is [1]) which did not get much attention.
>>>>>>>>
>>>>>>>> In a non-tx cache, when the primary owner fails after the request
>>>>>>>> has been replicated to the backup owner, the request is retried
>>>>>>>> in the new topology. Then, the operation is executed on the new
>>>>>>>> primary (the previous backup). The outcome has already been fixed
>>>>>>>> in [2], but the return value may be wrong. For example, when we
>>>>>>>> do a put, the return value for the second attempt will be the
>>>>>>>> currently inserted value (although the entry was just created).
>>>>>>>> The same situation may happen for other operations.
>>>>>>>>
>>>>>>>> Currently, it's not possible to return the correct value (because
>>>>>>>> it has already been overwritten and we don't keep a history of
>>>>>>>> values), but shouldn't we rather throw an exception if we were
>>>>>>>> not able to fulfil the API contract?
>>>>>>>>
>>>>>>>> Radim
>>>>>>>>
>>>>>>>> [1] https://issues.jboss.org/browse/ISPN-2956
>>>>>>>> [2] https://issues.jboss.org/browse/ISPN-3422
>>>>>>>>
>>>>>>>> --
>>>>>>>> Radim Vansa <rva...@redhat.com>
>>>>>>>> JBoss DataGrid QA
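As a footnote to Radim's example, this is what the caller observes with
the embedded API. The code itself triggers nothing; the comments describe
the failover case handled by [2]'s retry, and the cache setup is assumed:

    import org.infinispan.Cache;

    public class ReturnValueAnomaly {
       // Assumes a distributed cache with two owners and an absent key "k".
       static void demo(Cache<String, String> cache) {
          String previous = cache.put("k", "v");
          // Map contract: previous == null, since the entry was just created.
          // But if the primary owner dies after the backup has applied the
          // write, the put is retried on the new primary (the old backup),
          // which already holds "v" -- and previous == "v" instead.
          assert previous == null; // can fail under the failover above
       }
    }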
>
> --
> Galder Zamarreño
> gal...@redhat.com
> twitter.com/galderz

Cheers,

--
Mircea Markus
Infinispan lead (www.infinispan.org)

_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev