On 25 January 2012 17:09, Dan Berindei <dan.berin...@gmail.com> wrote:
> On Wed, Jan 25, 2012 at 4:22 PM, Mircea Markus <mircea.mar...@jboss.com> wrote:
>>
>> >> One node might be busy doing GC and stay unresponsive for a whole
>> >> second or longer; another one might have actually crashed without
>> >> you knowing it yet. These are unlikely but possible.
>>
>> All of these are possible, but I would rather treat them as exceptional
>> situations, possibly handled by retry logic. We should *not* optimise
>> for these situations IMO.
>>
>
> As Sanne pointed out, an exceptional situation on one node becomes
> ordinary with 100s or 1000s of nodes.
> So the default policy should scale the initial number of requests with
> numOwners.
>
>> >> More likely, a rehash is in progress, and you could be asking a node
>> >> which doesn't yet (or no longer) have the value.
>>
>> This is a consistency issue, and I think we can find a way to handle it
>> some other way.
>>
>
> With the current state transfer we always send ClusteredGetCommands to
> the old owners (and only the old owners). If a node didn't receive the
> entire state, it means that state transfer hasn't finished yet and the
> CH will not return it as an owner. But the CH could also return owners
> that are no longer members of the cluster, so we have to check for
> that before picking one owner to send the command to.
>
> In Sanne's non-blocking state transfer proposal I think a new owner
> may have to ask the old owner for the key's value, so it would still
> never return null. But it might be less expensive to ask the old owner
> directly (assuming it's safe from a consistency POV).
>
>> >> All good reasons for which IMHO it makes sense to send out "a couple"
>> >> of requests in parallel, but I'd be reluctant to send more than 2,
>> >> and I agree often 1 might be enough.
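[Editorial note: Dan's point above — the CH may still list owners that have left the cluster, and the number of parallel GETs should scale with numOwners — could be sketched roughly as below. All names here (`Address`, `targets`, the numOwners scaling rule) are hypothetical illustrations, not Infinispan's actual API.]

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch: drop owners that are no longer cluster members before picking
// GET targets, and cap the number of parallel requests at 2-3 while
// scaling it with numOwners (the scaling rule is made up for illustration).
public class OwnerSelection {

    // A stand-in for JGroups' Address type.
    record Address(String name) {}

    static List<Address> targets(List<Address> owners,
                                 Set<Address> members,
                                 int numOwners) {
        // Scale the initial request count with numOwners, but never
        // send more than 3 parallel GETs, per the thread's consensus.
        int parallelGets = Math.min(Math.max(numOwners / 10, 1), 3);
        return owners.stream()
                     .filter(members::contains) // skip departed owners
                     .limit(parallelGets)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Address a = new Address("A"), b = new Address("B"), c = new Address("C");
        // B has crashed: the CH still lists it, but the current view does not.
        List<Address> picked = targets(List.of(a, b, c), Set.of(a, c), 20);
        System.out.println(picked); // A and C survive the filter
    }
}
```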
>> >> Maybe it should even optimize for the most common case: send out
>> >> just one, have a more aggressive timeout and in case of trouble ask
>> >> the next node.
>>
>> +1
>>
>
> -1 for aggressive timeouts... you're going to do the same work as you
> do now, except you're going to wait a bit between sending requests. If
> you're really unlucky the first target will return first, but you'll
> ignore its response because the timeout already expired.
Agreed. What I meant by "more aggressive timeouts" is not the overall
timeout that fails the get, but a second, more aggressive timeout that
triggers sending the next GET when the first one is starting to not
look good. So we would have one timeout for the whole operation, and
another deciding how long after a single GET RPC hasn't returned yet we
start asking another node. Even if the global timeout is something high
like "10 seconds", if after 40 ms I still didn't get a reply from the
first node I think we can start sending the next one... but still wait
to eventually get an answer from the first.

>
>> >> In addition, sending a single request might spare us some Future,
>> >> await+notify messing in terms of CPU cost of sending the request.
>>
>> It's the remote OOB thread that's the most costly resource IMO.
>>
>
> I don't think the OOB thread is that costly; it doesn't block on
> anything (not even on state transfer!), so the most expensive part is
> reading the key and writing the value. BTW Sanne, we may want to run
> Transactional with a smaller payload size ;)
>
> We could implement our own GroupRequest that sends the requests in
> parallel, instead of implementing FutureCollator on top of
> UnicastRequest, and save some of that overhead on the caller.
>
> I think we already have a JIRA to make PutKeyValueCommands return the
> previous value; that would eliminate lots of GetKeyValueCommands and
> it would actually improve the performance of puts - we should probably
> make this a priority.

+1 !!

>
>>
>> I think I agree on all points, it makes more sense.
>> Just that in a large cluster, let's say 1000 nodes, maybe I want 20
>> owners as a sweet spot for the read/write performance tradeoff, and
>> with such high numbers I guess doing 2-3 gets in parallel might make
>> sense, as those "unlikely" events suddenly become almost certain...
>> especially a rehash in progress.
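[Editorial note: the staggered scheme Sanne describes above — one global timeout for the whole operation plus a short per-attempt stagger (e.g. 40 ms) before asking the next owner, without ever cancelling earlier requests — could be sketched as follows. Everything here is a hypothetical illustration (the real implementation would sit on top of JGroups unicasts, not `CompletableFuture` suppliers):]

```java
import java.util.List;
import java.util.concurrent.*;
import java.util.function.Function;

// Sketch of a staggered remote GET: fire one request; only if it hasn't
// answered within the stagger delay, fire the next one while the first
// stays outstanding. Whichever target replies first wins, and a global
// timeout bounds the whole operation.
public class StaggeredGet {

    static <T> T get(List<Function<String, CompletableFuture<T>>> targets,
                     String key,
                     long staggerMillis,
                     long globalTimeoutMillis) throws Exception {
        CompletableFuture<T> result = new CompletableFuture<>();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            for (int i = 0; i < targets.size(); i++) {
                Function<String, CompletableFuture<T>> target = targets.get(i);
                // Request i is delayed by i * stagger; earlier requests are
                // never cancelled, so a slow first reply still counts.
                timer.schedule(() -> {
                    if (!result.isDone()) {
                        target.apply(key).thenAccept(result::complete);
                    }
                }, i * staggerMillis, TimeUnit.MILLISECONDS);
            }
            // The global timeout bounds the whole operation.
            return result.get(globalTimeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate a GC-stalled first owner and a healthy second owner.
        Function<String, CompletableFuture<String>> slow =
            k -> CompletableFuture.supplyAsync(() -> {
                try { Thread.sleep(5_000); } catch (InterruptedException e) { }
                return "from-slow";
            });
        Function<String, CompletableFuture<String>> fast =
            k -> CompletableFuture.completedFuture("from-fast");

        // The second request goes out after 40 ms and answers immediately,
        // well inside the 10 s global timeout.
        System.out.println(get(List.of(slow, fast), "k", 40, 10_000)); // prints "from-fast"
    }
}
```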
>> >> So I'd propose a separate configuration option for # parallel get
>> >> events, and one to define a "try next node" policy. Or this policy
>> >> should be the whole strategy, and the #gets one of the options for
>> >> the default implementation.
>>
>> Agreed that having a configurable remote get policy makes sense.
>> We already have a JIRA for this [1]; I'll start working on it as the
>> performance results are haunting me.
>
> I'd rather focus on implementing one remote get policy that works well
> instead of making it configurable - even if we make it configurable
> we'll have to focus our optimizations on the default policy.
>
> Keep in mind that we also want to introduce eventual consistency - I
> think that's going to eliminate our optimization opportunity here,
> because we'll need to get the values from a majority of the owners (if
> not all of them).
>
>> I'd like to have Dan's input on this as well first, as he has worked
>> with remote gets and I still don't know why null results are not
>> considered valid :)
>
> Pre-5.0, during state transfer an owner could return null to mean "I'm
> not sure", so the caller would ignore it unless every target returned
> null.
> That's no longer necessary, but it wasn't broken so I didn't fix it...
>
> Cheers
> Dan
>
>>
>> [1] https://issues.jboss.org/browse/ISPN-825
>>

_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev