On 9/28/11 1:48 PM, Dan Berindei wrote:
> On Wed, Sep 28, 2011 at 12:59 PM, Bela Ban <b...@redhat.com> wrote:
>> My 5 cents:
>> - Are you clubbing (virtual) view updates and rebalancing together? And
>> if so (I should probably read on first...), can't you have view
>> installations *without* rebalancing?
>>
>
> I'm not sure what benefits you would get from joining the cache view
> but not receiving state, compared to not sending the join request at
> all. Non-members are allowed to send/receive commands, so in the
> future we could even have separate "server" nodes (that join the cache
> view) and "client" nodes (joining the JGroups cluster to send
> commands, but not the cache view, and so not holding any data except
> L1).

I had the scenario in mind where you join 100 members and only *then* do
a state transfer (rebalancing).

> My idea was that the cache view was a representation of the caches
> that are able to service requests, so it doesn't make sense to include
> in the view caches that don't hold data.

OK. So with periodic rebalancing, you'd hold (virtual) views and state
*until* the trigger fires, which then installs the new virtual views and
rebalances the state? In this case, tying view delivery and rebalancing
together makes sense...

>> - Do we need the complex PREPARE_VIEW / ROLLBACK_VIEW / COMMIT_VIEW 2PC
>> handling? This adds a lot of complexity. Is it only used when we have a
>> transactional cache?
>>
>
> Nope, this doesn't have anything to do with transactional caches;
> instead, it is all about computing the owner that will push the key
> during the rebalance operation.
>
> In order to do it deterministically, we need to have a common "last
> good consistent hash" for the last rebalance that finished
> successfully, and each node must determine whether it should push a
> key based on that last good CH.

OK. I just hope this makes sense for large clusters, as it is a 2PC,
which doesn't scale to a larger number of nodes. I mean, we don't use
FLUSH in large clusters for the same reason.
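Just to check my understanding of the deterministic push decision, here is
a rough Java sketch. The `ConsistentHash` interface, `shouldPush`, and the
node/key types are my own illustration, not the actual Infinispan classes:

```java
import java.util.List;

public class PushDecision {

    // Hypothetical minimal CH interface, not the real Infinispan one:
    // returns the ordered owner list for a key, primary owner first.
    interface ConsistentHash {
        List<String> locate(String key, int numOwners);
    }

    /**
     * A node pushes a key during rebalancing iff it is the primary owner
     * in the last good CH and the ownership changes in the pending CH.
     * Since every node evaluates the same two CHs, they all reach the
     * same decision without any extra coordination.
     */
    static boolean shouldPush(String self, String key,
                              ConsistentHash lastGoodCH,
                              ConsistentHash pendingCH,
                              int numOwners) {
        List<String> oldOwners = lastGoodCH.locate(key, numOwners);
        List<String> newOwners = pendingCH.locate(key, numOwners);
        return self.equals(oldOwners.get(0))
                && !newOwners.equals(oldOwners);
    }
}
```

If that matches the intent, the 2PC is only there to agree on which CH
plays the role of `lastGoodCH`, correct?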
Hmm, on the upside, you don't run this algorithm a lot, so maybe running
it only a few times amortizes its cost.

With this algorithm, I assume you won't need the transitory view anymore
(UnionConsistentHashFunction or whatever it was called), which includes
both the current and new owners of a key?

> A rebalance operation can also fail for various reasons (e.g. the
> coordinator died). If that happens, the new owners won't have all the
> state, so they should not receive requests for the state that they
> would have had in the pending CH.

OK, fair enough.

>> - State is to be transferred *within* this 2PC time frame. Hmm, again,
>> this ties rebalancing and view installation together (see my argument
>> above)...
>>
>
> If view installation wasn't tied to state transfer, then we'd have to
> keep the last rebalanced view somewhere else. We would hold the
> "last pending view" (pending rebalance, that is) in the
> CacheViewsManager and the "last rebalanced view" in another component,
> and that component would have its own mechanism for synchronizing the
> "last rebalanced view" among cache members. So I think the 2PC
> approach in CacheViewsManager actually simplifies things.

OK, agreed. I would not like this if it ran on every view installation,
but since we're running it after a cooldown period, or after having
received N JOIN requests or M LEAVE requests, I guess it should be fine.
+1 on simplification

-- 
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat

_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev