> Let's say I have N=3, R=2. Is the situation in the above paragraph possible
> because hypothetically the new node joining the ring could claim 2 of the 3
> vnodes that hold replicas of a certain document? So when I do a read with
> R=2, 2 of the 3 replicas are now in vnodes claimed by the new physical node
> but it doesn't have the data yet, so my read fails?
Basically, yes. A new node joining the cluster causes vnodes to be reassigned, and you could end up with two or more vnodes/replicas owned by nodes other than the original owners. To be clear, it's not likely that both of those vnodes/replicas were reassigned to the newly joining node. There is logic in Riak that tries to maintain the spread of vnode ownership such that replicas land on separate nodes. While there are some corner cases where that could happen, it's not common and isn't the likely issue here.

The more likely case is that one vnode was reassigned to the new node, and another vnode was reassigned to some other existing node in the cluster. The current claim logic in Riak does not only reassign vnodes to newly joining nodes; it may also shuffle vnodes among other nodes in the ring in order to maintain certain ring invariants (specifically, the replicas-on-different-nodes goal mentioned above).

As an aside, the existing claim logic is not as elegant as I'd like, and cases where multiple replicas are reassigned happen a lot more frequently than they should. This is why the not_found issue was encountered by many people -- reassigning two or more replicas is not a hypothetical case; it happens fairly regularly. While the new clustering approach fixes the not_found issues, it is still "on the list" to improve the claim logic to ensure better replica placement and less reassignment during node addition/removal. There's an outstanding claim-logic pull request from the guys at Dropcam that I'll be looking into this week to see how well it solves this issue.

-Joe

On Sun, Sep 11, 2011 at 2:21 PM, Jeff Pollard <[email protected]> wrote:
> Joe,
> Thanks for the explanation, it's super helpful.
> Everything made complete sense, but I wanted to double-check my
> understanding of one paragraph:
>
>> Given how 0.14.2 would immediately reassign partition ownership
>> before actually transferring the data, it was entirely possible to have 2
>> or more replicas temporarily in the not_found state even if there was actual
>> data for the item somewhere in the cluster (the old partition owner).
>
> Let's say I have N=3, R=2. Is the situation in the above paragraph possible
> because hypothetically the new node joining the ring could claim 2 of the 3
> vnodes that hold replicas of a certain document? So when I do a read with
> R=2, 2 of the 3 replicas are now in vnodes claimed by the new physical node
> but it doesn't have the data yet, so my read fails?
>
> On Sun, Sep 11, 2011 at 11:25 AM, Joseph Blomstedt <[email protected]> wrote:
>> Given how 0.14.2 would immediately reassign partition ownership before
>> actually transferring the data, it was entirely possible to have 2 or
>> more replicas temporarily in the not_found state even if there was
>> actual data for the item somewhere in the cluster (the old partition
>> owner).

-- 
Joseph Blomstedt <[email protected]>
Software Engineer
Basho Technologies, Inc.
http://www.basho.com/

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
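For anyone following the thread, the failure mode discussed above can be sketched as a tiny simulation. This is purely illustrative Python, not Riak's actual Erlang internals: the `replicas` structure, `has_data` flag, and `quorum_read` function are all hypothetical names, standing in for "which node currently claims each replica vnode, and has handoff delivered the data there yet."

```python
N = 3  # replicas per key
R = 2  # read quorum: replies needed for a successful read

# State of the three replica vnodes for one key, right after ownership
# was reassigned (0.14.2-style) but before handoff transferred the data.
replicas = [
    {"owner": "node_new", "has_data": False},  # reassigned to the joining node
    {"owner": "node_b",   "has_data": False},  # shuffled to another node by claim logic
    {"owner": "node_c",   "has_data": True},   # untouched replica still holds the data
]

def quorum_read(replicas, r):
    """Succeed only if at least r replicas can return the value."""
    found = sum(1 for rep in replicas if rep["has_data"])
    return "ok" if found >= r else "not_found"

print(quorum_read(replicas, R))  # -> not_found: only 1 of 3 replicas has the data
```

With only one of the two reassigned vnodes lacking data, `quorum_read` would still return "ok", which is why the problem only bites when two or more replicas are reassigned at once.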
