> Let's say I have N=3, R=2. Is the situation in the above paragraph possible
> because hypothetically the new node joining the ring could claim 2 of the 3
> vnodes that hold replicas of a certain document? So when I do a read with
> R=2, 2 of the 3 replicas are now in vnodes claimed by the new physical node
> but it doesn't have the data yet, so my read fails?
Basically, yes. A new node joining the cluster causes vnodes to be reassigned, and you could end up with two or more vnodes/replicas owned by nodes other than the original owners. To be clear, it's not likely that both of those vnodes/replicas were reassigned to the newly joining node. There is logic in Riak that tries to maintain the spread of vnode ownership such that replicas land on separate nodes. While there are some corner cases where that could happen, it's not common and isn't the likely issue here.

The more likely case is that one vnode was reassigned to the new node, and another vnode was reassigned to some other existing node in the cluster. The current claim logic in Riak does not only reassign vnodes to newly joining nodes; it may also shuffle vnodes among other nodes in the ring in order to maintain certain ring invariants (specifically, the replicas-on-different-nodes goal mentioned above).

As an aside, the existing claim logic is not as elegant as I'd like, and cases where multiple replicas are reassigned happen a lot more frequently than they should. This is why the not_found issue was encountered by many people -- reassigning two or more replicas is not a hypothetical case; it happens fairly regularly. While the new clustering approach fixes the not_found issues, it is still "on the list" to improve the claim logic to ensure better replica placement and less reassignment during node addition/removal. There's an outstanding claim-logic pull request from the guys at Dropcam that I'll be looking into this week to see how well it solves this issue.

-Joe

On Sun, Sep 11, 2011 at 2:21 PM, Jeff Pollard <[email protected]> wrote:
> Joe,
> Thanks for the explanation, it's super helpful.
> Everything made complete sense, but I wanted to double-check my
> understanding of one paragraph:
>
>> Given how 0.14.2 would immediately reassign partition ownership
>> before actually transferring the data, it was entirely possible to have 2
>> or more replicas temporarily in the not_found state even if there was actual
>> data for the item somewhere in the cluster (the old partition owner).
>
> Let's say I have N=3, R=2. Is the situation in the above paragraph possible
> because hypothetically the new node joining the ring could claim 2 of the 3
> vnodes that hold replicas of a certain document? So when I do a read with
> R=2, 2 of the 3 replicas are now in vnodes claimed by the new physical node
> but it doesn't have the data yet, so my read fails?
>
> On Sun, Sep 11, 2011 at 11:25 AM, Joseph Blomstedt <[email protected]> wrote:
>> Given how 0.14.2 would immediately reassign partition ownership before
>> actually transferring the data, it was entirely possible to have 2 or
>> more replicas temporarily in the not_found state even if there was
>> actual data for the item somewhere in the cluster (the old partition
>> owner).

-- 
Joseph Blomstedt <[email protected]>
Software Engineer
Basho Technologies, Inc.
http://www.basho.com/

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
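For anyone following the thread, the failure mode discussed above can be sketched as a tiny simulation. This is purely illustrative Python, not Riak's actual Erlang internals: the `replicas` structure, `has_data` flag, and `quorum_read` function are all hypothetical names, standing in for "which node currently claims each replica vnode, and has handoff delivered the data there yet."

```python
N = 3  # replicas per key
R = 2  # read quorum: replies needed for a successful read

# State of the three replica vnodes for one key, right after ownership
# was reassigned (0.14.2-style) but before handoff transferred the data.
replicas = [
    {"owner": "node_new", "has_data": False},  # reassigned to the joining node
    {"owner": "node_b",   "has_data": False},  # shuffled to another node by claim logic
    {"owner": "node_c",   "has_data": True},   # untouched replica still holds the data
]

def quorum_read(replicas, r):
    """Succeed only if at least r replicas can return the value."""
    found = sum(1 for rep in replicas if rep["has_data"])
    return "ok" if found >= r else "not_found"

print(quorum_read(replicas, R))  # -> not_found: only 1 of 3 replicas has the data
```

With only one of the two reassigned vnodes lacking data, `quorum_read` would still return "ok", which is why the problem only bites when two or more replicas are reassigned at once.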
