Hi, Nico.

On Mon, Aug 2, 2010 at 1:19 PM, Nico Meyer <nico.me...@adition.com> wrote:

> What I mean is, if I do a get request for a key with R=N, and one of the
> first N nodes in the preflist is down, the request will still succeed.
> Why is that? Doesn't that undermine the purpose of setting R to a high
> number (specifically setting it to N)? That way a request might succeed
> even if all primary nodes responsible for the key are unavailable.

You are correct, and this is intentional.  There is nothing in the R
or W settings that is intended to indicate anything at all about
"primary" nodes.  It is rather simply the number of successful
responses that the client wishes to wait for, and thus the degree of
quorum sought before a client reply is sent.  Using fallback nodes to
satisfy reads is a natural result of using fallback nodes to satisfy
writes.

If all primary nodes responsible for a key are unavailable, but enough
of the fallback nodes for that key have received a value for it
(through a fallback write) since the primaries became unavailable, then
a request to get that key might succeed.  I am not sure why you see
this as a bad thing.

(It will only succeed if R nodes actually provide a successful result,
not just if they are available.)
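
To make that counting rule concrete, here is a minimal Python sketch.
It is only an illustration of the behavior described above, not Riak's
actual code; the Reply type and the read_succeeds function are
hypothetical names. The point is that only the number of successful
replies is compared against R, and whether a reply came from a primary
or a fallback node plays no part in the decision.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    node: str          # hypothetical node name, for illustration only
    is_fallback: bool  # True if the reply came from a fallback node
    ok: bool           # True if the node actually returned a value


def read_succeeds(replies, r):
    """Return True once at least r replies carry a successful result.

    is_fallback is deliberately ignored: R says nothing about primaries.
    """
    successes = sum(1 for reply in replies if reply.ok)
    return successes >= r
```

With N=3 and R=3, three successful replies from fallback nodes satisfy
the request, while three nodes that are reachable but hold no value do
not.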

> On a similar note, why is the riak_kv_get_fsm waiting for at least
> (N/2)+1 responses, if there are only not_found responses, effectively
> ignoring a smaller R value of the request if the key does not exist?

This is a compromise to deal with real situations that can occur where
a single node might be taking a very long time to reply, and a value
has never been stored for a given key.  Without either this basic
quorum default for notfounds or alternately considering a notfound as
success and thus only waiting for R of them, that situation would mean
that an R=1 request would take much longer to complete than an R=2
request (due to waiting for the slow node) which is confusing to most
users.  Note that since it applies to notfounds, this tends to only
come into play for items that have never been successfully stored with
at least a basic quorum -- things that really are not present, that
is.
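
As a rough illustration of that rule, the following Python sketch
models the decision as replies arrive. It is a simplified model under
stated assumptions, not the riak_kv_get_fsm implementation: the
function name get_outcome and the "ok"/"notfound" strings are made up
for this example, and error and timeout handling are left out.

```python
def get_outcome(replies, n, r):
    """Decide a get from replies given in arrival order.

    replies: iterable of "ok" or "notfound" strings.
    Returns (outcome, replies_consumed), where outcome is "ok",
    "notfound", or "pending" if neither threshold has been reached.
    """
    basic_quorum = n // 2 + 1  # (N/2)+1 with integer division
    ok = notfound = consumed = 0
    for reply in replies:
        consumed += 1
        if reply == "ok":
            ok += 1
        elif reply == "notfound":
            notfound += 1
        if ok >= r:
            # R successful replies: answer the client immediately.
            return ("ok", consumed)
        if notfound >= basic_quorum:
            # A basic quorum says notfound: do not wait for slow nodes.
            return ("notfound", consumed)
    return ("pending", consumed)
```

With N=3 and R=1, two notfound replies are enough to answer without
waiting for a slow third node, while a single notfound followed by one
successful reply still returns the value.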

> My guess was, that this also has to do with the use of fallback nodes:
> Since the partition will usually be very small on the fallback/handoff
> node, it is likely to be the first to answer. So to avoid returning
> false not_found responses, a basic quorum is required.
> Am I on the right track here?

It doesn't have anything to do with fallback nodes explicitly.  It is
for situations where a node is under any condition that will slow it
down significantly.  In such situations, there is little to be gained
in waiting for all N replies if (N/2)+1 have already declared
notfound.

> The problem is, this is imposed even for the case that all nodes are up.
> If one requires very low latency or very high availability (that's why
> one uses a small R value in the first place) and does a lot of gets for
> nonexistent keys, Riak silently screws you over by raising R for those
> keys.

It seems that there is something here worth clarifying.  If you are
issuing requests with W+R<=N, and some reads following writes return
notfound during an interval immediately following initial storage
time... well, that's what you asked for by not requesting a quorum.
If you store the object with a sufficiently high W value first, then
you will not get this sort of notfound response even if your R value
is only 1.

I suppose that providing the freedom to do this might be considered
"screwing you over," but we see it more as allowing you to make
different choices while still providing safe and unsurprising default
behavior.  If you try hard enough to screw yourself over, though, Riak
won't stop you.  If you issue write requests (to any dynamo-model
system) with some W, followed immediately by a read request with some
R, and W+R is not greater than N, you should not be expecting the
write to necessarily be reflected yet.
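
The W+R>N condition can be checked by brute force. The sketch below is
an illustration only: it assumes a fixed set of N replicas and strict
quorums (no fallback nodes), and quorums_always_overlap is a
hypothetical helper name. It enumerates every possible set of W
replicas that acknowledged a write and every set of R replicas that
answered a read, and reports whether each pair shares at least one
replica.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """True if every W-replica write set meets every R-replica read set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)  # an empty intersection is falsy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )
```

For N=3, W=2 and R=2 always overlap, so the read reaches at least one
replica that acknowledged the write. W=1 and R=1 do not, so a read
issued immediately after the write may land on a replica that has not
yet received it.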

> I most likely missed something here, but some ad hoc tests I did seem to
> be consistent with my understanding of the code.

You have certainly put some real effort into understanding some
choices made in the Riak code, which I appreciate.  I hope that I have
helped to extend your understanding of the real operational scenarios
that have motivated those choices, and how the code will behave in
those scenarios.

Best,

-Justin
