Hi, Nico.

On Mon, Aug 2, 2010 at 1:19 PM, Nico Meyer <nico.me...@adition.com> wrote:

> What I mean is, if I do a get request for a key with R=N, and one of the
> first N nodes in the preflist is down the request will still succeed.
> Why is that? Doesn't that undermine the purpose of setting R to a high
> number (specifically setting it to N)? That way a request might succeed
> even if all primary nodes responsible for the key are unavailable.

You are correct, and this is intentional. There is nothing in the R or W settings that is intended to indicate anything at all about "primary" nodes. Each is simply the number of successful responses that the client wishes to wait for, and thus the degree of quorum sought before a client reply is sent.

Using fallback nodes to satisfy reads is a natural result of using fallback nodes to satisfy writes. If all primary nodes responsible for a key are unavailable, but enough of the fallback nodes for that key have received a value for it since the primaries became unavailable (through a fallback write), then a request to get that key might succeed. I am not sure why you see this as a bad thing. (It will only succeed if R nodes actually provide a successful result, not just if they are available.)

> On a similar note, why is the riak_kv_get_fsm waiting for at least
> (N/2)+1 responses, if there are only not_found responses, effectively
> ignoring a smaller R value of the request if the key does not exist?

This is a compromise to deal with real situations in which a single node is taking a very long time to reply and a value has never been stored for a given key. Without either this basic quorum default for notfounds, or alternatively treating a notfound as a success and thus only waiting for R of them, an R=1 request in that situation would take much longer to complete than an R=2 request (due to waiting for the slow node), which is confusing to most users.
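
To make the R=1 versus R=2 point concrete, here is a rough sketch of the arithmetic. It is not the riak_kv_get_fsm code; the function names are hypothetical, and the min() in the second function is my assumption about how a basic quorum of (N/2)+1 notfounds combines with the point at which R successes become impossible. It only counts how many notfound replies a request has to wait for before it can give up.

```python
def notfounds_needed_without_basic_quorum(n: int, r: int) -> int:
    """Notfound replies needed before R successful replies become impossible.

    With N replicas and R successes required, the request can only give up
    once N - R + 1 replicas have answered notfound.
    """
    return n - r + 1


def notfounds_needed_with_basic_quorum(n: int, r: int) -> int:
    """Same count, but capped at a basic quorum of (N/2)+1 notfounds.

    Assumption: the request gives up at whichever threshold is hit first.
    """
    basic_quorum = n // 2 + 1
    return min(basic_quorum, n - r + 1)


if __name__ == "__main__":
    n = 3
    for r in (1, 2, 3):
        print(
            f"N={n} R={r}: "
            f"without basic quorum wait for {notfounds_needed_without_basic_quorum(n, r)}, "
            f"with basic quorum wait for {notfounds_needed_with_basic_quorum(n, r)}"
        )
```

With N=3 and no basic quorum, an R=1 request for a missing key has to hear from all three nodes (including the slow one), while an R=2 request only needs two replies. With the basic quorum cap, both wait for two.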

Note that since it applies to notfounds, this tends to only come into play for items that have never been successfully stored with at least a basic quorum -- things that really are not present, that is.

> My guess was, that this also has to do with the use of fallback nodes:
> Since the partition will usually be very small on the fallback/handoff
> node, it is likely to be the first to answer. So to avoid returning
> false not_found responses, a basic quorum is required.
> Am I on the right track here?

It doesn't have anything to do with fallback nodes explicitly. It is for situations where a node is under any condition that will slow it down significantly. In such situations, there is little to be gained in waiting for all N replies if (N/2)+1 have already declared notfound.

> The problem is, this is imposed even for the case that all nodes are up.
> If one requires very low latency or very high availability (that's why
> one uses a small R value in the first place) and does a lot of gets for
> non existent keys, riak silently screws you over by raising R for those
> keys.

It seems that there is something here worth clarifying. If you are issuing requests with W+R<=N, and some reads following writes return notfound during an interval immediately following initial storage time... well, that's what you asked for by not requesting a quorum. If you store the object with a sufficiently high W value first, then you will not get this sort of notfound response even if your R value is only 1.

I suppose that providing the freedom to do this might be considered "screwing you over," but we see it more as allowing you to make different choices while still providing safe and unsurprising default behavior. If you try hard enough to screw yourself over, though, Riak won't stop you.
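
For anyone who wants to check the W+R versus N relationship by hand, a small brute-force sketch follows. It assumes a deliberately simplified model that is not Riak's actual behavior: at read time the write has reached exactly W of the N replicas, the read consults exactly R replicas, and fallbacks and read repair are ignored. The function name is hypothetical.

```python
from itertools import combinations


def read_always_sees_write(n: int, w: int, r: int) -> bool:
    """Return True if every possible R-replica read overlaps every possible
    W-replica write among N replicas (simplified model, see assumptions above).
    """
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)  # an empty intersection is falsy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


if __name__ == "__main__":
    n = 3
    for w in range(1, n + 1):
        for r in range(1, n + 1):
            print(f"N={n} W={w} R={r}: overlap guaranteed = {read_always_sees_write(n, w, r)}")
```

In this model the overlap is guaranteed exactly when W+R is greater than N, which is why a write with W=N can be followed by a read with R=1.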

If you issue a write request (to any dynamo-model system) with some W, followed immediately by a read request with some R, and W+R is not greater than N, you should not necessarily expect the write to be reflected yet.

> I most likely missed something here, but some ad hoc tests I did seem to
> be consistent with my understanding of the code.

You have certainly put some real effort into understanding some choices made in the Riak code, which I appreciate. I hope that I have helped to extend your understanding of the real operational scenarios that have motivated those choices, and how the code will behave in those scenarios.

Best,

-Justin

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com