Hi! I have a question regarding the use of fallback nodes for get requests.
As I read the code, get requests use fallback nodes if any of the nodes that normally hold the key are not reachable. A simple test seemed to confirm this. What I mean is: if I do a get request for a key with R=N, and one of the first N nodes in the preflist is down, the request will still succeed. Why is that? Doesn't that undermine the purpose of setting R to a high number (specifically, setting it to N)? That way a request might succeed even if all primary nodes responsible for the key are unavailable. I can see that this behaviour might make sense if a node is down for a long time, but in that case the node should just be removed from the cluster.

On a similar note, why does riak_kv_get_fsm wait for at least (N/2)+1 responses when there are only not_found responses, effectively ignoring a smaller R value on the request if the key does not exist? My guess was that this also has to do with the use of fallback nodes: since the partition will usually be very small on the fallback/handoff node, that node is likely to be the first to answer. So, to avoid returning false not_found responses, a basic quorum is required. Am I on the right track here?

The problem is that this is imposed even when all nodes are up. If one requires very low latency or very high availability (which is why one uses a small R value in the first place) and does a lot of gets for nonexistent keys, Riak silently works against you by raising R for those keys.

I most likely missed something here, but the ad hoc tests I did seem to be consistent with my understanding of the code.

Thanks,
Nico
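
P.S. To make the second question concrete, here is a small Python model of the behaviour as I understand it. It is only a sketch of my reading, not a port of riak_kv_get_fsm: the function name, the reply markers and the "unresolved" outcome are all made up for illustration, and the real FSM may well apply further rules that I have missed.

```python
from typing import List, Tuple

# Hypothetical reply markers, not Riak's actual message formats.
VALUE = "value"
NOTFOUND = "notfound"


def simulate_get(replies: List[str], n: int, r: int) -> Tuple[str, int]:
    """Model a get as described in this post (an assumption, not Riak code).

    `replies` lists vnode replies in arrival order. The result is the
    outcome and the number of replies consumed before answering.
    """
    basic_quorum = n // 2 + 1  # the (N/2)+1 threshold from the post
    found = 0
    for count, reply in enumerate(replies, start=1):
        if reply == VALUE:
            found += 1
        if found >= r:
            # R replies carrying a value have arrived: answer right away.
            return ("ok", count)
        if found == 0 and count >= basic_quorum:
            # Only not_found replies so far: answer not_found, but only
            # once a basic quorum has replied, even if R is smaller.
            return ("notfound", count)
    # Mixed replies that never reach R are outside this simple model.
    return ("unresolved", len(replies))


if __name__ == "__main__":
    # N=3, R=1, key missing everywhere: two replies are needed, not one.
    print(simulate_get([NOTFOUND, NOTFOUND, NOTFOUND], n=3, r=1))
    # N=3, R=1, an empty fallback answers first: no false not_found.
    print(simulate_get([NOTFOUND, VALUE, VALUE], n=3, r=1))
```

With N=3 and R=1, the first call consumes two replies before answering not_found, which is the extra wait I am asking about. The second call shows the case I guessed the rule is meant to protect: a fast not_found from an empty fallback does not end the request on its own.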
