Hi Justin,

I think we are coming at this from two different directions, which is
causing some confusion. You seem to treat a get for a nonexistent key as
an error, in which case all your points are of course valid. I suspected
that this was the reason for the current design choice, but I didn't see
it stated explicitly anywhere. Also, notfound seems to be handled
differently from other types of errors, at least in the way it is
signalled to the client, so I didn't immediately think of it as an error
case.
On the other hand, there are many applications where asking for a key
that has never been put is perfectly valid, and not_found is indeed the
right answer in that case. Our application is an example of that. The
key is given (it is a unique cookie ID), and we need to check whether we
have seen a specific ID before in a certain context and, if so, get back
the data that was associated with the ID at that time. More often than
not we have not seen it, so notfound is the expected answer.
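
To make the access pattern concrete, here is a rough sketch in Python.
The FakeStore class, the lookup_cookie function and the bucket/key names
are all made up for illustration; an in-memory dict stands in for Riak,
and this is not our real code or any real client API:

```python
from typing import Dict, Optional, Tuple


class FakeStore:
    """Hypothetical in-memory stand-in for a key/value store."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], str] = {}

    def put(self, bucket: str, key: str, value: str) -> None:
        self._data[(bucket, key)] = value

    def get(self, bucket: str, key: str) -> Optional[str]:
        # A missing key is a normal answer here, not an error.
        return self._data.get((bucket, key))


def lookup_cookie(store: FakeStore, context: str, cookie_id: str) -> Optional[str]:
    """Return the data stored for this cookie ID, or None if never seen."""
    return store.get(context, cookie_id)
```

In our workload most calls to something like lookup_cookie would return
None, and that is a perfectly good result for us.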

If you read my original mail again with that use case in mind it might
become clearer what my problem with the current design is.

Having to fulfil the precondition that we only do gets for keys we know
to have been put before would require another datastore for that
purpose, which seems awkward and unnecessary, since Riak already has all
the data required to handle our use case.

Please let me know if I need to further clarify my thoughts about this.
English is not my first language, and it's hard enough to reason about
these things in German and face-to-face :-).

Cheers,
Nico

On Monday, 02.08.2010, at 22:29 -0400, Justin Sheehy wrote:
> Hi, Nico.
> 
> On Mon, Aug 2, 2010 at 1:19 PM, Nico Meyer <[email protected]> wrote:
> 
> > What I mean is, if I do a get request for a key with R=N, and one of the
> > first N nodes in the preflist is down the request will still succeed.
> > Why is that? Doesn't that undermine the purpose of setting R to a high
> > number (specifically setting it to N)? That way a request might succeed
> > even if all primary nodes responsible for the key are unavailable.
> 
> You are correct, and this is intentional.  There is nothing in the R
> or W settings that is intended to indicate anything at all about
> "primary" nodes.  It is rather simply the number of successful
> responses that the client wishes to wait for, and thus the degree of
> quorum sought before a client reply is sent.  Using fallback nodes to
> satisfy reads is a natural result of using fallback nodes to satisfy
> writes.
>
> If all primary nodes responsible for a key are unavailable, but enough
> of the fallback nodes for that key have received a value for that key
> since they went unavailable (through a fallback write) then a request
> to get that key might succeed.  I am not sure why you see this as a
> bad thing.
> 
> (It will only succeed if R nodes actually provide a successful result,
> not just if they are available.)
> 
> > On a similar note, why is the riak_kv_get_fsm waiting for at least
> > (N/2)+1 responses, if there are only not_found responses, effectively
> > ignoring a smaller R value of the request if the key does not exist?
> 
> This is a compromise to deal with real situations that can occur where
> a single node might be taking a very long time to reply, and a value
> has never been stored for a given key.  Without either this basic
> quorum default for notfounds or alternately considering a notfound as
> success and thus only waiting for R of them, that situation would mean
> that an R=1 request would take much longer to complete than an R=2
> request (due to waiting for the slow node) which is confusing to most
> users.  Note that since it applies to notfounds, this tends to only
> come into play for items that have never been successfully stored with
> at least a basic quorum -- things that really are not present, that
> is.
> 
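
As a way of checking my reading of this, here is how I would model the
rule. It is only a simplified sketch in Python, not the actual
riak_kv_get_fsm code; the function name, the use of None for a notfound
reply and the "pending" outcome are my own assumptions:

```python
from typing import List, Optional, Tuple


def replies_needed(replies: List[Optional[str]], n: int, r: int) -> Tuple[str, int]:
    """Model when a get can be answered.

    `replies` holds the answers in arrival order; None means notfound.
    Returns the outcome and how many replies were consumed to reach it.
    """
    basic_quorum = n // 2 + 1  # (N/2)+1 with integer division
    found = 0
    notfound = 0
    for count, reply in enumerate(replies, start=1):
        if reply is None:
            notfound += 1
        else:
            found += 1
        if found >= r:
            # R successful replies: answer with the value.
            return ("ok", count)
        if notfound >= basic_quorum:
            # A basic quorum of notfounds: stop waiting for slow nodes.
            return ("notfound", count)
    # Neither condition met yet; more replies would be needed.
    return ("pending", len(replies))
```

With N=3 and R=1, replies_needed([None, None, None], n=3, r=1) gives
("notfound", 2): the answer needs two replies even though R is 1, and it
does not have to wait for a slow third node.
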
> > My guess was, that this also has to do with the use of fallback nodes:
> > Since the partition will usually be very small on the fallback/handoff
> > node, it is likely to be the first to answer. So to avoid returning
> > false not_found responses, a basic quorum is required.
> > Am I on the right track here?
> 
> It doesn't have anything to do with fallback nodes explicitly.  It is
> for situations where a node is under any condition that will slow it
> down significantly.  In such situations, there is little to be gained
> in waiting for all N replies if (N/2)+1 have already declared
> notfound.
> 
> > The problem is, this is imposed even for the case that all nodes are up.
> > If one requires very low latency or very high availability (that's why
> > one uses a small R value in the first place) and does a lot of gets for
> > nonexistent keys, Riak silently screws you over by raising R for those
> > keys.
> 
> It seems that there is something here worth clarifying.  If you are
> issuing requests with W+R<=N, and some reads following writes return
> notfound during an interval immediately following initial storage
> time... well, that's what you asked for by not requesting a quorum.
> If you store the object with a sufficiently high W value first, then
> you will not get this sort of notfound response even if your R value
> is only 1.
> 
> I suppose that providing the freedom to do this might be considered
> "screwing you over," but we see it more as allowing you to make
> different choices while still providing safe and unsurprising default
> behavior.  If you try hard enough to screw yourself over, though, Riak
> won't stop you.  If you issue write requests (to any dynamo-model
> system) with some W, followed immediately by a read request with some
> R, and W+R is not greater than N, you should not be expecting the
> write to necessarily be reflected yet.
> 
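
The W+R>N condition can be checked by brute force. The sketch below is
in Python and assumes the idealized case where reads and writes go to
the same N replicas (so it ignores fallback nodes); both function names
are invented for illustration:

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """The arithmetic rule: a read set and a write set must share a replica."""
    return r + w > n


def always_intersect(n: int, r: int, w: int) -> bool:
    """Brute-force check over every possible write set and read set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )
```

For N=3, W=1 and R=1 both functions return False, so a read right after
the write may miss it. With W=3 and R=1 both return True, which matches
the point above about storing with a sufficiently high W first.
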
> > I most likely missed something here, but some ad hoc tests I did seem to
> > be consistent with my understanding of the code.
> 
> You have certainly put some real effort into understanding some
> choices made in the Riak code, which I appreciate.  I hope that I have
> helped to extend your understanding of the real operational scenarios
> that have motivated those choices, and how the code will behave in
> those scenarios.
> 
> Best,
> 
> -Justin




_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
