On Thu, Jun 16, 2011 at 2:57 AM, Nico Meyer <[email protected]> wrote:
> Hello David,
>
> this behaviour is quite expected if you think about how Riak works.
> Assuming you use the default replication factor of n=3, each key is stored
> on all of your three nodes. If you delete a key while one node (let's call
> it A) is down, the key is deleted from the two nodes that are still up
> (let's call them B and C), and remains on the downed node A.
> Once node A is up again, the situation is indistinguishable from B and C
> having a hard drive crash and losing all their data, in that A has the key
> and B and C know nothing about it.
>

I think this is not the behavior anyone wants, regardless of "the way it
works". Cassandra does "tombstones", but even those can expire after 10
days (by default), so if it takes you 10 days to get node A back up again,
the deleted data is going to be replicated across the Cassandra ring as
well. Riak doesn't have tombstones (as far as I know), so you have to make
sure all your nodes are up to do a delete. This, to me, seems like a
misfeature.

> If you do a GET of the deleted key at this point, the result depends on the
> r-value that you choose. For r>1 you will get a not_found on the first get.
> For r=1 you might get the data or a not_found, depending on which two nodes
> answer first (see https://issues.basho.com/show_bug.cgi?id=992 about basic
> quorum for an explanation). Also, at that point read repair will kick in and
> re-replicate the key to all nodes, so subsequent GETs will always return the
> original datum.
>
> Listing keys, on the other hand, does not use quorum but just does a set
> union of all keys of all the nodes in your cluster. Essentially it is
> equivalent to r=1 without basic quorum. The same is true for map/reduce
> queries, to my knowledge.
>
> The essential problem is that a real physical delete is indistinguishable
> from data loss (or never having had the data in the first place), while
> those two things are logically different.
>

Right, there's no tombstone or equivalent marker for deletion.

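To make the failure mode concrete, here is a toy simulation of the scenario
Nico describes. It is plain Python, not Riak client code: the ToyCluster
class and its methods are made up for illustration, each replica is just a
dict, and the GET rule is simplified (the client sees the value only if at
least r replicas hold it; the basic-quorum short-circuit for r=1 is not
modelled).

```python
class ToyCluster:
    """Hypothetical model: n=3, every key lives on all three nodes."""

    def __init__(self, names=("A", "B", "C")):
        self.stores = {name: {} for name in names}
        self.up = {name: True for name in names}

    def _live(self):
        # Only nodes that are currently up take part in any operation.
        return [self.stores[n] for n in self.stores if self.up[n]]

    def put(self, key, value):
        for store in self._live():
            store[key] = value

    def delete(self, key):
        # Physical delete: reachable replicas drop the key, no marker is kept.
        for store in self._live():
            store.pop(key, None)

    def list_keys(self):
        # No quorum: set union of the keys found on every reachable node.
        keys = set()
        for store in self._live():
            keys.update(store)
        return keys

    def get(self, key, r=2):
        live = self._live()
        values = [s[key] for s in live if key in s]
        # Simplified quorum: answer with the value only if r replicas have it.
        answer = values[0] if len(values) >= r else None
        # Read repair: a surviving copy is pushed to replicas that lack it.
        if values:
            for s in live:
                s.setdefault(key, values[0])
        return answer


if __name__ == "__main__":
    cluster = ToyCluster()
    cluster.put("k", "v")
    cluster.up["A"] = False      # node A goes down
    cluster.delete("k")          # only B and C forget the key
    cluster.up["A"] = True       # A comes back with its old copy
    print(cluster.list_keys())   # the key is listed again
    print(cluster.get("k"))      # first GET with r=2: not found
    print(cluster.get("k"))      # after read repair: the old value is back
```

The point of the sketch is that once A returns, nothing on B or C says "this
was deleted", so read repair treats A's copy as the only good one.
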
> If you want to be sure that a key is deleted with all its replicas you must
> delete it with a write quorum setting of w=n. Also you need to tell Riak not
> to count fallback vnodes toward your write quorum. This feature is quite new
> and I believe only available in the head revision. Also I forgot the name of
> the parameter and don't know if it is even applicable for DELETEs.
> Anyhow, if you do all this, your DELETEs will simply fail if any of the
> nodes that have a copy of the key is down (so in your case, if any node is
> down).
>

This is a feature I didn't even know existed (not counting fallback vnodes).

> If you only want to logically delete, and don't care about freeing the disk
> space and RAM that is used by the key, you should use a special value, which
> is interpreted by your application as a not found. That way you also get
> proper conflict resolution between DELETEs and PUTs (say one client deletes
> a key while another one updates it).
>

Yes, a tombstone marker could be implemented, possibly more safely, at the
application layer.

> Cheers,
> Nico
>
> On 16.06.2011 00:55, David Mitchell wrote:
> > Erlang: R13B04
> > Riak: 0.14.2
> >
> > I have a three node cluster, and while one node was down, I deleted every
> > key in a certain bucket. Then, I started the node that was down, and it
> > joined the cluster.
> >
> > Now, when I do a listing of the keys in this bucket, I get the entire
> > list. I can also get the values of the bucket. However, when I try to
> > delete the keys, the keys are not deleted.
> >
> > Can anyone help me get the nodes back in a consistent state? I have tried
> > restarting the nodes.
> >
> > David

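As a rough sketch of the application-layer tombstone idea from earlier in
this message: instead of physically deleting, the application writes a
record flagged as deleted and treats it as "not found" on read. Everything
here is an assumption for illustration (the record layout, the function
names, and resolving conflicts by wall-clock timestamp, which is not how
Riak's vector clocks work and is sensitive to clock skew).

```python
import time


def make_record(value, ts=None):
    # A live value; ts defaults to the local wall clock (assumption).
    return {"deleted": False, "value": value,
            "ts": time.time() if ts is None else ts}


def make_tombstone(ts=None):
    # A logical delete: the record stays on disk, flagged as deleted.
    return {"deleted": True, "value": None,
            "ts": time.time() if ts is None else ts}


def resolve(siblings):
    # Last-write-wins between conflicting PUTs and DELETEs (assumption).
    return max(siblings, key=lambda rec: rec["ts"])


def read(siblings):
    # The application, not the store, turns a tombstone into "not found".
    if not siblings:
        return None
    winner = resolve(siblings)
    return None if winner["deleted"] else winner["value"]
```

Because the tombstone is an ordinary write, a replica that was down during
the delete just holds an older record that loses the comparison, instead of
looking like the only surviving copy. The cost is that the disk space and
RAM for the key are never freed.
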
_______________________________________________ riak-users mailing list [email protected] http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
