The problem is that the tombstones never disappear - they keep coming back through bucket.get_keys() hours after deletion, even after a restart.
I said I'm using the default delete_mode configuration because I didn't change it. I have now tried setting it, and apparently it's not supported any more in Riak 2.0:

17:16:56.318 [error] You've tried to set delete_mode, but there is no setting with that name.
17:16:56.318 [error] Did you mean one of these?
17:16:56.335 [error] dtrace
17:16:56.335 [error] nodename
17:16:56.335 [error] ssl.keyfile
17:16:56.335 [error] Error generating configuration in phase transform_datatypes
17:16:56.335 [error] Conf file attempted to set unknown variable: delete_mode
Error generating config with cuttlefish

I'm using Riak 2.0.0pre20, on strongly consistent buckets, on a single node cluster. Can this be the reason?

I guess what I need is a confirmation that something is broken/that I'm doing something stupid. I've tried looking for similar issues (github.com/basho/riak/issues), didn't find any -> I guess that suggests I'm doing something stupid, I just don't know what yet.

Thanks again :)

--
Paweł

On 19 May 2014 18:00, Dmitri Zagidulin <dzagidu...@basho.com> wrote:

> Ah, yes, you bring up a good point. (And, that's another subtlety to keep
> in mind, with Option #1).
>
> Tombstones are definitely something to keep in mind, when deleting unit
> test data.
> As you mentioned in your earlier question, if you're using default
> delete_mode configuration ( 3 seconds ), it means that if you issue a
> delete, a tombstone object is going to be written (and stick around for at
> least 3 seconds), and unfortunately, it is going to show up as a false
> positive on a List Keys call.
>
> The easiest thing to try, in your case, is to set 'delete_mode' to
> 'immediate', restart the test cluster, and retest. With an immediate
> delete, your second test with 10 keys should not take as long as the
> previous delete with 10000 keys.
>
>
> On Mon, May 19, 2014 at 11:46 AM, Paweł Królikowski <rabb...@gmail.com> wrote:
>
>> Hi Dmitri,
>>
>> Thanks a lot for the answer. Option #1 seems the best, but I have a
>> follow up question:
>>
>> - when do the deleted keys disappear from Riak: a part of my problem
>> (have not explained it correctly the first time), is that get_keys()
>> returns keys that no longer exist. So, I run a test with 10 000 keys, I
>> remove them, it takes N seconds. I then follow with a test with 10 keys, but
>> removing them takes just as much time - I imagine it's because I'm going
>> over those 10 000 keys again.
>>
>> This article seems relevant:
>> http://basho.com/riaks-config-behaviors-part-3/ - it seems like the
>> tombstones simply remain in my system indefinitely.
>>
>> --
>> Paweł
>>
>>
>> On 19 May 2014 15:32, Dmitri Zagidulin <dzagidu...@basho.com> wrote:
>>
>>> Hi Pawel,
>>>
>>> There are basically three ways to clear data from Riak (for the purposes
>>> of automated testing):
>>>
>>> 1. Iterate through the keys via get_keys(), and delete each one. This is
>>> what you're currently doing, except you don't need to invoke if.exists().
>>> if.exists() makes an additional API call to Riak, and it takes twice as
>>> long as just calling delete() (and trapping a potential 404 doesn't exist
>>> error).
>>>
>>> Advantages: Easy to understand, can be done entirely in code (without
>>> invoking OS/shell commands).
>>>
>>> Disadvantages: It can get slow, for large data sets. Another subtle
>>> disadvantage is that, as your app grows, it can get difficult to keep track
>>> of which buckets you've created and need to be cleared.
>>>
>>> 2. Stop the Riak cluster, delete the riak data directory, and re-start.
>>>
>>> Advantages: Very fast, and you can be sure that you're deleting all
>>> buckets.
>>>
>>> Disadvantages: Involves invoking OS/shell commands. This is fairly easy
>>> if your Riak node is running on the same machine as your tests (and if it's
>>> a single node). To delete the data directories of a multi-node cluster, now
>>> you need to involve either a bash script that uses SSH to log in and
>>> restart, or a coordination framework like Ansible.
>>>
>>> 3. Use an in-memory back end. (And to drop all data, just restart the
>>> node(s)).
>>>
>>> Advantages: Same as #2 - fast, thorough.
>>>
>>> Disadvantages: Same as #2 (involves shell commands, potentially SSH
>>> etc). In addition, since you're likely not going to be running your
>>> production code on an in-memory back end, this method introduces a
>>> potential environmental/functional difference between your testing and
>>> production clusters.
>>>
>>> I generally use method #1 in my unit tests, and manually delete each
>>> key.
>>>
>>> Dmitri
>>>
>>>
>>> On Mon, May 19, 2014 at 8:53 AM, Paweł Królikowski <rabb...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> For testing, I'd like to be able to throw a large amount of data at
>>>> Riak (100k+ entries), check how it performed, change something in the
>>>> application, run the test again. I'd like to use the same data every time,
>>>> so, I'd like to clear the bucket between every test.
>>>>
>>>> The documentation (
>>>> http://docs.basho.com/riak/2.0.0beta1/dev/references/http/) says:
>>>>
>>>> *Delete Buckets*
>>>> There is no straightforward way to delete an entire Bucket. To delete
>>>> all the keys in a bucket, you’ll need to delete them all individually.
>>>>
>>>>
>>>> So, I'm currently using something like:
>>>>
>>>> for k in r_bk.get_keys():
>>>>     v = r_bk.get(k)
>>>>     if v.exists:
>>>>         r_bk.delete(v)
>>>>
>>>> The problem is that r_bk.get_keys() returns a lot of elements that
>>>> don't exist (tombstones?) and iterating over all of them takes time.
>>>>
>>>> Is that the way it's supposed to work? Or am I missing something?
>>>>
>>>> - I'm using default delete_mode configuration ( 3 seconds )
>>>> - I'm using Riak 2.0 alpha 19 with Python. ( there's a bug with strong
>>>> consistency in Beta1, cannot use it)
>>>> - changing the bucket name for every run seems .. impractical?
>>>>
>>>> Any advice welcome,
>>>>
>>>> --
>>>> Thanks,
>>>> Paweł
>>>>
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> riak-users@lists.basho.com
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>
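
For reference, option #1 from this thread (iterate over get_keys() and delete each key, without the extra get()/exists check per key) could look roughly like the sketch below. It only assumes a bucket object with get_keys() and delete(key), which is how I understand the Riak Python client's bucket to behave. FakeBucket is a made-up in-memory stand-in so the snippet runs without a Riak node. Whether delete() raises for an already-deleted key depends on the client and transport, so no error trapping is shown here; add it if yours raises.

```python
def clear_bucket(bucket):
    """Issue one delete per key listed by get_keys(); return the number of deletes."""
    deleted = 0
    for key in bucket.get_keys():
        # Delete by key directly, without a get() and .exists check first,
        # so each key costs one round trip instead of two.
        bucket.delete(key)
        deleted += 1
    return deleted


class FakeBucket:
    """Hypothetical in-memory stand-in for a bucket (not part of any Riak client)."""

    def __init__(self, keys):
        self._keys = set(keys)
        self.calls = 0  # number of delete() calls received

    def get_keys(self):
        return list(self._keys)

    def delete(self, key):
        self.calls += 1
        self._keys.discard(key)


if __name__ == "__main__":
    bucket = FakeBucket(["a", "b", "c"])
    print(clear_bucket(bucket), "deletes issued")
```

Against a real node you would pass the client's bucket instead of FakeBucket. Note this only removes the extra read per key; it does nothing about tombstones that still show up in get_keys().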