That makes sense. I do a lot, and I really mean a LOT, of updates per key,
maybe thousands a day! The cluster sees far more updates to existing keys
than insertions of new keys.

The hash trees will rebuild over the weekend (the operation normally takes
about two days to complete), so I'll come back and give you some feedback
(hopefully good) next Monday!

Again, thanks a lot; you've been very helpful.
Edgar


On 8 April 2014 15:47, Matthew Von-Maszewski <matth...@basho.com> wrote:

> Edgar,
>
> The test I have running currently has reached 1 billion keys.  It is
> running against a single node with N=1.  It has 42G of AAE data.  Here is
> my extrapolation to compare with your numbers:
>
> You have ~2.5 billion keys.  I assume you are running N=3 (the default),
> so AAE is actually tracking ~7.5 billion keys.  Across your six nodes,
> that is ~1.25 billion keys tracked per node.
>
> Raw math would suggest that my 42G of AAE data for 1 billion keys would
> extrapolate to 52.5G of AAE data for you.  Yet you have ~120G of AAE data.
> Is something wrong?  No.  My data is still loading and has experienced
> zero key/value updates/edits.
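>
> A quick back-of-envelope check of that arithmetic (a Python sketch; the
> key counts and the 42G figure are the assumptions stated above):
>
>     total_keys = 2.5e9       # Edgar's logical key count
>     n_val = 3                # default Riak replication factor
>     nodes = 6
>
>     hashes_per_node = total_keys * n_val / nodes   # ~1.25e9 per node
>     gb_per_billion = 42.0    # observed in my N=1, zero-update test
>     expected_gb = hashes_per_node / 1e9 * gb_per_billion
>     print(expected_gb)       # ~52.5G -- versus the ~120G you observe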
>
> AAE hashes get rewritten every time a user updates the value of a key.
> AAE's leveldb store is just like the user leveldb store: all prior values
> of a key accumulate in the .sst table files until compaction removes the
> duplicates.  Similarly, a user delete of a key causes a delete tombstone
> in the AAE hash tree.  Those delete tombstones also have to await
> compaction before leveldb recovers the disk space.
>
> AAE's hash trees rebuild weekly.  I am told that the rebuild operation
> will actually destroy the existing files and start over.  That is when you
> should see AAE space usage dropping dramatically.
>
> Matthew
>
>
> On Apr 8, 2014, at 9:31 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>
> Thanks a lot Matthew!
>
> A little more info: I've gathered a sample of the contents of the
> anti-entropy data on one of my machines:
> - 44 folders with names equal to the folder names in the leveldb dir
> (e.g. 393920363186844927172086927568060657641638068224/)
> - each folder has 5 files (LOG, CURRENT, etc.) and 5 sst_* folders
> - the biggest sst folder is sst_3, with 4.3G
> - inside the sst_3 folder there are 1219 files named 00****.sst
> - each of the 00****.sst files is ~3.7M
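>
> In case it helps to reproduce those numbers, this is roughly how I summed
> them (a Python sketch; the data path is an assumption):
>
>     import os
>     from collections import defaultdict
>
>     root = "/var/lib/riak/anti_entropy"   # assumed Riak data path
>     totals = defaultdict(int)
>     for dirpath, _, files in os.walk(root):
>         for name in files:
>             if name.endswith(".sst"):
>                 rel = os.path.relpath(dirpath, root)  # partition/sst_N
>                 totals[rel] += os.path.getsize(os.path.join(dirpath, name))
>     for folder, size in sorted(totals.items()):
>         print(folder, "%.1fG" % (size / 2**30))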
>
> Hope this info gives you some more help!
>
> Best regards, and again, thanks a lot
> Edgar
>
>
> On 8 April 2014 13:24, Matthew Von-Maszewski <matth...@basho.com> wrote:
>
>> Argh. Missed where you said you had upgraded. OK, I will proceed with
>> getting you comparison numbers.
>>
>> Sent from my iPhone
>>
>> On Apr 8, 2014, at 6:51 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>
>> Thanks again Matthew, you've been very helpful!
>>
>> Maybe you can give me some advice on an issue I've been having since I
>> upgraded to 1.4.8.
>>
>> Since I upgraded, my anti-entropy data has grown a lot and has only
>> stabilised at very high values... Right now my cluster has 6 machines,
>> each with ~120G of anti-entropy data and 600G of leveldb data. That seems
>> like quite a lot, no? My total number of keys is ~2.5 billion.
>>
>> Best regards,
>> Edgar
>>
>> On 6 April 2014 23:30, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>
>>> Edgar,
>>>
>>> This is indirectly related to your key deletion discussion.  I made
>>> changes recently to the aggressive delete code.  The second section of
>>> the following (updated) web page discusses the adjustments:
>>>
>>>     https://github.com/basho/leveldb/wiki/Mv-aggressive-delete
>>>
>>> Matthew
>>>
>>>
>>> On Apr 6, 2014, at 4:29 PM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>
>>> Matthew, thanks again for the response!
>>>
>>> That said, I'll keep waiting for 2.0 (and maybe buy some bigger
>>> disks :)
>>>
>>> Best regards
>>>
>>>
>>> On 6 April 2014 15:02, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>
>>>> Edgar,
>>>>
>>>> In Riak 1.4, there is no advantage to using empty values versus
>>>> deleting.
>>>>
>>>> leveldb is a "write once" data store.  New data for a given key never
>>>> physically overwrites old data for the same key.  New data "hides" the
>>>> old data by sitting at a lower level, and is therefore picked first.
>>>>
>>>> leveldb's compaction operation will remove older key/value pairs only
>>>> when the newer key/value pair is part of a compaction involving both the
>>>> new and the old.  The new and old key/value pairs must have migrated to
>>>> adjacent levels through normal compaction operations before leveldb will
>>>> see them in the same compaction.  The migration could take days, weeks,
>>>> or even months depending upon the size of your entire dataset and the
>>>> rate of incoming write operations.
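>>>>
>>>> A toy model of that shadowing (illustrative Python only, not leveldb
>>>> code): reads scan levels newest-first, so a newer value or tombstone
>>>> wins while the older copies still sit on disk until compaction merges
>>>> them:
>>>>
>>>>     # each dict stands for one level; index 0 is the newest
>>>>     levels = [
>>>>         {"k1": "TOMBSTONE"},   # newest write: a delete marker
>>>>         {"k1": "new-value"},
>>>>         {"k1": "old-value"},   # oldest; still occupying disk
>>>>     ]
>>>>
>>>>     def get(key):
>>>>         for level in levels:          # newest level is checked first
>>>>             if key in level:
>>>>                 v = level[key]
>>>>                 return None if v == "TOMBSTONE" else v
>>>>         return None
>>>>
>>>>     print(get("k1"))   # None, yet three copies remain on disk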
>>>>
>>>> leveldb's "delete" object is exactly the same as your empty JSON
>>>> object.  The delete object simply has one more flag set that allows it to
>>>> also be removed if and only if there is no chance for an identical key to
>>>> exist on a higher level.
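>>>>
>>>> Put differently, at the storage layer the two options cost about the
>>>> same until compaction catches up. A sketch against Riak's HTTP
>>>> interface (host, port, bucket, and key are assumptions):
>>>>
>>>>     import requests
>>>>
>>>>     base = "http://127.0.0.1:8098/buckets/mybucket/keys"  # assumed
>>>>
>>>>     # option 1: overwrite with "{}" -- old value is hidden, not freed
>>>>     requests.put(base + "/somekey", data="{}",
>>>>                  headers={"Content-Type": "application/json"})
>>>>
>>>>     # option 2: delete -- writes a tombstone, freed only at compaction
>>>>     requests.delete(base + "/somekey")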
>>>>
>>>> I apologize that I cannot give you a more useful answer.  2.0 is on the
>>>> horizon.
>>>>
>>>> Matthew
>>>>
>>>>
>>>> On Apr 6, 2014, at 7:04 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>
>>>> Hi again!
>>>>
>>>> Sorry to reopen this discussion, but I have another question regarding
>>>> my earlier post.
>>>>
>>>> What if, instead of doing a mass deletion (we've already seen that it
>>>> won't pay off in terms of disk space), I update all the values with an
>>>> empty JSON object "{}"? Do you see any problem with this? I no longer
>>>> need those millions of values that are living in the cluster...
>>>>
>>>> When Riak 2.0 runs stable I'll do the upgrade, and only then delete
>>>> those keys!
>>>>
>>>> Best regards
>>>>
>>>>
>>>> On 18 February 2014 16:32, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>
>>>>> Ok, thanks a lot Matthew.
>>>>>
>>>>>
>>>>> On 18 February 2014 16:18, Matthew Von-Maszewski
>>>>> <matth...@basho.com> wrote:
>>>>>
>>>>>> Riak 2.0 is coming.  Hold your mass delete until then.  The "bug" is
>>>>>> within Google's original leveldb architecture.  Riak 2.0 sneaks around
>>>>>> it to get the disk space freed.
>>>>>>
>>>>>> Matthew
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Feb 18, 2014, at 11:10 AM, Edgar Veiga <edgarmve...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> The only/main purpose is to free disk space...
>>>>>>
>>>>>> I was a little bit concerned about this operation, but now, with your
>>>>>> feedback, I'm inclined to do nothing; I can't risk the disk space
>>>>>> growing...
>>>>>> Regarding the overhead, I think that with a tight throttling system I
>>>>>> could control it and avoid overloading the cluster.
>>>>>>
>>>>>> Mixed feelings :S
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 18 February 2014 15:45, Matthew Von-Maszewski
>>>>>> <matth...@basho.com> wrote:
>>>>>>
>>>>>>> Edgar,
>>>>>>>
>>>>>>> The first "concern" I have is that leveldb's delete does not free
>>>>>>> disk space.  Others have executed mass delete operations only to 
>>>>>>> discover
>>>>>>> they are now using more disk space instead of less.  Here is a 
>>>>>>> discussion
>>>>>>> of the problem:
>>>>>>>
>>>>>>> https://github.com/basho/leveldb/wiki/mv-aggressive-delete
>>>>>>>
>>>>>>> The link also describes Riak's database operation overhead.  This is
>>>>>>> a second "concern".  You will need to carefully throttle your delete 
>>>>>>> rate
>>>>>>> or the overhead will likely impact your production throughput.
>>>>>>>
>>>>>>> We have new code to help quicken the actual purge of deleted data in
>>>>>>> Riak 2.0.  But that release is not quite ready for production usage.
>>>>>>>
>>>>>>>
>>>>>>> What do you hope to achieve by the mass delete?
>>>>>>>
>>>>>>> Matthew
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Feb 18, 2014, at 10:29 AM, Edgar Veiga <edgarmve...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Sorry, forgot that info!
>>>>>>>
>>>>>>> It's leveldb.
>>>>>>>
>>>>>>> Best regards
>>>>>>>
>>>>>>>
>>>>>>> On 18 February 2014 15:27, Matthew Von-Maszewski
>>>>>>> <matth...@basho.com> wrote:
>>>>>>>
>>>>>>>> Which Riak backend are you using:  bitcask, leveldb, multi?
>>>>>>>>
>>>>>>>> Matthew
>>>>>>>>
>>>>>>>>
>>>>>>>> On Feb 18, 2014, at 10:17 AM, Edgar Veiga <edgarmve...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> > Hi all!
>>>>>>>> >
>>>>>>>> > I have a fairly trivial question regarding mass deletion on a
>>>>>>>> > riak cluster, but first let me give you some context. My cluster
>>>>>>>> > is running riak 1.4.6 on 6 machines, with a ring size of 256 and
>>>>>>>> > 1TB SSD disks.
>>>>>>>> >
>>>>>>>> > I need to execute a massive object deletion on a bucket; I'm
>>>>>>>> > talking about ~1 billion keys (the average object size is ~1Kb).
>>>>>>>> > I will not retrieve the keys from riak, because I have a file with
>>>>>>>> > all of them. I'll just start a script that reads them from the
>>>>>>>> > file and triggers an HTTP DELETE for each one.
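>>>>>>>> >
>>>>>>>> > The script would be roughly like this (a sketch; the file name,
>>>>>>>> > host, bucket, and the sleep-based throttle are all assumptions):
>>>>>>>> >
>>>>>>>> >     # read keys from a file, issue one HTTP DELETE per key,
>>>>>>>> >     # throttled so the production cluster is not overloaded
>>>>>>>> >     import time
>>>>>>>> >     import requests
>>>>>>>> >
>>>>>>>> >     BASE = "http://127.0.0.1:8098/buckets/mybucket/keys"  # assumed
>>>>>>>> >     DELAY = 0.01   # seconds between deletes (~100 deletes/s)
>>>>>>>> >
>>>>>>>> >     with open("keys_to_delete.txt") as f:   # one key per line
>>>>>>>> >         for line in f:
>>>>>>>> >             key = line.strip()
>>>>>>>> >             if not key:
>>>>>>>> >                 continue
>>>>>>>> >             r = requests.delete(BASE + "/" + key)
>>>>>>>> >             if r.status_code not in (204, 404):
>>>>>>>> >                 print("unexpected", r.status_code, "for", key)
>>>>>>>> >             time.sleep(DELAY)   # crude throttle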
>>>>>>>> > The cluster will continue running in production under quite a
>>>>>>>> > high load, serving all the other applications, while this deletion
>>>>>>>> > runs.
>>>>>>>> >
>>>>>>>> > My question is simple: do I need to have any extra concerns
>>>>>>>> > regarding this action? Do you advise paying special attention to
>>>>>>>> > any particular metrics, either for riak or for the servers where
>>>>>>>> > it's running?
>>>>>>>> >
>>>>>>>> > Best regards!
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>