AAE is broken and brain dead in releases 1.4.3 through 1.4.7.  That might be 
your problem. 

I have a two-billion-key data set building now. I will forward node disk usage 
when available. 

Matthew

Sent from my iPhone

> On Apr 8, 2014, at 6:51 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
> 
> Thanks again Matthew, you've been very helpful!
> 
> Maybe you can give me some kind of advice on an issue I've been having since 
> I upgraded to 1.4.8.
> 
> Since I upgraded, my anti-entropy data has been growing a lot and has only 
> stabilised at very high values... Right now my cluster has 6 machines, each 
> with ~120G of anti-entropy data and 600G of leveldb data. That seems like 
> quite a lot, no? My total number of keys is ~2.5 billion.
> 
> Best regards,
> Edgar
> 
>> On 6 April 2014 23:30, Matthew Von-Maszewski <matth...@basho.com> wrote:
>> Edgar,
>> 
>> This is indirectly related to your key deletion discussion.  I recently made 
>> changes to the aggressive delete code.  The second section of the following 
>> (updated) web page discusses the adjustments:
>> 
>>     https://github.com/basho/leveldb/wiki/Mv-aggressive-delete
>> 
>> Matthew
>> 
>> 
>>> On Apr 6, 2014, at 4:29 PM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>> 
>>> Matthew, thanks again for the response!
>>> 
>>> That said, I'll wait for 2.0 (and maybe buy some bigger disks :)
>>> 
>>> Best regards
>>> 
>>> 
>>>> On 6 April 2014 15:02, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>> Edgar,
>>>> 
>>>> In Riak 1.4, there is no advantage to using empty values versus deleting.
>>>> 
>>>> leveldb is a "write once" data store.  New data for a given key never 
>>>> physically overwrites old data for the same key.  New data "hides" the old 
>>>> data by sitting in a lower level, and is therefore picked first.
>>>> 
>>>> leveldb's compaction operation will remove older key/value pairs only when 
>>>> the newer key/value pair is part of a compaction involving both new and 
>>>> old.  The new and the old key/value pairs must have migrated to adjacent 
>>>> levels through normal compaction operations before leveldb will see them 
>>>> in the same compaction.  The migration could take days, weeks, or even 
>>>> months depending upon the size of your entire dataset and the rate of 
>>>> incoming write operations.
>>>> 
>>>> leveldb's "delete" object is exactly the same as your empty JSON object.  
>>>> The delete object simply has one more flag set that allows it to also be 
>>>> removed if and only if there is no chance for an identical key to exist on 
>>>> a higher level.
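>>>> 
>>>> To make that concrete, here is a toy sketch of the mechanism in Python 
>>>> (only an illustration of the idea above, not leveldb's actual code or 
>>>> API).  Lookups scan from the newest level down, so a newer value or 
>>>> delete marker shadows an older one, and the marker itself can be dropped 
>>>> only once no older level could still hold the key:
>>>> 
>>>>     # Toy model of leveldb's levels; illustration only.
>>>>     TOMBSTONE = object()  # stands in for leveldb's "delete" object
>>>> 
>>>>     levels = [
>>>>         {"k1": TOMBSTONE},           # level 0: newest (k1 was deleted)
>>>>         {"k1": "{}", "k2": "new"},   # level 1
>>>>         {"k1": "old", "k2": "old"},  # level 2: oldest
>>>>     ]
>>>> 
>>>>     def get(key):
>>>>         """Scan the newest level first; the first hit wins."""
>>>>         for level in levels:
>>>>             if key in level:
>>>>                 value = level[key]
>>>>                 return None if value is TOMBSTONE else value
>>>>         return None
>>>> 
>>>>     def compact(upper, lower):
>>>>         """Merge level upper into lower; newer entries replace older.
>>>>         The delete marker is dropped only when lower is the last level,
>>>>         i.e. when no older copy of the key can still exist."""
>>>>         merged = dict(levels[lower])
>>>>         merged.update(levels[upper])
>>>>         if lower == len(levels) - 1:
>>>>             merged = {k: v for k, v in merged.items()
>>>>                       if v is not TOMBSTONE}
>>>>         levels[lower] = merged
>>>>         levels[upper] = {}
>>>> 
>>>> Only when the new and the old copies of a key meet in the same compact() 
>>>> call does the old pair actually disappear, which is the slow migration 
>>>> described above.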
>>>> 
>>>> I apologize that I cannot give you a more useful answer.  2.0 is on the 
>>>> horizon.
>>>> 
>>>> Matthew
>>>> 
>>>> 
>>>>> On Apr 6, 2014, at 7:04 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>> 
>>>>> Hi again!
>>>>> 
>>>>> Sorry to reopen this discussion, but I have another question regarding 
>>>>> the former post.
>>>>> 
>>>>> What if, instead of doing a mass deletion (we've already seen that it 
>>>>> won't pay off in terms of disk space), I update all the values with an 
>>>>> empty JSON object "{}"?  Do you see any problem with this?  I no longer 
>>>>> need those millions of values that are living in the cluster... 
>>>>> 
>>>>> When version 2.0 of Riak is running stable I'll upgrade, and only then 
>>>>> delete those keys!
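>>>>> 
>>>>> In case it helps to see what I mean, the overwrite would just be a PUT 
>>>>> of "{}" per key over Riak's HTTP interface.  A minimal sketch in Python 
>>>>> (the requests library, the host, and the bucket/key names are my own 
>>>>> placeholders):
>>>>> 
>>>>>     import requests
>>>>> 
>>>>>     # Riak's HTTP API listens on port 8098 by default.
>>>>>     url = "http://127.0.0.1:8098/buckets/mybucket/keys/somekey"
>>>>>     resp = requests.put(url, data="{}",
>>>>>                         headers={"Content-Type": "application/json"})
>>>>>     resp.raise_for_status()  # any 2xx means the empty object was stored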
>>>>> 
>>>>> Best regards
>>>>> 
>>>>> 
>>>>>> On 18 February 2014 16:32, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>>> Ok, thanks a lot Matthew.
>>>>>> 
>>>>>> 
>>>>>>> On 18 February 2014 16:18, Matthew Von-Maszewski <matth...@basho.com> 
>>>>>>> wrote:
>>>>>>> Riak 2.0 is coming.  Hold your mass delete until then.  The "bug" is 
>>>>>>> within Google's original leveldb architecture.  Riak 2.0 sneaks around 
>>>>>>> it to get the disk space freed.
>>>>>>> 
>>>>>>> Matthew
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Feb 18, 2014, at 11:10 AM, Edgar Veiga <edgarmve...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> The only/main purpose is to free disk space...
>>>>>>>> 
>>>>>>>> I was a little concerned about this operation, and now, with your 
>>>>>>>> feedback, I'm leaning towards not doing anything; I can't risk the 
>>>>>>>> space growing... 
>>>>>>>> Regarding the overhead, I think that with a tight throttling system I 
>>>>>>>> could control it and avoid overloading the cluster.
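>>>>>>>> 
>>>>>>>> Something along these lines is what I have in mind.  A rough sketch in 
>>>>>>>> Python (the requests library, node address, bucket name, key file and 
>>>>>>>> rate are all my own assumptions, and the pacing is deliberately crude):
>>>>>>>> 
>>>>>>>>     import time
>>>>>>>>     import requests
>>>>>>>> 
>>>>>>>>     RIAK = "http://127.0.0.1:8098"  # assumed node address
>>>>>>>>     BUCKET = "mybucket"             # hypothetical bucket name
>>>>>>>>     PER_SECOND = 200                # made-up rate; tune against metrics
>>>>>>>> 
>>>>>>>>     with open("keys.txt") as keys:  # one URL-safe key per line
>>>>>>>>         for n, key in enumerate(keys, start=1):
>>>>>>>>             key = key.strip()
>>>>>>>>             if not key:
>>>>>>>>                 continue
>>>>>>>>             requests.delete(RIAK + "/buckets/" + BUCKET + "/keys/" + key)
>>>>>>>>             if n % PER_SECOND == 0:
>>>>>>>>                 time.sleep(1)       # crude throttle: ~PER_SECOND req/s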
>>>>>>>> 
>>>>>>>> Mixed feelings :S
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 18 February 2014 15:45, Matthew Von-Maszewski <matth...@basho.com> 
>>>>>>>>> wrote:
>>>>>>>>> Edgar,
>>>>>>>>> 
>>>>>>>>> The first "concern" I have is that leveldb's delete does not free 
>>>>>>>>> disk space.  Others have executed mass delete operations only to 
>>>>>>>>> discover they are now using more disk space instead of less.  Here is 
>>>>>>>>> a discussion of the problem:
>>>>>>>>> 
>>>>>>>>> https://github.com/basho/leveldb/wiki/mv-aggressive-delete
>>>>>>>>> 
>>>>>>>>> The link also describes Riak's database operation overhead.  This is 
>>>>>>>>> a second "concern".  You will need to carefully throttle your delete 
>>>>>>>>> rate or the overhead will likely impact your production throughput.
>>>>>>>>> 
>>>>>>>>> Riak 2.0 has new code to quicken the actual purge of deleted data, 
>>>>>>>>> but that release is not quite ready for production use.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> What do you hope to achieve by the mass delete?
>>>>>>>>> 
>>>>>>>>> Matthew
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Feb 18, 2014, at 10:29 AM, Edgar Veiga <edgarmve...@gmail.com> 
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Sorry, forgot that info!
>>>>>>>>>> 
>>>>>>>>>> It's leveldb.
>>>>>>>>>> 
>>>>>>>>>> Best regards
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On 18 February 2014 15:27, Matthew Von-Maszewski 
>>>>>>>>>>> <matth...@basho.com> wrote:
>>>>>>>>>>> Which Riak backend are you using:  bitcask, leveldb, multi?
>>>>>>>>>>> 
>>>>>>>>>>> Matthew
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Feb 18, 2014, at 10:17 AM, Edgar Veiga <edgarmve...@gmail.com> 
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> > Hi all!
>>>>>>>>>>> >
>>>>>>>>>>> > I have a fairly simple question regarding mass deletion on a 
>>>>>>>>>>> > Riak cluster, but first let me give you some context. My cluster 
>>>>>>>>>>> > is running Riak 1.4.6 on 6 machines with a ring size of 256 and 
>>>>>>>>>>> > 1TB SSD disks.
>>>>>>>>>>> >
>>>>>>>>>>> > I need to execute a massive object deletion on a bucket; I'm 
>>>>>>>>>>> > talking about ~1 billion keys (the average object size is ~1KB). 
>>>>>>>>>>> > I will not retrieve the keys from Riak because I have a file with 
>>>>>>>>>>> > all of them. I'll just start a script that reads them from the 
>>>>>>>>>>> > file and triggers an HTTP DELETE for each one.
>>>>>>>>>>> > The cluster will continue running in production under quite a 
>>>>>>>>>>> > high load, serving all other applications, while this deletion 
>>>>>>>>>>> > runs.
>>>>>>>>>>> >
>>>>>>>>>>> > My question is simple: do I need to have any extra concerns 
>>>>>>>>>>> > regarding this action? Do you advise paying special attention to 
>>>>>>>>>>> > any metrics for Riak, or even for the servers where it's running?
>>>>>>>>>>> >
>>>>>>>>>>> > Best regards!
>>>>>>>>>>> > _______________________________________________
>>>>>>>>>>> > riak-users mailing list
>>>>>>>>>>> > riak-users@lists.basho.com
>>>>>>>>>>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
