Re: Update to 1.4.8
Hi Timo, ...So I stopped AAE on all nodes (with riak attach), removed the AAE folders on all the nodes. And then restarted them one-by-one, so they all started with a clean AAE state. Then about a day later the cluster was finally in a normal state. I don't understand the difference between what you did and what I'm describing in my previous emails. I stopped AAE via riak attach, and then, one by one, I stopped each node, removed the anti-entropy data and started the node again. Is there any subtle difference I'm not getting? I'm asking this because this hasn't proved to be enough to stop the load across the entire cluster. Another thing is the anti-entropy dir data size: since the upgrade it has reached very high values compared to the previous ones... Best regards ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Update to 1.4.8
...So I stopped AAE on all nodes (with riak attach), removed the AAE folders on all the nodes. And then restarted them one-by-one, so they all started with a clean AAE state. Then about a day later the cluster was finally in a normal state. I don't understand the difference between what you did and what I'm describing in the former emails? I've stopped the aae via riak attach, and then one by one, I've stopped the node, removed the anti-entropy data and started the node. Is there any subtle difference I'm not getting? I'm asking this because indeed this hasn't proved to be enough to stop the cluster entire cluster load. Another thing is the anti-entropy dir data size, since the upgrade it has reached very high values comparing to the previous ones… Unfortunately I don’t have a 100% definitive answer for you, maybe someone from Basho can advise. In my case I noticed that after running riak_kv_entropy_manager:disable() the IO load did not decrease immediately and on some servers it took quite a while before iostat showed disk I/O going to normal levels. I only removed the AAE folders after IO load was normal. Now that you have mentioned it I just took a look at my servers and the anti-entropy dir is large (500Mb) on my servers too, although it varies from one server to the next. Best regards, Timo ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
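The procedure both posters describe boils down to the entropy-manager call Timo mentions, issued from the Riak console. A minimal sketch follows, assuming the stock riak_kv_entropy_manager API in 1.4.x; the rpc:multicall wrapper and the /var/lib/riak path are illustrative assumptions, not something stated in the thread.

%% Sketch only: run from `riak attach` on any cluster member.
%% Disable AAE on the local node...
riak_kv_entropy_manager:disable().
%% ...or, purely as an illustration, push the same call to every connected node:
rpc:multicall([node() | nodes()], riak_kv_entropy_manager, disable, []).
%% Then, one node at a time: wait for iostat to settle, stop the node,
%% remove the anti-entropy directory (path depends on platform_data_dir), e.g.
%%   riak stop && rm -rf /var/lib/riak/anti_entropy && riak start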
Re: Update to 1.4.8
Well, my anti-entropy folders on each machine are ~120G. It's quite a lot!!! I have ~600G of data per server and a cluster of 6 servers with leveldb. Just for comparison purposes, what about you? Someone from Basho, can you please advise on this one? Best regards! :) On 8 April 2014 11:02, Timo Gatsonides t...@me.com wrote: ...So I stopped AAE on all nodes (with riak attach), removed the AAE folders on all the nodes. And then restarted them one-by-one, so they all started with a clean AAE state. Then about a day later the cluster was finally in a normal state. I don't understand the difference between what you did and what I'm describing in the former emails? I've stopped the aae via riak attach, and then one by one, I've stopped the node, removed the anti-entropy data and started the node. Is there any subtle difference I'm not getting? I'm asking this because indeed this hasn't proved to be enough to stop the cluster entire cluster load. Another thing is the anti-entropy dir data size, since the upgrade it has reached very high values comparing to the previous ones... Unfortunately I don't have a 100% definitive answer for you, maybe someone from Basho can advise. In my case I noticed that after running riak_kv_entropy_manager:disable() the IO load did not decrease immediately and on some servers it took quite a while before iostat showed disk I/O going to normal levels. I only removed the AAE folders after IO load was normal. Now that you have mentioned it I just took a look at my servers and the anti-entropy dir is large (500Mb) on my servers too, although it varies from one server to the next. Best regards, Timo ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Update to 1.4.8
I also have 6 servers. Each server has about 2Tb of data. So maybe my anti_entropy dir size is “normal”. Kind regards, Timo p.s. I’m using the multi_backend as I have some data only in memory (riak_kv_memory_backend); all data on disk is in riak_kv_eleveldb_backend. Well, my anti-entropy folders in each machine have ~120G, It's quite a lot!!! I have ~600G of data per server and a cluster of 6 servers with level-db. Just for comparison effects, what about you? Someone of basho, can you please advise on this one? Best regards! :) ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Update to 1.4.8
So Basho, to summarize: I upgraded to the latest 1.4.8 version without removing the anti-entropy data dir, because at the time that note wasn't yet in the 1.4.8 Release Notes. A few days later I did it: stopped AAE via riak attach, then restarted all the nodes one by one, removing the anti-entropy data in between. The expected results didn't happen... My leveldb data dir has ~600G per server and the anti-entropy data dir has ~120G; this seems to be quite a lot :( The cluster load is still high... Write and read times are inconsistent and both high. I have a 6-machine cluster with leveldb as the backend. The total amount of keys is about 2.5 billion. Best regards! On 8 April 2014 11:21, Timo Gatsonides t...@me.com wrote: I also have 6 servers. Each server has about 2Tb of data. So maybe my anti_entropy dir size is normal. Kind regards, Timo p.s. I'm using the multi_backend as I have some data only in memory (riak_kv_memory_backend); all data on disk is in riak_kv_eleveldb_backend. Well, my anti-entropy folders in each machine have ~120G, It's quite a lot!!! I have ~600G of data per server and a cluster of 6 servers with level-db. Just for comparison effects, what about you? Someone of basho, can you please advise on this one? Best regards! :) ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: RIAK 1.4.6 - Mass key deletion
Thanks again Matthew, you've been very helpful! Maybe you can give me some kind of advice on this issue I'm having since I've upgraded to 1.4.8. Since I've upgraded, my anti-entropy data has been growing a lot and has only stabilised at very high values... Right now my cluster has 6 machines, each one with ~120G of anti-entropy data and 600G of leveldb data. This seems to be quite a lot, no? My total amount of keys is ~2.5 billion. Best regards, Edgar On 6 April 2014 23:30, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, This is indirectly related to you key deletion discussion. I made changes recently to the aggressive delete code. The second section of the following (updated) web page discusses the adjustments: https://github.com/basho/leveldb/wiki/Mv-aggressive-delete Matthew On Apr 6, 2014, at 4:29 PM, Edgar Veiga edgarmve...@gmail.com wrote: Matthew, thanks again for the response! That said, I'll wait again for the 2.0 (and maybe buy some bigger disks :) Best regards On 6 April 2014 15:02, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, In Riak 1.4, there is no advantage to using empty values versus deleting. leveldb is a write once data store. New data for a given key never physically overwrites old data for the same key. New data hides the old data by being in a lower level, and therefore picked first. leveldb's compaction operation will remove older key/value pairs only when the newer key/value is pair is part of a compaction involving both new and old. The new and the old key/value pairs must have migrated to adjacent levels through normal compaction operations before leveldb will see them in the same compaction. The migration could take days, weeks, or even months depending upon the size of your entire dataset and the rate of incoming write operations. leveldb's delete object is exactly the same as your empty JSON object. The delete object simply has one more flag set that allows it to also be removed if and only if there is no chance for an identical key to exist on a higher level. I apologize that I cannot give you a more useful answer. 2.0 is on the horizon. Matthew On Apr 6, 2014, at 7:04 AM, Edgar Veiga edgarmve...@gmail.com wrote: Hi again! Sorry to reopen this discussion, but I have another question regarding the former post. What if, instead of doing a mass deletion (We've already seen that it will be non profitable, regarding disk space) I update all the values with an empty JSON object {} ? Do you see any problem with this? I no longer need those millions of values that are living in the cluster... When the version 2.0 of riak runs stable I'll do the update and only then delete those keys! Best regards On 18 February 2014 16:32, Edgar Veiga edgarmve...@gmail.com wrote: Ok, thanks a lot Matthew. On 18 February 2014 16:18, Matthew Von-Maszewski matth...@basho.comwrote: Riak 2.0 is coming. Hold your mass delete until then. The bug is within Google's original leveldb architecture. Riak 2.0 sneaks around to get the disk space freed. Matthew On Feb 18, 2014, at 11:10 AM, Edgar Veiga edgarmve...@gmail.com wrote: The only/main purpose is to free disk space.. I was a little bit concerned regarding this operation, but now with your feedback I'm tending to don't do nothing, I can't risk the growing of space... Regarding the overhead I think that with a tight throttling system I could control and avoid overloading the cluster.
Mixed feelings :S On 18 February 2014 15:45, Matthew Von-Maszewski matth...@basho.comwrote: Edgar, The first concern I have is that leveldb's delete does not free disk space. Others have executed mass delete operations only to discover they are now using more disk space instead of less. Here is a discussion of the problem: https://github.com/basho/leveldb/wiki/mv-aggressive-delete The link also describes Riak's database operation overhead. This is a second concern. You will need to carefully throttle your delete rate or the overhead will likely impact your production throughput. We have new code to help quicken the actual purge of deleted data in Riak 2.0. But that release is not quite ready for production usage. What do you hope to achieve by the mass delete? Matthew On Feb 18, 2014, at 10:29 AM, Edgar Veiga edgarmve...@gmail.com wrote: Sorry, forgot that info! It's leveldb. Best regards On 18 February 2014 15:27, Matthew Von-Maszewski matth...@basho.comwrote: Which Riak backend are you using: bitcask, leveldb, multi? Matthew On Feb 18, 2014, at 10:17 AM, Edgar Veiga edgarmve...@gmail.com wrote: Hi all! I have a fairly trivial question regarding mass deletion on a riak cluster, but firstly let me give you just some context. My cluster is running with riak 1.4.6 on 6 machines with a ring of 256 nodes and 1Tb ssd disks. I need to execute a massive object deletion on a bucket,
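The throttling Matthew recommends in the quoted thread is client-side pacing of the delete requests. As a rough illustration only (not code from the thread), a paced delete loop with the official Erlang client might look like this; the bucket name, key list and sleep interval are assumptions.

%% Hypothetical throttled mass-delete sketch using riak-erlang-client.
-module(throttled_delete).
-export([run/2]).

%% Keys is a list of binary keys; PauseMs is the delay between deletes.
run(Keys, PauseMs) ->
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    lists:foreach(
      fun(Key) ->
              ok = riakc_pb_socket:delete(Pid, <<"my_bucket">>, Key),
              timer:sleep(PauseMs)   %% crude throttle to protect the cluster
      end,
      Keys),
    riakc_pb_socket:stop(Pid).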
Re: RIAK 1.4.6 - Mass key deletion
AAE is broken and brain dead in releases 1.4.3 through 1.4.7. That might be your problem. I have a two billion key data set building now. I will forward node disk usage when available. Matthew Sent from my iPhone On Apr 8, 2014, at 6:51 AM, Edgar Veiga edgarmve...@gmail.com wrote: Thanks again Matthew, you've been very helpful! Maybe you can give me some kind of advise on this issue I'm having since I've upgraded to 1.4.8. Since I've upgraded my anti-entropy data has been growing a lot and has only stabilised in very high values... Write now my cluster has 6 machines each one with ~120G of anti-entropy data and 600G of level-db data. This seems to be quite a lot no? My total amount of keys is ~2.5 Billions. Best regards, Edgar On 6 April 2014 23:30, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, This is indirectly related to you key deletion discussion. I made changes recently to the aggressive delete code. The second section of the following (updated) web page discusses the adjustments: https://github.com/basho/leveldb/wiki/Mv-aggressive-delete Matthew On Apr 6, 2014, at 4:29 PM, Edgar Veiga edgarmve...@gmail.com wrote: Matthew, thanks again for the response! That said, I'll wait again for the 2.0 (and maybe buy some bigger disks :) Best regards On 6 April 2014 15:02, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, In Riak 1.4, there is no advantage to using empty values versus deleting. leveldb is a write once data store. New data for a given key never physically overwrites old data for the same key. New data hides the old data by being in a lower level, and therefore picked first. leveldb's compaction operation will remove older key/value pairs only when the newer key/value is pair is part of a compaction involving both new and old. The new and the old key/value pairs must have migrated to adjacent levels through normal compaction operations before leveldb will see them in the same compaction. The migration could take days, weeks, or even months depending upon the size of your entire dataset and the rate of incoming write operations. leveldb's delete object is exactly the same as your empty JSON object. The delete object simply has one more flag set that allows it to also be removed if and only if there is no chance for an identical key to exist on a higher level. I apologize that I cannot give you a more useful answer. 2.0 is on the horizon. Matthew On Apr 6, 2014, at 7:04 AM, Edgar Veiga edgarmve...@gmail.com wrote: Hi again! Sorry to reopen this discussion, but I have another question regarding the former post. What if, instead of doing a mass deletion (We've already seen that it will be non profitable, regarding disk space) I update all the values with an empty JSON object {} ? Do you see any problem with this? I no longer need those millions of values that are living in the cluster... When the version 2.0 of riak runs stable I'll do the update and only then delete those keys! Best regards On 18 February 2014 16:32, Edgar Veiga edgarmve...@gmail.com wrote: Ok, thanks a lot Matthew. On 18 February 2014 16:18, Matthew Von-Maszewski matth...@basho.com wrote: Riak 2.0 is coming. Hold your mass delete until then. The bug is within Google's original leveldb architecture. Riak 2.0 sneaks around to get the disk space freed. Matthew On Feb 18, 2014, at 11:10 AM, Edgar Veiga edgarmve...@gmail.com wrote: The only/main purpose is to free disk space.. 
I was a little bit concerned regarding this operation, but now with your feedback I'm tending to don't do nothing, I can't risk the growing of space... Regarding the overhead I think that with a tight throttling system I could control and avoid overloading the cluster. Mixed feelings :S On 18 February 2014 15:45, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, The first concern I have is that leveldb's delete does not free disk space. Others have executed mass delete operations only to discover they are now using more disk space instead of less. Here is a discussion of the problem: https://github.com/basho/leveldb/wiki/mv-aggressive-delete The link also describes Riak's database operation overhead. This is a second concern. You will need to carefully throttle your delete rate or the overhead will likely impact your production throughput. We have new code to help quicken the actual purge of deleted data in Riak 2.0. But that release is not quite ready for production usage. What do you hope to achieve by the mass delete? Matthew On Feb 18, 2014, at 10:29 AM, Edgar Veiga edgarmve...@gmail.com wrote: Sorry, forgot that info! It's leveldb. Best regards On 18 February 2014 15:27, Matthew Von-Maszewski matth...@basho.com wrote: Which Riak backend are you using: bitcask,
Re: RIAK 1.4.6 - Mass key deletion
Argh. Missed where you said you had upgraded. Ok it will proceed with getting you comparison numbers. Sent from my iPhone On Apr 8, 2014, at 6:51 AM, Edgar Veiga edgarmve...@gmail.com wrote: Thanks again Matthew, you've been very helpful! Maybe you can give me some kind of advise on this issue I'm having since I've upgraded to 1.4.8. Since I've upgraded my anti-entropy data has been growing a lot and has only stabilised in very high values... Write now my cluster has 6 machines each one with ~120G of anti-entropy data and 600G of level-db data. This seems to be quite a lot no? My total amount of keys is ~2.5 Billions. Best regards, Edgar On 6 April 2014 23:30, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, This is indirectly related to you key deletion discussion. I made changes recently to the aggressive delete code. The second section of the following (updated) web page discusses the adjustments: https://github.com/basho/leveldb/wiki/Mv-aggressive-delete Matthew On Apr 6, 2014, at 4:29 PM, Edgar Veiga edgarmve...@gmail.com wrote: Matthew, thanks again for the response! That said, I'll wait again for the 2.0 (and maybe buy some bigger disks :) Best regards On 6 April 2014 15:02, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, In Riak 1.4, there is no advantage to using empty values versus deleting. leveldb is a write once data store. New data for a given key never physically overwrites old data for the same key. New data hides the old data by being in a lower level, and therefore picked first. leveldb's compaction operation will remove older key/value pairs only when the newer key/value is pair is part of a compaction involving both new and old. The new and the old key/value pairs must have migrated to adjacent levels through normal compaction operations before leveldb will see them in the same compaction. The migration could take days, weeks, or even months depending upon the size of your entire dataset and the rate of incoming write operations. leveldb's delete object is exactly the same as your empty JSON object. The delete object simply has one more flag set that allows it to also be removed if and only if there is no chance for an identical key to exist on a higher level. I apologize that I cannot give you a more useful answer. 2.0 is on the horizon. Matthew On Apr 6, 2014, at 7:04 AM, Edgar Veiga edgarmve...@gmail.com wrote: Hi again! Sorry to reopen this discussion, but I have another question regarding the former post. What if, instead of doing a mass deletion (We've already seen that it will be non profitable, regarding disk space) I update all the values with an empty JSON object {} ? Do you see any problem with this? I no longer need those millions of values that are living in the cluster... When the version 2.0 of riak runs stable I'll do the update and only then delete those keys! Best regards On 18 February 2014 16:32, Edgar Veiga edgarmve...@gmail.com wrote: Ok, thanks a lot Matthew. On 18 February 2014 16:18, Matthew Von-Maszewski matth...@basho.com wrote: Riak 2.0 is coming. Hold your mass delete until then. The bug is within Google's original leveldb architecture. Riak 2.0 sneaks around to get the disk space freed. Matthew On Feb 18, 2014, at 11:10 AM, Edgar Veiga edgarmve...@gmail.com wrote: The only/main purpose is to free disk space.. I was a little bit concerned regarding this operation, but now with your feedback I'm tending to don't do nothing, I can't risk the growing of space... 
Regarding the overhead I think that with a tight throttling system I could control and avoid overloading the cluster. Mixed feelings :S On 18 February 2014 15:45, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, The first concern I have is that leveldb's delete does not free disk space. Others have executed mass delete operations only to discover they are now using more disk space instead of less. Here is a discussion of the problem: https://github.com/basho/leveldb/wiki/mv-aggressive-delete The link also describes Riak's database operation overhead. This is a second concern. You will need to carefully throttle your delete rate or the overhead will likely impact your production throughput. We have new code to help quicken the actual purge of deleted data in Riak 2.0. But that release is not quite ready for production usage. What do you hope to achieve by the mass delete? Matthew On Feb 18, 2014, at 10:29 AM, Edgar Veiga edgarmve...@gmail.com wrote: Sorry, forgot that info! It's leveldb. Best regards On 18 February 2014 15:27, Matthew Von-Maszewski matth...@basho.com wrote: Which Riak backend are you using: bitcask, leveldb, multi? Matthew On Feb 18, 2014, at 10:17 AM, Edgar Veiga edgarmve...@gmail.com
Re: RIAK 1.4.6 - Mass key deletion
Thanks a lot Matthew! A little bit of more info, I've gathered a sample of the contents of anti-entropy data of one of my machines: - 44 folders with the name equal to the name of the folders in level-db dir (i.e. 393920363186844927172086927568060657641638068224/) - each folder has a 5 files (log, current, log, etc) and 5 sst_* folders. - The biggest sst folder is sst_3 with 4.3G - Inside sst_3 folder there are 1219 files name 00.sst. - Each of the 00*.sst files has ~3.7M Hope this info gives you some more help! Best regards, and again, thanks a lot Edgar On 8 April 2014 13:24, Matthew Von-Maszewski matth...@basho.com wrote: Argh. Missed where you said you had upgraded. Ok it will proceed with getting you comparison numbers. Sent from my iPhone On Apr 8, 2014, at 6:51 AM, Edgar Veiga edgarmve...@gmail.com wrote: Thanks again Matthew, you've been very helpful! Maybe you can give me some kind of advise on this issue I'm having since I've upgraded to 1.4.8. Since I've upgraded my anti-entropy data has been growing a lot and has only stabilised in very high values... Write now my cluster has 6 machines each one with ~120G of anti-entropy data and 600G of level-db data. This seems to be quite a lot no? My total amount of keys is ~2.5 Billions. Best regards, Edgar On 6 April 2014 23:30, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, This is indirectly related to you key deletion discussion. I made changes recently to the aggressive delete code. The second section of the following (updated) web page discusses the adjustments: https://github.com/basho/leveldb/wiki/Mv-aggressive-delete Matthew On Apr 6, 2014, at 4:29 PM, Edgar Veiga edgarmve...@gmail.com wrote: Matthew, thanks again for the response! That said, I'll wait again for the 2.0 (and maybe buy some bigger disks :) Best regards On 6 April 2014 15:02, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, In Riak 1.4, there is no advantage to using empty values versus deleting. leveldb is a write once data store. New data for a given key never physically overwrites old data for the same key. New data hides the old data by being in a lower level, and therefore picked first. leveldb's compaction operation will remove older key/value pairs only when the newer key/value is pair is part of a compaction involving both new and old. The new and the old key/value pairs must have migrated to adjacent levels through normal compaction operations before leveldb will see them in the same compaction. The migration could take days, weeks, or even months depending upon the size of your entire dataset and the rate of incoming write operations. leveldb's delete object is exactly the same as your empty JSON object. The delete object simply has one more flag set that allows it to also be removed if and only if there is no chance for an identical key to exist on a higher level. I apologize that I cannot give you a more useful answer. 2.0 is on the horizon. Matthew On Apr 6, 2014, at 7:04 AM, Edgar Veiga edgarmve...@gmail.com wrote: Hi again! Sorry to reopen this discussion, but I have another question regarding the former post. What if, instead of doing a mass deletion (We've already seen that it will be non profitable, regarding disk space) I update all the values with an empty JSON object {} ? Do you see any problem with this? I no longer need those millions of values that are living in the cluster... When the version 2.0 of riak runs stable I'll do the update and only then delete those keys! 
Best regards On 18 February 2014 16:32, Edgar Veiga edgarmve...@gmail.com wrote: Ok, thanks a lot Matthew. On 18 February 2014 16:18, Matthew Von-Maszewski matth...@basho.comwrote: Riak 2.0 is coming. Hold your mass delete until then. The bug is within Google's original leveldb architecture. Riak 2.0 sneaks around to get the disk space freed. Matthew On Feb 18, 2014, at 11:10 AM, Edgar Veiga edgarmve...@gmail.com wrote: The only/main purpose is to free disk space.. I was a little bit concerned regarding this operation, but now with your feedback I'm tending to don't do nothing, I can't risk the growing of space... Regarding the overhead I think that with a tight throttling system I could control and avoid overloading the cluster. Mixed feelings :S On 18 February 2014 15:45, Matthew Von-Maszewski matth...@basho.comwrote: Edgar, The first concern I have is that leveldb's delete does not free disk space. Others have executed mass delete operations only to discover they are now using more disk space instead of less. Here is a discussion of the problem: https://github.com/basho/leveldb/wiki/mv-aggressive-delete The link also describes Riak's database operation overhead. This is a second concern. You will need to carefully throttle your delete rate or the overhead will likely impact your production throughput. We have
Confirming Riak MDC fullsync replication never deletes.
I'm about to have two data centers bi-directionally replicating and I'm looking to confirm that Riak Multi Data Center fullsync Replication never deletes during replication. The MDC Architecture Page says that the two clusters stream missing objects/updates back and forth. It does not state that any deletions will be remembered and streamed. I'm assuming that fullsync never deletes? And what about real-time? Those also only talk about adding missing objects and updates. Does real-time never replicate a deletion? http://docs.basho.com/riakee/latest/cookbooks/Multi-Data-Center-Replication-Architecture/#Fullsync-Replication Thanks --Ray -- Ray Cote, President Appropriate Solutions, Inc. We Build Software www.AppropriateSolutions.com 603.924.6079 ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: RIAK 1.4.6 - Mass key deletion
It makes sense, I do a lot, and I really mean a LOT of updates per key, maybe thousands a day! The cluster is experiencing a lot more updates per each key, than new keys being inserted. The hash trees will rebuild during the next weekend (normally it takes about two days to complete the operation) so I'll come back and give you some feedback (hopefully good) on the next Monday! Again, thanks a lot, You've been very helpful. Edgar On 8 April 2014 15:47, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, The test I have running currently has reach 1 Billion keys. It is running against a single node with N=1. It has 42G of AAE data. Here is my extrapolation to compare your numbers: You have ~2.5 Billion keys. I assume you are running N=3 (the default). AAE therefore is actually tracking ~7.5 Billion keys. You have six nodes, therefore tracking ~1.25 Billion keys per node. Raw math would suggest that my 42G of AAE data for 1 billion keys would extrapolate to 52.5G of AAE data for you. Yet you have ~120G of AAE data. Is something wrong? No. My data is still loading and has experience zero key/value updates/edits. AAE hashes get rewritten every time a user updates the value of a key. AAE's leveldb is just like the user leveldb, all prior values of a key accumulate in the .sst table files until compaction removes duplicates. Similarly, a user delete of a key causes a delete tombstone in the AAE hash tree. Those delete tombstones have to await compactions too before leveldb recovers the disk space. AAE's hash trees rebuild weekly. I am told that the rebuild operation will actually destroy the existing files and start over. That is when you should see AAE space usage dropping dramatically. Matthew On Apr 8, 2014, at 9:31 AM, Edgar Veiga edgarmve...@gmail.com wrote: Thanks a lot Matthew! A little bit of more info, I've gathered a sample of the contents of anti-entropy data of one of my machines: - 44 folders with the name equal to the name of the folders in level-db dir (i.e. 393920363186844927172086927568060657641638068224/) - each folder has a 5 files (log, current, log, etc) and 5 sst_* folders. - The biggest sst folder is sst_3 with 4.3G - Inside sst_3 folder there are 1219 files name 00.sst. - Each of the 00*.sst files has ~3.7M Hope this info gives you some more help! Best regards, and again, thanks a lot Edgar On 8 April 2014 13:24, Matthew Von-Maszewski matth...@basho.com wrote: Argh. Missed where you said you had upgraded. Ok it will proceed with getting you comparison numbers. Sent from my iPhone On Apr 8, 2014, at 6:51 AM, Edgar Veiga edgarmve...@gmail.com wrote: Thanks again Matthew, you've been very helpful! Maybe you can give me some kind of advise on this issue I'm having since I've upgraded to 1.4.8. Since I've upgraded my anti-entropy data has been growing a lot and has only stabilised in very high values... Write now my cluster has 6 machines each one with ~120G of anti-entropy data and 600G of level-db data. This seems to be quite a lot no? My total amount of keys is ~2.5 Billions. Best regards, Edgar On 6 April 2014 23:30, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, This is indirectly related to you key deletion discussion. I made changes recently to the aggressive delete code. The second section of the following (updated) web page discusses the adjustments: https://github.com/basho/leveldb/wiki/Mv-aggressive-delete Matthew On Apr 6, 2014, at 4:29 PM, Edgar Veiga edgarmve...@gmail.com wrote: Matthew, thanks again for the response! 
That said, I'll wait again for the 2.0 (and maybe buy some bigger disks :) Best regards On 6 April 2014 15:02, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, In Riak 1.4, there is no advantage to using empty values versus deleting. leveldb is a write once data store. New data for a given key never physically overwrites old data for the same key. New data hides the old data by being in a lower level, and therefore picked first. leveldb's compaction operation will remove older key/value pairs only when the newer key/value is pair is part of a compaction involving both new and old. The new and the old key/value pairs must have migrated to adjacent levels through normal compaction operations before leveldb will see them in the same compaction. The migration could take days, weeks, or even months depending upon the size of your entire dataset and the rate of incoming write operations. leveldb's delete object is exactly the same as your empty JSON object. The delete object simply has one more flag set that allows it to also be removed if and only if there is no chance for an identical key to exist on a higher level. I apologize that I cannot give you a more useful answer. 2.0 is on the horizon. Matthew On Apr 6, 2014, at 7:04 AM, Edgar Veiga edgarmve...@gmail.com wrote: Hi again! Sorry
Re: RIAK 1.4.6 - Mass key deletion
Edgar, The test I have running currently has reached 1 Billion keys. It is running against a single node with N=1. It has 42G of AAE data. Here is my extrapolation to compare with your numbers: You have ~2.5 Billion keys. I assume you are running N=3 (the default). AAE therefore is actually tracking ~7.5 Billion keys. You have six nodes, therefore tracking ~1.25 Billion keys per node. Raw math would suggest that my 42G of AAE data for 1 billion keys would extrapolate to 52.5G of AAE data for you. Yet you have ~120G of AAE data. Is something wrong? No. My data is still loading and has experienced zero key/value updates/edits. AAE hashes get rewritten every time a user updates the value of a key. AAE's leveldb is just like the user leveldb: all prior values of a key accumulate in the .sst table files until compaction removes duplicates. Similarly, a user delete of a key causes a delete tombstone in the AAE hash tree. Those delete tombstones have to await compactions too before leveldb recovers the disk space. AAE's hash trees rebuild weekly. I am told that the rebuild operation will actually destroy the existing files and start over. That is when you should see AAE space usage dropping dramatically. Matthew On Apr 8, 2014, at 9:31 AM, Edgar Veiga edgarmve...@gmail.com wrote: Thanks a lot Matthew! A little bit of more info, I've gathered a sample of the contents of anti-entropy data of one of my machines: - 44 folders with the name equal to the name of the folders in level-db dir (i.e. 393920363186844927172086927568060657641638068224/) - each folder has a 5 files (log, current, log, etc) and 5 sst_* folders. - The biggest sst folder is sst_3 with 4.3G - Inside sst_3 folder there are 1219 files name 00.sst. - Each of the 00*.sst files has ~3.7M Hope this info gives you some more help! Best regards, and again, thanks a lot Edgar On 8 April 2014 13:24, Matthew Von-Maszewski matth...@basho.com wrote: Argh. Missed where you said you had upgraded. Ok it will proceed with getting you comparison numbers. Sent from my iPhone On Apr 8, 2014, at 6:51 AM, Edgar Veiga edgarmve...@gmail.com wrote: Thanks again Matthew, you've been very helpful! Maybe you can give me some kind of advise on this issue I'm having since I've upgraded to 1.4.8. Since I've upgraded my anti-entropy data has been growing a lot and has only stabilised in very high values... Write now my cluster has 6 machines each one with ~120G of anti-entropy data and 600G of level-db data. This seems to be quite a lot no? My total amount of keys is ~2.5 Billions. Best regards, Edgar On 6 April 2014 23:30, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, This is indirectly related to you key deletion discussion. I made changes recently to the aggressive delete code. The second section of the following (updated) web page discusses the adjustments: https://github.com/basho/leveldb/wiki/Mv-aggressive-delete Matthew On Apr 6, 2014, at 4:29 PM, Edgar Veiga edgarmve...@gmail.com wrote: Matthew, thanks again for the response! That said, I'll wait again for the 2.0 (and maybe buy some bigger disks :) Best regards On 6 April 2014 15:02, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, In Riak 1.4, there is no advantage to using empty values versus deleting. leveldb is a write once data store. New data for a given key never physically overwrites old data for the same key. New data hides the old data by being in a lower level, and therefore picked first.
leveldb's compaction operation will remove older key/value pairs only when the newer key/value is pair is part of a compaction involving both new and old. The new and the old key/value pairs must have migrated to adjacent levels through normal compaction operations before leveldb will see them in the same compaction. The migration could take days, weeks, or even months depending upon the size of your entire dataset and the rate of incoming write operations. leveldb's delete object is exactly the same as your empty JSON object. The delete object simply has one more flag set that allows it to also be removed if and only if there is no chance for an identical key to exist on a higher level. I apologize that I cannot give you a more useful answer. 2.0 is on the horizon. Matthew On Apr 6, 2014, at 7:04 AM, Edgar Veiga edgarmve...@gmail.com wrote: Hi again! Sorry to reopen this discussion, but I have another question regarding the former post. What if, instead of doing a mass deletion (We've already seen that it will be non profitable, regarding disk space) I update all the values with an empty JSON object {} ? Do you see any problem with this? I no longer need those millions of values that are living in the cluster... When the version 2.0 of riak runs stable I'll do the update and only then delete those
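Matthew's sizing argument above is straightforward proportional math. Spelled out with his numbers (which come from his single-node test and are not a general rule):

%% Back-of-envelope AAE sizing, as in Matthew's extrapolation (Erlang shell).
TotalKeys = 2.5e9,            %% Edgar's unique keys
N = 3,                        %% default n_val
Replicas = TotalKeys * N,     %% ~7.5e9 key copies tracked by AAE
Nodes = 6,
PerNode = Replicas / Nodes,   %% ~1.25e9 replicas per node
GBPerBillionKeys = 42,        %% from Matthew's 1-billion-key, N=1 test
ExpectedGB = PerNode / 1.0e9 * GBPerBillionKeys.   %% ~52.5 GB per node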
Re: Confirming Riak MDC fullsync replication never deletes.
Hi Ray, Deletion is replicated by both real-time and full-sync MDC - specifically, the tombstone representing the deletion will be replicated. The delete_mode setting affects how long a tombstone remains alive for MDC to replicate it (3 seconds by default). If only full-sync MDC is used, the delete_mode should be modified to allow a long enough lifetime for a full-sync operation to replicate the tombstone. http://docs.basho.com/riak/latest/ops/advanced/configs/configuration-files/ http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-October/006048.html -- Luke Bakken CSE lbak...@basho.com On Tue, Apr 8, 2014 at 7:57 AM, Ray Cote rgac...@appropriatesolutions.comwrote: I'm about to have two data centers bi-directionally replicating and I'm looking to confirm that Riak Multi Data Center fullsync Replication never deletes during replication. The MDC Architecture Page says that the two clusters stream missing objects/updates back and forth. It does not state that any deletions will be remembered and streamed. I'm assuming that fullsync never deletes? And what about real-time? Those also only talk about adding missing objects and updates. Does real-time never replicate a deletion? http://docs.basho.com/riakee/latest/cookbooks/Multi-Data-Center-Replication-Architecture/#Fullsync-Replication Thanks --Ray -- Ray Cote, President Appropriate Solutions, Inc. We Build Software www.AppropriateSolutions.com 603.924.6079 ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
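For reference, delete_mode lives in the riak_kv section of app.config in the pre-2.0 configuration format. A minimal sketch of the adjustment Luke alludes to follows; the concrete values are illustrative, not a recommendation.

%% app.config sketch (riak_kv section), values for illustration only:
{riak_kv, [
    %% keep tombstones indefinitely so a later fullsync can replicate the delete
    {delete_mode, keep}
    %% ...or keep them for a fixed window, e.g. 10 minutes in milliseconds:
    %% {delete_mode, 600000}
]}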
Re: Confirming Riak MDC fullsync replication never deletes.
Hi Luke: Thanks, that's very clear. I'll adjust our configuration accordingly. --Ray - Original Message - From: Luke Bakken lbak...@basho.com To: Ray Cote rgac...@appropriatesolutions.com Cc: riak-users riak-users@lists.basho.com Sent: Tuesday, April 8, 2014 12:20:49 PM Subject: Re: Confirming Riak MDC fullsync replication never deletes. Hi Ray, Deletion is replicated by both real-time and full-sync MDC - specifically, the tombstone representing the deletion will be replicated. The delete_mode setting affects how long a tombstone remains alive for MDC to replicate it (3 seconds by default). If only full-sync MDC is used, the delete_mode should be modified to allow a long enough lifetime for a full-sync operation to replicate the tombstone. http://docs.basho.com/riak/latest/ops/advanced/configs/configuration-files/ http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-October/006048.html -- Luke Bakken CSE lbak...@basho.com On Tue, Apr 8, 2014 at 7:57 AM, Ray Cote rgac...@appropriatesolutions.com wrote: I'm about to have two data centers bi-directionally replicating and I'm looking to confirm that Riak Multi Data Center fullsync Replication never deletes during replication. The MDC Architecture Page says that the two clusters stream missing objects/updates back and forth. It does not state that any deletions will be remembered and streamed. I'm assuming that fullsync never deletes? And what about real-time? Those also only talk about adding missing objects and updates. Does real-time never replicate a deletion? http://docs.basho.com/riakee/latest/cookbooks/Multi-Data-Center-Replication-Architecture/#Fullsync-Replication Thanks --Ray -- Ray Cote, President Appropriate Solutions, Inc. We Build Software www.AppropriateSolutions.com 603.924.6079 ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com -- Ray Cote, President Appropriate Solutions, Inc. We Build Software www.AppropriateSolutions.com 603.924.6079 ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak CS from Erlang
Lee, That looks like a legitimate permissions issue during the multipart upload initiation. Perhaps check the Riak CS logs for any errors or clues and verify you do have proper ACL permissions for the bucket you are attempting to upload to. We don't use that function in our testing so there could be a bug with it I'm not aware of. I would also try a multipart upload with s3cmd or another similar tool and determine if that that works to narrow the focus. Kelly On April 7, 2014 at 4:45:19 PM, Lee Sylvester (lee.sylves...@gmail.com) wrote: Hi Kelly, I’ve got Riak CS working with erlcloud, so thanks for that. It would have taken a lot longer without your help. However, I wanted to change from using erlcloud:put_object to using erlcloud:put_multipart_object. However, when I make the change, I get the following error: {error, {error, {aws_error, {http_error,403,[], AccessDeniedAccess Denied/tmp-files/test.png}}, [{erlcloud_s3,s3_request,9,[{file,src/erlcloud_s3.erl},{line,899}]}, {erlcloud_s3,s3_xml_request,8, [{file,src/erlcloud_s3.erl},{line,834}]}, {erlcloud_s3,put_multipart_object,6, [{file,src/erlcloud_s3.erl},{line,662}]}, {builder_web_fileupload_resource,process_post,2, [{file,src/builder_web_fileupload_resource.erl},{line,43}]}, {webmachine_resource,resource_call,3, [{file,src/webmachine_resource.erl},{line,186}]}, {webmachine_resource,do,3, [{file,src/webmachine_resource.erl},{line,142}]}, {webmachine_decision_core,resource_call,1, [{file,src/webmachine_decision_core.erl},{line,48}]}, {webmachine_decision_core,decision,1, [{file,src/webmachine_decision_core.erl},{line,486}]}]}} I tried adding an acl item to the Options list, but that didn’t seem to help. Do you have any idea what may be causing this? Thanks, Lee On 1 Apr 2014, at 16:11, Kelly McLaughlin ke...@basho.com wrote: Lee, The erlcloud usage in the riak_test test modules might be easier to follow. Take a look at ./riak_test/tests/object_get_test.erl for a simple example. Also take a look at setup/2 in ./riak_test/src/rtcs.erl to see how the erlcloud configuration record is created. Hope that helps. Kelly On March 31, 2014 at 11:47:52 AM, Lee Sylvester (lee.sylves...@gmail.com) wrote: Hi Kelly, Thank you for the information. I took a look at the client test, but as it uses an FSM, the continuous looping gave me a headache :-) Is there a simpler test that demos this? If not, can you highlight the lines that make the calls to Riak CS etc? That part evades me. Thanks, Lee On 31 Mar 2014, at 16:09, Kelly McLaughlin ke...@basho.com wrote: Lee, We have a fork of erlcloud (https://github.com/basho/erlcloud) we use for testing and it can be made to work with your Riak CS cluster with relatively little pain. Look in the riak_cs repo under client_tests/erlang/ercloud_eqc.erl for some example usage. You'll probably want to set the proxy host and port and perhaps the host name if you're not using the default s3.amazonaws.com. Kelly On March 31, 2014 at 8:44:03 AM, Lee Sylvester (lee.sylves...@gmail.com) wrote: Hi guys, I’m setting up my own Riak CS cluster and wanted to know what the best way to interact with that cluster would be from an Erlang app? I’ve taken a look at ErlCloud, but it seems I’d need to butcher it to get it to work with a custom cluster (not Amazon’s setup). Thanks, Lee ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
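For anyone reproducing Lee's setup, pointing erlcloud at a Riak CS endpoint rather than s3.amazonaws.com can be sketched as below. The host, port and credentials are placeholders, and the erlcloud_s3:new/4 and put_object/5 arities are my reading of the fork Kelly links, so check them against the version you actually build.

%% Sketch: talk to a local Riak CS endpoint instead of AWS S3 (placeholders).
Config = erlcloud_s3:new("ACCESS_KEY_ID", "SECRET_ACCESS_KEY",
                         "riak-cs.internal.example", 8080),
{ok, Bin} = file:read_file("test.png"),
%% Plain (non-multipart) upload, passing the config record explicitly:
erlcloud_s3:put_object("tmp-files", "test.png", Bin, [], Config).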
Re: RIAK 1.4.6 - Mass key deletion
Edgar, Today we disclosed a new feature for Riak's leveldb, Tiered Storage. The details are here: https://github.com/basho/leveldb/wiki/mv-tiered-options This feature might give you another option in managing your storage volume. Matthew On Apr 8, 2014, at 11:07 AM, Edgar Veiga edgarmve...@gmail.com wrote: It makes sense, I do a lot, and I really mean a LOT of updates per key, maybe thousands a day! The cluster is experiencing a lot more updates per each key, than new keys being inserted. The hash trees will rebuild during the next weekend (normally it takes about two days to complete the operation) so I'll come back and give you some feedback (hopefully good) on the next Monday! Again, thanks a lot, You've been very helpful. Edgar On 8 April 2014 15:47, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, The test I have running currently has reach 1 Billion keys. It is running against a single node with N=1. It has 42G of AAE data. Here is my extrapolation to compare your numbers: You have ~2.5 Billion keys. I assume you are running N=3 (the default). AAE therefore is actually tracking ~7.5 Billion keys. You have six nodes, therefore tracking ~1.25 Billion keys per node. Raw math would suggest that my 42G of AAE data for 1 billion keys would extrapolate to 52.5G of AAE data for you. Yet you have ~120G of AAE data. Is something wrong? No. My data is still loading and has experience zero key/value updates/edits. AAE hashes get rewritten every time a user updates the value of a key. AAE's leveldb is just like the user leveldb, all prior values of a key accumulate in the .sst table files until compaction removes duplicates. Similarly, a user delete of a key causes a delete tombstone in the AAE hash tree. Those delete tombstones have to await compactions too before leveldb recovers the disk space. AAE's hash trees rebuild weekly. I am told that the rebuild operation will actually destroy the existing files and start over. That is when you should see AAE space usage dropping dramatically. Matthew On Apr 8, 2014, at 9:31 AM, Edgar Veiga edgarmve...@gmail.com wrote: Thanks a lot Matthew! A little bit of more info, I've gathered a sample of the contents of anti-entropy data of one of my machines: - 44 folders with the name equal to the name of the folders in level-db dir (i.e. 393920363186844927172086927568060657641638068224/) - each folder has a 5 files (log, current, log, etc) and 5 sst_* folders. - The biggest sst folder is sst_3 with 4.3G - Inside sst_3 folder there are 1219 files name 00.sst. - Each of the 00*.sst files has ~3.7M Hope this info gives you some more help! Best regards, and again, thanks a lot Edgar On 8 April 2014 13:24, Matthew Von-Maszewski matth...@basho.com wrote: Argh. Missed where you said you had upgraded. Ok it will proceed with getting you comparison numbers. Sent from my iPhone On Apr 8, 2014, at 6:51 AM, Edgar Veiga edgarmve...@gmail.com wrote: Thanks again Matthew, you've been very helpful! Maybe you can give me some kind of advise on this issue I'm having since I've upgraded to 1.4.8. Since I've upgraded my anti-entropy data has been growing a lot and has only stabilised in very high values... Write now my cluster has 6 machines each one with ~120G of anti-entropy data and 600G of level-db data. This seems to be quite a lot no? My total amount of keys is ~2.5 Billions. Best regards, Edgar On 6 April 2014 23:30, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, This is indirectly related to you key deletion discussion. 
I made changes recently to the aggressive delete code. The second section of the following (updated) web page discusses the adjustments: https://github.com/basho/leveldb/wiki/Mv-aggressive-delete Matthew On Apr 6, 2014, at 4:29 PM, Edgar Veiga edgarmve...@gmail.com wrote: Matthew, thanks again for the response! That said, I'll wait again for the 2.0 (and maybe buy some bigger disks :) Best regards On 6 April 2014 15:02, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, In Riak 1.4, there is no advantage to using empty values versus deleting. leveldb is a write once data store. New data for a given key never physically overwrites old data for the same key. New data hides the old data by being in a lower level, and therefore picked first. leveldb's compaction operation will remove older key/value pairs only when the newer key/value is pair is part of a compaction involving both new and old. The new and the old key/value pairs must have migrated to adjacent levels through normal compaction operations before leveldb will see them in the same compaction. The migration could take days, weeks, or even months depending upon the size of your entire dataset and the rate of incoming write operations. leveldb's delete object is
Re: RIAK 1.4.6 - Mass key deletion
Thanks Matthew! Today this situation has become unsustainable, In two of the machines I have an anti-entropy dir of 250G... It just keeps growing and growing and I'm almost reaching max size of the disks. Maybe I'll just turn off aae in the cluster, remove all the data in the anti-entropy directory and wait for the v2 of riak. Do you see any problem with this? Best regards! On 8 April 2014 22:11, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, Today we disclosed a new feature for Riak's leveldb, Tiered Storage. The details are here: https://github.com/basho/leveldb/wiki/mv-tiered-options This feature might give you another option in managing your storage volume. Matthew On Apr 8, 2014, at 11:07 AM, Edgar Veiga edgarmve...@gmail.com wrote: It makes sense, I do a lot, and I really mean a LOT of updates per key, maybe thousands a day! The cluster is experiencing a lot more updates per each key, than new keys being inserted. The hash trees will rebuild during the next weekend (normally it takes about two days to complete the operation) so I'll come back and give you some feedback (hopefully good) on the next Monday! Again, thanks a lot, You've been very helpful. Edgar On 8 April 2014 15:47, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, The test I have running currently has reach 1 Billion keys. It is running against a single node with N=1. It has 42G of AAE data. Here is my extrapolation to compare your numbers: You have ~2.5 Billion keys. I assume you are running N=3 (the default). AAE therefore is actually tracking ~7.5 Billion keys. You have six nodes, therefore tracking ~1.25 Billion keys per node. Raw math would suggest that my 42G of AAE data for 1 billion keys would extrapolate to 52.5G of AAE data for you. Yet you have ~120G of AAE data. Is something wrong? No. My data is still loading and has experience zero key/value updates/edits. AAE hashes get rewritten every time a user updates the value of a key. AAE's leveldb is just like the user leveldb, all prior values of a key accumulate in the .sst table files until compaction removes duplicates. Similarly, a user delete of a key causes a delete tombstone in the AAE hash tree. Those delete tombstones have to await compactions too before leveldb recovers the disk space. AAE's hash trees rebuild weekly. I am told that the rebuild operation will actually destroy the existing files and start over. That is when you should see AAE space usage dropping dramatically. Matthew On Apr 8, 2014, at 9:31 AM, Edgar Veiga edgarmve...@gmail.com wrote: Thanks a lot Matthew! A little bit of more info, I've gathered a sample of the contents of anti-entropy data of one of my machines: - 44 folders with the name equal to the name of the folders in level-db dir (i.e. 393920363186844927172086927568060657641638068224/) - each folder has a 5 files (log, current, log, etc) and 5 sst_* folders. - The biggest sst folder is sst_3 with 4.3G - Inside sst_3 folder there are 1219 files name 00.sst. - Each of the 00*.sst files has ~3.7M Hope this info gives you some more help! Best regards, and again, thanks a lot Edgar On 8 April 2014 13:24, Matthew Von-Maszewski matth...@basho.com wrote: Argh. Missed where you said you had upgraded. Ok it will proceed with getting you comparison numbers. Sent from my iPhone On Apr 8, 2014, at 6:51 AM, Edgar Veiga edgarmve...@gmail.com wrote: Thanks again Matthew, you've been very helpful! Maybe you can give me some kind of advise on this issue I'm having since I've upgraded to 1.4.8. 
Since I've upgraded my anti-entropy data has been growing a lot and has only stabilised in very high values... Write now my cluster has 6 machines each one with ~120G of anti-entropy data and 600G of level-db data. This seems to be quite a lot no? My total amount of keys is ~2.5 Billions. Best regards, Edgar On 6 April 2014 23:30, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, This is indirectly related to you key deletion discussion. I made changes recently to the aggressive delete code. The second section of the following (updated) web page discusses the adjustments: https://github.com/basho/leveldb/wiki/Mv-aggressive-delete Matthew On Apr 6, 2014, at 4:29 PM, Edgar Veiga edgarmve...@gmail.com wrote: Matthew, thanks again for the response! That said, I'll wait again for the 2.0 (and maybe buy some bigger disks :) Best regards On 6 April 2014 15:02, Matthew Von-Maszewski matth...@basho.comwrote: Edgar, In Riak 1.4, there is no advantage to using empty values versus deleting. leveldb is a write once data store. New data for a given key never physically overwrites old data for the same key. New data hides the old data by being in a lower level, and therefore picked first. leveldb's compaction operation will remove older key/value pairs only when the newer key/value is pair is part
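Turning AAE off so that it stays off across restarts is an app.config change rather than a riak attach call. A minimal sketch of the riak_kv stanza in the 1.4.x format follows; it would need to be applied on every node before wiping the anti-entropy directories, and the empty options list simply leaves the remaining AAE settings at their defaults.

%% app.config sketch: persistently disable active anti-entropy (1.4.x format).
{riak_kv, [
    %% the default is {anti_entropy, {on, []}}; switch it off cluster-wide:
    {anti_entropy, {off, []}}
]}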
Re: RIAK 1.4.6 - Mass key deletion
No. I do not see a problem with your plan. But ... I would prefer to see you add servers to your cluster. Scalabilty is one of Riak's fundamental characteristics. As your database needs grow, we grow with you … just add another server and migrate some of the vnodes there. I obviously cannot speak to your budgetary constraints. All of the engineers at Basho, I am just one, are focused upon providing you performance and features along with your scalability needs. This seems to be a situation where you might be sacrificing data integrity where another server or two would address the situation. And if 2.0 makes things better … sell the extra servers on Ebay. Matthew On Apr 8, 2014, at 6:31 PM, Edgar Veiga edgarmve...@gmail.com wrote: Thanks Matthew! Today this situation has become unsustainable, In two of the machines I have an anti-entropy dir of 250G... It just keeps growing and growing and I'm almost reaching max size of the disks. Maybe I'll just turn off aae in the cluster, remove all the data in the anti-entropy directory and wait for the v2 of riak. Do you see any problem with this? Best regards! On 8 April 2014 22:11, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, Today we disclosed a new feature for Riak's leveldb, Tiered Storage. The details are here: https://github.com/basho/leveldb/wiki/mv-tiered-options This feature might give you another option in managing your storage volume. Matthew On Apr 8, 2014, at 11:07 AM, Edgar Veiga edgarmve...@gmail.com wrote: It makes sense, I do a lot, and I really mean a LOT of updates per key, maybe thousands a day! The cluster is experiencing a lot more updates per each key, than new keys being inserted. The hash trees will rebuild during the next weekend (normally it takes about two days to complete the operation) so I'll come back and give you some feedback (hopefully good) on the next Monday! Again, thanks a lot, You've been very helpful. Edgar On 8 April 2014 15:47, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, The test I have running currently has reach 1 Billion keys. It is running against a single node with N=1. It has 42G of AAE data. Here is my extrapolation to compare your numbers: You have ~2.5 Billion keys. I assume you are running N=3 (the default). AAE therefore is actually tracking ~7.5 Billion keys. You have six nodes, therefore tracking ~1.25 Billion keys per node. Raw math would suggest that my 42G of AAE data for 1 billion keys would extrapolate to 52.5G of AAE data for you. Yet you have ~120G of AAE data. Is something wrong? No. My data is still loading and has experience zero key/value updates/edits. AAE hashes get rewritten every time a user updates the value of a key. AAE's leveldb is just like the user leveldb, all prior values of a key accumulate in the .sst table files until compaction removes duplicates. Similarly, a user delete of a key causes a delete tombstone in the AAE hash tree. Those delete tombstones have to await compactions too before leveldb recovers the disk space. AAE's hash trees rebuild weekly. I am told that the rebuild operation will actually destroy the existing files and start over. That is when you should see AAE space usage dropping dramatically. Matthew On Apr 8, 2014, at 9:31 AM, Edgar Veiga edgarmve...@gmail.com wrote: Thanks a lot Matthew! A little bit of more info, I've gathered a sample of the contents of anti-entropy data of one of my machines: - 44 folders with the name equal to the name of the folders in level-db dir (i.e. 
393920363186844927172086927568060657641638068224/) - each folder has a 5 files (log, current, log, etc) and 5 sst_* folders. - The biggest sst folder is sst_3 with 4.3G - Inside sst_3 folder there are 1219 files name 00.sst. - Each of the 00*.sst files has ~3.7M Hope this info gives you some more help! Best regards, and again, thanks a lot Edgar On 8 April 2014 13:24, Matthew Von-Maszewski matth...@basho.com wrote: Argh. Missed where you said you had upgraded. Ok it will proceed with getting you comparison numbers. Sent from my iPhone On Apr 8, 2014, at 6:51 AM, Edgar Veiga edgarmve...@gmail.com wrote: Thanks again Matthew, you've been very helpful! Maybe you can give me some kind of advise on this issue I'm having since I've upgraded to 1.4.8. Since I've upgraded my anti-entropy data has been growing a lot and has only stabilised in very high values... Write now my cluster has 6 machines each one with ~120G of anti-entropy data and 600G of level-db data. This seems to be quite a lot no? My total amount of keys is ~2.5 Billions. Best regards, Edgar On 6 April 2014 23:30, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, This is indirectly related to you key deletion discussion. I made changes recently to the aggressive delete code. The
Re: RIAK 1.4.6 - Mass key deletion
I'll wait a few more days, see if the AAE maybe stabilises and only after that make a decision regarding this. The cluster expanding was on the roadmap, but not right now :) I've attached a few screenshot, you can clearly observe the evolution of one of the machines after the anti-entropy data removal and consequent restart (5th of April). https://cloudup.com/cB0a15lCMeS Best regards! On 8 April 2014 23:44, Matthew Von-Maszewski matth...@basho.com wrote: No. I do not see a problem with your plan. But ... I would prefer to see you add servers to your cluster. Scalabilty is one of Riak's fundamental characteristics. As your database needs grow, we grow with you ... just add another server and migrate some of the vnodes there. I obviously cannot speak to your budgetary constraints. All of the engineers at Basho, I am just one, are focused upon providing you performance and features along with your scalability needs. This seems to be a situation where you might be sacrificing data integrity where another server or two would address the situation. And if 2.0 makes things better ... sell the extra servers on Ebay. Matthew On Apr 8, 2014, at 6:31 PM, Edgar Veiga edgarmve...@gmail.com wrote: Thanks Matthew! Today this situation has become unsustainable, In two of the machines I have an anti-entropy dir of 250G... It just keeps growing and growing and I'm almost reaching max size of the disks. Maybe I'll just turn off aae in the cluster, remove all the data in the anti-entropy directory and wait for the v2 of riak. Do you see any problem with this? Best regards! On 8 April 2014 22:11, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, Today we disclosed a new feature for Riak's leveldb, Tiered Storage. The details are here: https://github.com/basho/leveldb/wiki/mv-tiered-options This feature might give you another option in managing your storage volume. Matthew On Apr 8, 2014, at 11:07 AM, Edgar Veiga edgarmve...@gmail.com wrote: It makes sense, I do a lot, and I really mean a LOT of updates per key, maybe thousands a day! The cluster is experiencing a lot more updates per each key, than new keys being inserted. The hash trees will rebuild during the next weekend (normally it takes about two days to complete the operation) so I'll come back and give you some feedback (hopefully good) on the next Monday! Again, thanks a lot, You've been very helpful. Edgar On 8 April 2014 15:47, Matthew Von-Maszewski matth...@basho.com wrote: Edgar, The test I have running currently has reach 1 Billion keys. It is running against a single node with N=1. It has 42G of AAE data. Here is my extrapolation to compare your numbers: You have ~2.5 Billion keys. I assume you are running N=3 (the default). AAE therefore is actually tracking ~7.5 Billion keys. You have six nodes, therefore tracking ~1.25 Billion keys per node. Raw math would suggest that my 42G of AAE data for 1 billion keys would extrapolate to 52.5G of AAE data for you. Yet you have ~120G of AAE data. Is something wrong? No. My data is still loading and has experience zero key/value updates/edits. AAE hashes get rewritten every time a user updates the value of a key. AAE's leveldb is just like the user leveldb, all prior values of a key accumulate in the .sst table files until compaction removes duplicates. Similarly, a user delete of a key causes a delete tombstone in the AAE hash tree. Those delete tombstones have to await compactions too before leveldb recovers the disk space. AAE's hash trees rebuild weekly. 
I am told that the rebuild operation will actually destroy the existing files and start over. That is when you should see AAE space usage dropping dramatically. Matthew On Apr 8, 2014, at 9:31 AM, Edgar Veiga edgarmve...@gmail.com wrote: Thanks a lot Matthew! A little bit of more info, I've gathered a sample of the contents of anti-entropy data of one of my machines: - 44 folders with the name equal to the name of the folders in level-db dir (i.e. 393920363186844927172086927568060657641638068224/) - each folder has a 5 files (log, current, log, etc) and 5 sst_* folders. - The biggest sst folder is sst_3 with 4.3G - Inside sst_3 folder there are 1219 files name 00.sst. - Each of the 00*.sst files has ~3.7M Hope this info gives you some more help! Best regards, and again, thanks a lot Edgar On 8 April 2014 13:24, Matthew Von-Maszewski matth...@basho.com wrote: Argh. Missed where you said you had upgraded. Ok it will proceed with getting you comparison numbers. Sent from my iPhone On Apr 8, 2014, at 6:51 AM, Edgar Veiga edgarmve...@gmail.com wrote: Thanks again Matthew, you've been very helpful! Maybe you can give me some kind of advise on this issue I'm having since I've upgraded to 1.4.8. Since I've upgraded my anti-entropy data has been growing a lot and has only stabilised in very high values...