Re: leveled compaction and tombstoned data
On Sat, Nov 10, 2012 at 7:17 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> No it does not exist. Rob and I might start a donation page and give the
> money to whoever is willing to code it. If someone would write a tool that
> would split an sstable into 4 smaller sstables (even an offline command
> line tool)

Something like that: https://github.com/pcmanus/cassandra/commits/sstable_split
(adds an sstablesplit offline tool)

> I would paypal them a hundo.

Just tell me how you want to proceed :)

--
Sylvain
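At its core, such an offline splitter is just a scan-and-reshard loop over one sorted file: stream rows in token order and roll to a new output whenever a size budget is hit. A minimal sketch, where Row, Source and Sink are hypothetical stand-ins for Cassandra's actual scanner/writer internals rather than its real API:

    import java.io.IOException;
    import java.util.Iterator;

    // Sketch of an offline split: stream one big sstable into N smaller ones,
    // rolling to a new output file whenever a size budget is exceeded.
    // Row, Source, Sink and SinkFactory are hypothetical stand-ins, not
    // Cassandra's actual internals.
    class SstableSplitSketch {
        interface Row { long serializedSize(); }
        interface Source extends Iterator<Row> {}  // rows in token order
        interface Sink {
            void append(Row r) throws IOException;
            void close() throws IOException;
        }
        interface SinkFactory { Sink newSink(int index) throws IOException; }

        static void split(Source in, SinkFactory out, long maxBytesPerOutput)
                throws IOException {
            int index = 0;
            long written = 0;
            Sink sink = out.newSink(index++);
            while (in.hasNext()) {
                Row row = in.next();
                // Roll to the next output once the current one is "full".
                // Rows stay in token order, so each output is itself a
                // valid, sorted sstable.
                if (written > 0 && written + row.serializedSize() > maxBytesPerOutput) {
                    sink.close();
                    sink = out.newSink(index++);
                    written = 0;
                }
                sink.append(row);
                written += row.serializedSize();
            }
            sink.close();
        }
    }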
Re: leveled compaction and tombstoned data
> I would be careful with the patch that was referred to above, it hasn't
> been reviewed, and from a glance it appears that it will cause an infinite
> compaction loop if you get more than 4 SSTables at max size.

It will; you need to set up the max sstable size correctly.
Re: leveled compaction and tombstoned data
For some of our clusters, we have taken the periodic major compaction route. There are a few things to consider:

1) Once you start major compacting, depending on data size, you may be committed to doing it periodically, because you create one big file that will take forever to naturally compact against 3 like-sized files.

2) If you rely heavily on file cache (rather than large row caches), each major compaction effectively invalidates the entire file cache, because everything is written to one new large file.

--
Jim Cistaro

On 11/9/12 11:27 AM, Rob Coli <rc...@palominodb.com> wrote:
> You could also... 1) run a major compaction 2) code up sstablesplit 3) profit!
Re: leveled compaction and tombstoned data
@Rob Coli Does the sstablesplit function exist somewhere?

2012/11/10 Jim Cistaro <jcist...@netflix.com>
> For some of our clusters, we have taken the periodic major compaction
> route.
Re: leveled compaction and tombstoned data
Nope. I think at least once a week I hear someone suggest one way to solve their problem is to write an sstablesplit tool. I'm pretty sure that:

Step 1. Write sstablesplit
Step 2. ???
Step 3. Profit!

On Sat, Nov 10, 2012 at 9:40 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
> @Rob Coli Does the sstablesplit function exist somewhere?

--
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
"Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety." -- Benjamin Franklin
"carpe diem quam minimum credula postero"
Re: leveled compaction and tombstoned data
No it does not exist. Rob and I might start a donation page and give the money to whoever is willing to code it. If someone would write a tool that would split an sstable into 4 smaller sstables (even an offline command line tool) I would paypal them a hundo.

On Sat, Nov 10, 2012 at 1:10 PM, Aaron Turner <synfina...@gmail.com> wrote:
> Nope. I think at least once a week I hear someone suggest one way to
> solve their problem is to write an sstablesplit tool.
Re: leveled compaction and tombstoned data
On 2012-11-08, at 1:12 PM, B. Todd Burruss <bto...@gmail.com> wrote:
> we are having the problem where we have huge SSTABLEs with tombstoned
> data in them that is not being compacted soon enough (because size tiered
> compaction requires, by default, 4 like sized SSTABLEs). this is using
> more disk space than we anticipated.
>
> we are very write heavy compared to reads, and we delete the data after N
> number of days (depends on the column family, but N is around 7 days)
>
> my question is would leveled compaction help to get rid of the tombstoned
> data faster than size tiered, and therefore reduce the disk space usage?

From my experience, levelled compaction makes space reclamation after deletes even less predictable than size-tiered. The reason is that deletes, like all mutations, are just recorded into sstables. They enter level0 and get slowly, over time, promoted upwards to levelN. Depending on your *total* mutation volume vs your data set size, this may be quite a slow process.

This is made even worse if the data you're deleting (say, an entire row worth several hundred kilobytes) is to be deleted by a small row-level tombstone. If the row is sitting in level 4, the tombstone won't impact it until enough data has pushed it over all existing data in level3, level2, level1 and level0.

Finally, to guard against the tombstone missing any data, the tombstone itself is not a candidate for removal (I believe even after gc_grace has passed) unless it's reached the highest populated level in levelled compaction. This means if you have 4 levels and issue a ton of deletes (even deletes that will never impact existing data), these tombstones are deadweight that cannot be purged until they hit level4.

For a write-heavy workload, I recommend you stick with size-tiered. You have several options at your disposal (compaction min/max thresholds, gc_grace) to move things along. If that doesn't help, I've heard of some fairly reputable people doing some fairly blasphemous things (major compactions every night).
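For concreteness, the min/max compaction thresholds can be changed at runtime through the column family mbean. A rough sketch over JMX, assuming the default JMX port 7199 and placeholder names "node", "ks" and "cf" (gc_grace, by contrast, is a schema property and is changed via a schema update, not JMX):

    import javax.management.Attribute;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class TuneCompactionThresholds {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://node:7199/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                // "ks" and "cf" are placeholders for your keyspace/column family.
                ObjectName cfMbean = new ObjectName(
                        "org.apache.cassandra.db:type=ColumnFamilies,keyspace=ks,columnfamily=cf");
                // Merge sooner: require only 2 like-sized sstables per bucket
                // instead of the default 4.
                mbs.setAttribute(cfMbean, new Attribute("MinimumCompactionThreshold", 2));
                mbs.setAttribute(cfMbean, new Attribute("MaximumCompactionThreshold", 32));
            } finally {
                jmxc.close();
            }
        }
    }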
Re: leveled compaction and tombstoned data
The rules for tombstone eviction are as follows (regardless of your compaction strategy):

1. gc_grace must be expired, and
2. No other row fragments can exist for the row that aren't also participating in the compaction.

For LCS, there is no 'rule' that the tombstones can only be evicted at the highest level. They can be evicted on whichever level the row converges on. Depending on your use case this may mean it always happens at level4; it might also mean that it most often happens at L1 or L2.

On Fri, Nov 9, 2012 at 7:31 AM, Mina Naguib <mina.nag...@adgear.com> wrote:
> Finally, to guard against the tombstone missing any data, the tombstone
> itself is not a candidate for removal (I believe even after gc_grace has
> passed) unless it's reached the highest populated level in levelled
> compaction.

--
Ben Coverston
DataStax -- The Apache Cassandra Company
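In code form, the two rules amount to a simple conjunction. A minimal sketch (not Cassandra's actual implementation; SSTable and mayContainKey are hypothetical stand-ins for the bloom-filter/key-range check a real engine would use):

    import java.util.Set;

    class TombstoneEvictionSketch {
        interface SSTable {
            // e.g. a bloom filter / key range check in a real implementation
            boolean mayContainKey(byte[] rowKey);
        }

        static boolean canEvict(long localDeletionTimeSec, long gcGraceSec,
                                byte[] rowKey,
                                Set<SSTable> compacting, Set<SSTable> allForCf) {
            // Rule 1: gc_grace must have expired for this tombstone.
            long nowSec = System.currentTimeMillis() / 1000;
            if (localDeletionTimeSec + gcGraceSec >= nowSec)
                return false;

            // Rule 2: no sstable *outside* this compaction may still hold
            // fragments of the row the tombstone shadows.
            for (SSTable t : allForCf) {
                if (!compacting.contains(t) && t.mayContainKey(rowKey))
                    return false;
            }
            return true;
        }
    }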
Re: leveled compaction and tombstoned data
On Thu, Nov 8, 2012 at 10:12 AM, B. Todd Burruss <bto...@gmail.com> wrote:
> my question is would leveled compaction help to get rid of the tombstoned
> data faster than size tiered, and therefore reduce the disk space usage?

You could also...

1) run a major compaction
2) code up sstablesplit
3) profit!

This method incurs a management penalty if not automated, but is otherwise the most efficient way to deal with tombstones and obsolete data. :D

=Rob

--
=Robert Coli
AIM&GTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb
Re: leveled compaction and tombstoned data
On 8.11.2012 19:12, B. Todd Burruss wrote:
> my question is would leveled compaction help to get rid of the tombstoned
> data faster than size tiered, and therefore reduce the disk space usage?

Leveled compaction will kill your performance. Get the patch from jira for maximum sstable size per CF and force cassandra to make smaller tables; they expire faster.
Re: leveled compaction and tombstoned data
We are running DataStax Enterprise and cannot patch it.

How bad is "kill performance"? If it is so bad, why is it an option?

On Thu, Nov 8, 2012 at 10:17 AM, Radim Kolar <h...@filez.com> wrote:
> Leveled compaction will kill your performance. Get the patch from jira
> for maximum sstable size per CF and force cassandra to make smaller
> tables; they expire faster.
Re: leveled compaction and tombstoned data
"Kill performance" is relative. Leveled compaction basically costs 2x the disk IO. Look at iostat, etc. and see if you have the headroom.

There are also ways to bring up a test node and just run leveled compaction on that. Wish I had a URL handy, but hopefully someone else can find it.

Also, if you're not using compression, check it out.

On Thu, Nov 8, 2012 at 11:20 AM, B. Todd Burruss <bto...@gmail.com> wrote:
> We are running DataStax Enterprise and cannot patch it.
>
> How bad is "kill performance"? If it is so bad, why is it an option?
Re: leveled compaction and tombstoned data
LCS works well in specific circumstances; this blog post gives some good considerations: http://www.datastax.com/dev/blog/when-to-use-leveled-compaction

On Nov 8, 2012, at 1:33 PM, Aaron Turner <synfina...@gmail.com> wrote:
> "Kill performance" is relative. Leveled compaction basically costs 2x
> the disk IO. Look at iostat, etc. and see if you have the headroom.
Re: leveled compaction and tombstoned data
On Thu, Nov 8, 2012 at 1:33 PM, Aaron Turner <synfina...@gmail.com> wrote:
> There are also ways to bring up a test node and just run leveled
> compaction on that. Wish I had a URL handy, but hopefully someone else
> can find it.

This rather handsome fellow wrote a blog about it:
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling

-Brandon
Re: leveled compaction and tombstoned data
http://www.datastax.com/docs/1.1/operations/tuning#testing-compaction-and-compression

Write Survey mode. After you have it up and running, you can modify the column family mbean to use LeveledCompactionStrategy on that node to see how your hardware/load fares with LCS.

On Thu, Nov 8, 2012 at 11:33 AM, Aaron Turner <synfina...@gmail.com> wrote:
> There are also ways to bring up a test node and just run leveled
> compaction on that. Wish I had a URL handy, but hopefully someone else
> can find it.

--
Ben Coverston
DataStax -- The Apache Cassandra Company
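"Modify the column family mbean" can be done over JMX. A minimal sketch, assuming the default JMX port 7199 and the 1.1-era mbean naming, with "survey-node", "ks" and "cf" as placeholders:

    import javax.management.Attribute;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class SwitchToLcs {
        public static void main(String[] args) throws Exception {
            // Connect to the write-survey node's JMX port (7199 by default).
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://survey-node:7199/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                ObjectName cfMbean = new ObjectName(
                        "org.apache.cassandra.db:type=ColumnFamilies,keyspace=ks,columnfamily=cf");
                // Flip this node (and only this node) to leveled compaction.
                mbs.setAttribute(cfMbean, new Attribute("CompactionStrategyClass",
                        "org.apache.cassandra.db.compaction.LeveledCompactionStrategy"));
            } finally {
                jmxc.close();
            }
        }
    }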
Re: leveled compaction and tombstoned data
Also to answer your question, LCS is well suited to workloads where overwrites and tombstones come into play. The tombstones are _much_ more likely to be merged with LCS than STCS.

I would be careful with the patch that was referred to above. It hasn't been reviewed, and from a glance it appears that it will cause an infinite compaction loop if you get more than 4 SSTables at max size.

On Thu, Nov 8, 2012 at 11:41 AM, Brandon Williams <dri...@gmail.com> wrote:
> This rather handsome fellow wrote a blog about it:
> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling

--
Ben Coverston
DataStax -- The Apache Cassandra Company
Re: leveled compaction and tombstoned data
Thanks for the links! I had forgotten about live sampling.

On Thu, Nov 8, 2012 at 11:41 AM, Brandon Williams <dri...@gmail.com> wrote:
> This rather handsome fellow wrote a blog about it:
> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling

-Brandon
Re: leveled compaction and tombstoned data
@ben, thx, we will be deploying 2.2.1 of DSE soon and will try to set up a traffic sampling node so we can test leveled compaction.

We essentially keep a rolling window of data, written once. It is written, then after N days it is deleted, so it seems that leveled compaction should help.

On Thu, Nov 8, 2012 at 11:53 AM, B. Todd Burruss <bto...@gmail.com> wrote:
> Thanks for the links! I had forgotten about live sampling.