does consistency=ALL for deletes obviate the need for tombstones?
Howdy all,

Our use of Cassandra unfortunately involves lots of deletes. Yes, I know that C* is not well suited to this kind of workload, but that's where we are, and before I go looking for an entirely new data layer I would rather explore whether C* could be tuned to work well for us.

However, deletions are never driven by users in our app - deletions are always performed by backend processes to clean up data after it has been processed, and thus they do not need to be 100% available. So this made me think, what if I did the following?

- gc_grace_seconds = 0, which ensures that tombstones are never created
- replication factor = 3
- for writes that are inserts, consistency = QUORUM, which ensures that writes can proceed even if 1 replica is slow/down
- for deletes, consistency = ALL, which ensures that when we delete a record it disappears entirely (no need for tombstones)
- for reads, consistency = QUORUM

Also, I should clarify that our data is essentially append only, so I don't need to worry about inconsistencies created by partial updates (e.g. value gets changed on one machine but not another). Sometimes there will be duplicate writes, but I think that should be fine since the value is always identical.

Any red flags with this approach? Has anyone tried it and have experiences to share? Also, I *think* that this means that I don't need to run repairs, which from an ops perspective is great.

Thanks, as always,
- Ian
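The availability arithmetic behind the proposed consistency levels can be checked with a few lines. This is only a sketch: the helper names `quorum` and `tolerated_failures` are made up for illustration and are not part of any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Number of replica acknowledgements a QUORUM operation needs."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int, required_acks: int) -> int:
    """How many replicas may be down while the operation can still succeed."""
    return replication_factor - required_acks


RF = 3
QUORUM = quorum(RF)  # acknowledgements needed for QUORUM reads and writes
ALL = RF             # consistency ALL needs every replica to respond

print("QUORUM tolerates", tolerated_failures(RF, QUORUM), "replica(s) down")
print("ALL tolerates", tolerated_failures(RF, ALL), "replica(s) down")
# QUORUM reads overlap QUORUM writes whenever the two sets must share a replica.
print("read/write overlap:", QUORUM + QUORUM > RF)
```

With a replication factor of 3, QUORUM operations survive one unavailable replica, while a delete at ALL cannot succeed if any single replica is down or slow.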
Re: does consistency=ALL for deletes obviate the need for tombstones?
No, deletes are always written as a tombstone no matter the consistency level. This is because data at rest is written to sstables, which are immutable once written. The tombstone marks that a record in another sstable is now deleted, and so a read of that value should be treated as if it doesn't exist.

When sstables are later compacted, several sstables are merged into one and any overlapping values between the tables are condensed into one. Values which have a tombstone can be excluded from the new sstable. The GC grace period (gc_grace_seconds) is how long a tombstone must be kept before compaction is allowed to drop it, so that the deleted value can't be resurrected if a node which still knew that value rejoins the cluster.

On Dec 16, 2014 8:23 AM, Ian Rose ianr...@fullstory.com wrote:
- gc_grace_seconds = 0, which ensures that tombstones are never created
- for deletes, consistency = ALL, which ensures that when we delete a record it disappears entirely (no need for tombstones)
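The mechanism described in this reply can be sketched with a toy model. Everything here is an assumption made for illustration: sstables are plain dicts treated as read-only, `Cell`, `read` and `compact` are invented names, timestamps are plain integers in seconds, and compaction always merges every sstable at once. Real Cassandra is considerably more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None means this cell is a tombstone
    timestamp: int        # write time in seconds (toy clock)


def read(sstables, key):
    """Return the newest value for key across all sstables (None if deleted or absent)."""
    newest = None
    for table in sstables:
        cell = table.get(key)
        if cell is not None and (newest is None or cell.timestamp > newest.timestamp):
            newest = cell
    return None if newest is None else newest.value


def compact(sstables, now, gc_grace_seconds):
    """Merge all sstables into one new table, keeping the newest cell per key."""
    merged = {}
    for table in sstables:
        for key, cell in table.items():
            if key not in merged or cell.timestamp > merged[key].timestamp:
                merged[key] = cell
    # A tombstone is only dropped once it is at least gc_grace_seconds old.
    return {
        key: cell
        for key, cell in merged.items()
        if not (cell.value is None and now - cell.timestamp >= gc_grace_seconds)
    }


older = {"k1": Cell("v1", 100), "k2": Cell("v2", 100)}
newer = {"k1": Cell(None, 200)}  # the delete is written as a tombstone in a new sstable

print(read([older, newer], "k1"))   # the tombstone shadows the older value
print(read([older], "k1"))          # without the tombstone the old value is still visible
print(compact([older, newer], now=250, gc_grace_seconds=100))
print(compact([older, newer], now=400, gc_grace_seconds=100))
```

In the first compaction the tombstone is younger than the grace period, so it is carried into the new table while the shadowed value is discarded. In the second it is old enough to be dropped along with the value it deleted.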
Re: does consistency=ALL for deletes obviate the need for tombstones?
Tombstones have to be created. The SSTables are immutable, so the data cannot be deleted. Therefore, a tombstone is required. The value you deleted will be physically removed during compaction.

My workload sounds similar to yours in some respects, and I was able to get C* working for me. I have large chunks of data which I periodically replace. I write the new data, update a reference, and then delete the old data. I designed my schema to be tombstone-friendly, and C* works great.

For some of my tables I am able to delete entire partitions. Because of the reference that I updated, I never try to access the old data, and therefore the tombstones for these partitions are never read. The old data simply has to wait for compaction.

Other tables require deleting records within partitions. These tombstones do get read, so there are performance implications. I was able to design my schema so that no partition ever has more than a few tombstones (one for each generation of deleted data, which is usually no more than one).

Hope this helps.

Robert

On Dec 16, 2014, at 8:22 AM, Ian Rose ianr...@fullstory.com wrote:
* gc_grace_seconds = 0, which ensures that tombstones are never created
* for deletes, consistency = ALL, which ensures that when we delete a record it disappears entirely (no need for tombstones)
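The "write the new data, update a reference, then delete the old data" pattern from this reply can be modelled roughly as follows. This is a sketch under loud assumptions: `ToyStore`, its fields, and the `(name, generation)` partition key are all hypothetical, and an in-memory dict stands in for the tables. It is not Robert's actual schema.

```python
class ToyStore:
    """In-memory stand-in for two tables: the data partitions and a reference table."""

    def __init__(self):
        self.partitions = {}               # partition key -> list of rows
        self.partition_tombstones = set()  # partition keys that have been deleted
        self.current = {}                  # dataset name -> partition key that is live
        self.tombstones_read = 0           # how many reads hit a deleted partition

    def replace(self, name, generation, rows):
        new_key = (name, generation)
        self.partitions[new_key] = list(rows)        # 1. write the new data
        old_key = self.current.get(name)
        self.current[name] = new_key                 # 2. update the reference
        if old_key is not None:
            self.partition_tombstones.add(old_key)   # 3. delete the old partition

    def read_partition(self, key):
        if key in self.partition_tombstones:
            self.tombstones_read += 1
            return []
        return self.partitions.get(key, [])

    def read_current(self, name):
        # Readers always follow the reference, so they never touch old partitions.
        return self.read_partition(self.current[name])


store = ToyStore()
store.replace("report", 1, ["a", "b"])
store.replace("report", 2, ["c"])

print(store.read_current("report"))
print(store.tombstones_read)
print(store.partition_tombstones)
```

The old partition's rows stay in `partitions` after the delete, standing in for data that is only physically removed by a later compaction. Because reads go through the reference, the tombstone for the old generation is never read.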
Re: does consistency=ALL for deletes obviate the need for tombstones?
Ah, makes sense. Thanks for the explanations!

- Ian

On Tue, Dec 16, 2014 at 10:53 AM, Robert Wille rwi...@fold3.com wrote:
Tombstones have to be created. The SSTables are immutable, so the data cannot be deleted. Therefore, a tombstone is required. The value you deleted will be physically removed during compaction.
Re: does consistency=ALL for deletes obviate the need for tombstones?
When you say “no need for tombstones”, did you actually read that somewhere or were you just speculating? If the former, where exactly?

-- Jack Krupansky

From: Ian Rose
Sent: Tuesday, December 16, 2014 10:22 AM
To: user
Subject: does consistency=ALL for deletes obviate the need for tombstones?

- for deletes, consistency = ALL, which ensures that when we delete a record it disappears entirely (no need for tombstones)
Re: does consistency=ALL for deletes obviate the need for tombstones?
I was speculating. From the responses above, it now appears to me that tombstones serve (at least) 2 distinct roles:

1. When reading within a single Cassandra instance, they mark a new version of a value (that value being deleted). Without this, the prior version would be the most recent and so reads would still return the last value even after it was deleted.
2. They can resolve discrepancies when a client read receives conflicting answers from Cassandra nodes (e.g. where one of the nodes is out of date because it never saw the delete command).

So in the above I was only referring to #2, without realizing the role they play in #1.

- Ian

On Tue, Dec 16, 2014 at 11:12 AM, Jack Krupansky j...@basetechnology.com wrote:
When you say “no need for tombstones”, did you actually read that somewhere or were you just speculating? If the former, where exactly?
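The second role, resolving conflicting answers from replicas, can be illustrated with a toy reconciliation function. This is a minimal sketch assuming a simple "newest timestamp wins" rule. The names `ReplicaCell` and `reconcile` are hypothetical, and a replica that holds nothing for the key is modelled as `None`.

```python
from typing import List, Optional, Tuple

# (value, write timestamp); a value of None marks a tombstone.
ReplicaCell = Tuple[Optional[str], int]


def reconcile(responses: List[Optional[ReplicaCell]]) -> Optional[str]:
    """Pick the newest cell among the replica responses and return its value."""
    cells = [cell for cell in responses if cell is not None]
    if not cells:
        return None
    value, _timestamp = max(cells, key=lambda cell: cell[1])
    return value


stale_value: ReplicaCell = ("v1", 100)  # a replica that never saw the delete
tombstone: ReplicaCell = (None, 200)    # a replica that recorded the delete

# One replica answers with the tombstone, another with the stale value:
print(reconcile([tombstone, stale_value]))

# Same read, but the first replica no longer holds a tombstone for the key:
print(reconcile([None, stale_value]))
```

In the first read the tombstone is the newest cell, so the delete wins. In the second there is nothing left to outvote the stale replica, and the deleted value is returned again. Keeping tombstones around for the grace period is what guards against that case.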