does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Ian Rose
Howdy all,

Our use of Cassandra unfortunately involves lots of deletes.  Yes, I
know that C* is not well suited to this kind of workload, but that's where
we are, and before I go looking for an entirely new data layer I would
rather explore whether C* could be tuned to work well for us.

However, deletions are never driven by users in our app - deletions are always
performed by backend processes to clean up data after it has been processed,
and thus they do not need to be 100% available.  So this made me think,
what if I did the following?

   - gc_grace_seconds = 0, which ensures that tombstones are never created
   - replication factor = 3
   - for writes that are inserts, consistency = QUORUM, which ensures that
   writes can proceed even if 1 replica is slow/down
   - for deletes, consistency = ALL, which ensures that when we delete a
   record it disappears entirely (no need for tombstones)
   - for reads, consistency = QUORUM

Also, I should clarify that our data is essentially append-only, so I don't
need to worry about inconsistencies created by partial updates (e.g. value
gets changed on one machine but not another).  Sometimes there will be
duplicate writes, but I think that should be fine since the value is always
identical.

Any red flags with this approach?  Has anyone tried it, and if so, do you have
experiences to share?  Also, I *think* that this means that I don't need to run
repairs, which from an ops perspective is great.

Thanks, as always,
- Ian


Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Eric Stevens
No, deletes are always written as a tombstone no matter the consistency.
This is because data at rest is written to sstables which are immutable
once written. The tombstone marks that a record in another sstable is now
deleted, and so a read of that value should be treated as if it doesn't
exist.

When sstables are later compacted, several sstables are merged into one and
any overlapping values between the tables are condensed into one. Values
which have a tombstone can be excluded from the new sstable. The GC grace
period (gc_grace_seconds) sets how long a tombstone must be kept, measured
from the time of the delete, before compaction may drop it, so that the
deleted value can't be resurrected if a node that still holds that value
rejoins the cluster.
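
To make the above concrete, here is a toy sketch in Python. It is not how
Cassandra is implemented; it only models the idea of immutable sstables, a
tombstone that shadows an older value, and a compaction that drops the
tombstone once the grace period has passed. The names Cell, TOMBSTONE, read
and compact are made up for illustration, and timestamps are plain integers
in seconds.

```python
from dataclasses import dataclass

TOMBSTONE = object()  # hypothetical sentinel standing in for a delete marker


@dataclass(frozen=True)
class Cell:
    value: object   # the payload, or TOMBSTONE for a delete
    timestamp: int  # write time in seconds


def read(sstables, key):
    """Return the newest value for key across all sstables, or None if deleted/missing."""
    newest = None
    for table in sstables:
        cell = table.get(key)
        if cell is not None and (newest is None or cell.timestamp > newest.timestamp):
            newest = cell
    if newest is None or newest.value is TOMBSTONE:
        return None
    return newest.value


def compact(sstables, now, gc_grace_seconds):
    """Merge sstables into one new table, keeping only the newest cell per key.

    A tombstone is dropped only when it is at least gc_grace_seconds old.
    The input tables are never modified (they are treated as immutable).
    """
    merged = {}
    for table in sstables:
        for key, cell in table.items():
            if key not in merged or cell.timestamp > merged[key].timestamp:
                merged[key] = cell
    return {
        key: cell
        for key, cell in merged.items()
        if not (cell.value is TOMBSTONE and now - cell.timestamp >= gc_grace_seconds)
    }


# The insert and the later delete land in two different sstables.
older = {"k": Cell("v1", 100)}
newer = {"k": Cell(TOMBSTONE, 200)}
print(read([older, newer], "k"))
print(compact([older, newer], now=250, gc_grace_seconds=100))
print(compact([older, newer], now=400, gc_grace_seconds=100))
```

In this model the delete never touches the older table; the read simply sees
the newer tombstone and reports the key as missing.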




Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Robert Wille
Tombstones have to be created. The SSTables are immutable, so the data cannot 
be deleted. Therefore, a tombstone is required. The value you deleted will be 
physically removed during compaction.

My workload sounds similar to yours in some respects, and I was able to get C* 
working for me. I have large chunks of data which I periodically replace. I 
write the new data, update a reference, and then delete the old data. I 
designed my schema to be tombstone-friendly, and C* works great. For some of my 
tables I am able to delete entire partitions. Because of the reference that I 
updated, I never try to access the old data, and therefore the tombstones for 
these partitions are never read. The old data simply has to wait for 
compaction. Other tables require deleting records within partitions. These 
tombstones do get read, so there are performance implications. I was able to 
design my schema so that no partition ever has more than a few tombstones (one 
for each generation of deleted data, which is usually no more than one).
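
The write/update-reference/delete pattern described above can be sketched
with a toy in-memory model in Python. This is only an illustration under
assumed names (ToyStore, replace_chunk, read_chunk, and an integer
"generation" per chunk); it does not use any Cassandra driver and says nothing
about the actual schema.

```python
class ToyStore:
    """In-memory stand-in for a table; counts how often a read hits a tombstone."""

    def __init__(self):
        self.partitions = {}      # partition key -> list of rows
        self.tombstoned = set()   # partition keys that carry a partition tombstone
        self.tombstones_read = 0

    def write(self, key, rows):
        self.partitions[key] = list(rows)

    def delete_partition(self, key):
        # The old rows stay in place (they "wait for compaction"); only a marker is added.
        self.tombstoned.add(key)

    def read(self, key):
        if key in self.tombstoned:
            self.tombstones_read += 1
            return []
        return self.partitions.get(key, [])


def replace_chunk(store, refs, name, rows):
    """Write a new generation, repoint the reference, then delete the old generation."""
    old_gen = refs.get(name)
    new_gen = 1 if old_gen is None else old_gen + 1
    store.write((name, new_gen), rows)            # 1. write the new data
    refs[name] = new_gen                          # 2. update the reference
    if old_gen is not None:
        store.delete_partition((name, old_gen))   # 3. delete the old data


def read_chunk(store, refs, name):
    """Reads always go through the reference, so they never touch a deleted partition."""
    return store.read((name, refs[name]))


store, refs = ToyStore(), {}
replace_chunk(store, refs, "report", ["a", "b"])
replace_chunk(store, refs, "report", ["c"])
print(read_chunk(store, refs, "report"), store.tombstones_read)
```

Because every read follows the reference to the current generation, the
tombstoned partition is never read in this model, which is the property that
keeps the tombstones cheap.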

Hope this helps.

Robert





Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Ian Rose
Ah, makes sense.  Thanks for the explanations!

- Ian







Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Jack Krupansky
When you say “no need for tombstones”, did you actually read that somewhere or 
were you just speculating? If the former, where exactly?

-- Jack Krupansky



Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Ian Rose
I was speculating.  From the responses above, it now appears to me that
tombstones serve (at least) 2 distinct roles:

1. When reading within a single Cassandra instance, they mark a new version
of a value (that value being deleted).  Without this, the prior version
would be the most recent and so reads would still return the last value
even after it was deleted.

2. They can resolve discrepancies when a client read receives conflicting
answers from Cassandra nodes (e.g. where one of the nodes is out of date
because it never saw the delete command).

So in the above I was only referring to #2, without realizing the role they
play in #1.
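
For what it's worth, role #2 can be sketched as a toy Python function. The
names Response and reconcile are made up, and this is only a model of
"newest timestamp wins" across replica answers, not Cassandra's actual read
path. In this sketch a delete wins a timestamp tie.

```python
from typing import NamedTuple, Optional, Sequence


class Response(NamedTuple):
    value: Optional[str]
    timestamp: int
    deleted: bool = False  # True when the replica answered with a tombstone


def reconcile(responses: Sequence[Optional[Response]]) -> Optional[str]:
    """Pick the newest answer among replicas; None means deleted or unknown.

    A replica that has nothing for the key is represented by None.
    """
    present = [r for r in responses if r is not None]
    if not present:
        return None
    winner = max(present, key=lambda r: (r.timestamp, r.deleted))
    return None if winner.deleted else winner.value


stale = Response("v1", 100)                      # replica that never saw the delete
tombstone = Response(None, 200, deleted=True)    # replica that did

# With the tombstone still present, the delete wins.
print(reconcile([stale, tombstone]))
# If the tombstone is already gone, the other replica has nothing to say
# and the stale value comes back.
print(reconcile([stale, None]))
```

The second call is the case where the tombstone has disappeared while some
replica still holds the old value: nothing is left to outvote the stale copy.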

- Ian



