[ 
https://issues.apache.org/jira/browse/CASSANDRA-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049834#comment-13049834
 ] 

Sylvain Lebresne commented on CASSANDRA-2774:
---------------------------------------------


bq. I think with quorum delete you will guarantee timing to be consistent with 
the client and then achieve the client's expected result in your case. I'd like 
to hear your counter example

Consider a cluster with RF=3 and a counter c replicated on nodes A, B and C.  
Consider that all operations are done by the same client connected to some other 
node (it doesn't have to be the same one each time, but it can be). All 
operations are performed at QUORUM consistency level.

The client does the following operations:
# increment c by 1
# delete c
# increment c by 1
# read c

Because QUORUM is 2, depending on internal timings (latency on the wire and 
such), either only 2 or all 3 nodes will have seen each write once it is acked 
to the client. So, for the same inputs and depending on timing, the client 
could get a variety of results on the read:
* 1, if each node has received each operation in the order issued.
* 0 or 2, if for instance, by the time the read is issued:
** the first increment only reached B and C
** the deletion only reached A and C
** the second increment only reached A and B, and it happens that the first two 
nodes answering the read are B and C.

In that second case, the exact value depends on the exact rules for dealing 
with the epoch number, but in any case B would only have the two increments and 
C would have the first increment and the deletion (issued after the increment, 
so the deletion wins). So B will answer 2 and C will answer a tombstone. 
Whatever resolution the coordinator does, it just cannot return 1 that time.
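
The scenario above can be replayed with a small toy model. This is only a 
sketch under stated assumptions, not Cassandra's actual counter code: 
local_state, read and the two resolution rules ("tombstone wins" and "live 
value wins") are hypothetical stand-ins for whatever the epoch rules would be.

```python
# Operations in the order the client issued them.
OPS = [("inc", 1), ("del", None), ("inc", 1)]

def local_state(seen):
    """Toy replica state: None for a tombstone, else the sum of the
    increments received after the last delete received (assumed rule)."""
    value, deleted = 0, False
    for i in sorted(seen):
        kind, amount = OPS[i]
        if kind == "del":
            value, deleted = 0, True
        else:
            value, deleted = value + amount, False
    return None if deleted else value

def read(delivery, responders, tombstone_wins):
    """Toy coordinator: resolve the answers of the responding replicas
    with one of two hypothetical rules."""
    answers = [local_state(delivery[node]) for node in responders]
    live = [a for a in answers if a is not None]
    if tombstone_wins and len(live) < len(answers):
        return 0  # some replica answered a tombstone and it wins
    return max(live, default=0)  # otherwise take the largest live value

# Every replica received every operation, in order.
IN_ORDER = {"A": {0, 1, 2}, "B": {0, 1, 2}, "C": {0, 1, 2}}
# Each operation reached only a quorum, as in the example above.
OUT_OF_ORDER = {"A": {1, 2}, "B": {0, 2}, "C": {0, 1}}

for rule in (True, False):
    print(read(IN_ORDER, ["B", "C"], rule),
          read(OUT_OF_ORDER, ["B", "C"], rule))
```

Under either assumed rule the in-order case reads 1, while the quorum-only case 
reads 0 or 2: B answers 2, C answers a tombstone, and no resolution over those 
two answers can yield 1.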


> one way to make counter delete work better
> ------------------------------------------
>
>                 Key: CASSANDRA-2774
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2774
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 0.8.0
>            Reporter: Yang Yang
>         Attachments: counter_delete.diff
>
>
> The current Counter does not work with delete, because different merging 
> orders of sstables would produce different results, for example:
> add 1
> delete 
> add 2
> if the merging happens by 1-2, (1,2)--3  order, the result we see will be 2
> if merging is: 1--3, (1,3)--2, the result will be 3.
> The issue is that delete currently cannot separate the adds made before it 
> from the adds made after it. Supposedly a delete is to create a completely new 
> incarnation of the counter, or a new "lifetime", or "epoch". The new approach 
> utilizes the concept of an "epoch number", so that each delete bumps up the 
> epoch number. Since each write is replicated (replicate on write is almost 
> always enabled in practice; if this is a concern, we could further force ROW 
> in case of delete), the epoch number is global to a replica set.
> Changes are attached. Existing tests pass fine; some tests are modified since 
> the semantics changed a bit. Some CQL tests do not pass in the original 
> 0.8.0 source; that's not the fault of this change.
> see details at 
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201106.mbox/%3cbanlktikqcglsnwtt-9hvqpseoo7sf58...@mail.gmail.com%3E
> The goal of this is to make delete work (at least with consistent behavior; 
> yes, in case of a long network partition the behavior is not ideal, but it's 
> consistent with the definition of a logical clock), so that we could have 
> expiring Counters.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
