[ https://issues.apache.org/jira/browse/CASSANDRA-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049651#comment-13049651 ]
Yang Yang commented on CASSANDRA-2774:
--------------------------------------

Note that the original idea was to use an "epoch" number, which is a pure
traditional logical clock. However, I found that we already have
timestampOfLastDelete(), which has exactly the same effect, so I simply
reused timestampOfLastDelete() as the epoch number.

> one way to make counter delete work better
> ------------------------------------------
>
>                 Key: CASSANDRA-2774
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2774
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 0.8.0
>            Reporter: Yang Yang
>         Attachments: counter_delete.diff
>
>
> Currently, Counter does not work with delete, because merging sstables in
> a different order produces a different result. For example:
>   1. add 1
>   2. delete
>   3. add 2
> If the merging happens in the order 1--2, (1,2)--3, the result we see will
> be 2. If the merging is 1--3, (1,3)--2, the result will be 3.
> The issue is that a delete currently cannot separate the adds made before
> it from the adds made after it. A delete is supposed to create a completely
> new incarnation of the counter, or a new "lifetime", or "epoch". The new
> approach uses the concept of an "epoch number": each delete bumps up the
> epoch number. Since each write is replicated (replicate on write is almost
> always enabled in practice; if this is a concern, we could further force
> ROW in case of delete), the epoch number is global to a replica set.
> Changes are attached. Existing tests pass fine; some tests are modified
> because the semantics changed a bit. Some CQL tests do not pass in the
> original 0.8.0 source; that is not the fault of this change.
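To make the epoch idea above concrete, here is a minimal Python sketch of a toy model. It is not the attached patch and not Cassandra's actual counter code: `Fragment`, `merge`, and the integer `epoch` field are hypothetical names, with `epoch` standing in for the role the comment gives to timestampOfLastDelete(). It also assumes that an add written after the delete already carries the bumped epoch, which the description argues for via replicate on write.

```python
from dataclasses import dataclass
from functools import reduce
from itertools import permutations


@dataclass(frozen=True)
class Fragment:
    """One piece of counter state, as it might sit in one sstable (toy model)."""
    epoch: int  # bumped by every delete (assumed stand-in for timestampOfLastDelete())
    value: int  # sum of the adds made within this epoch


def merge(a: Fragment, b: Fragment) -> Fragment:
    """Newer epoch wins outright; fragments of the same epoch are summed."""
    if a.epoch != b.epoch:
        return a if a.epoch > b.epoch else b
    return Fragment(a.epoch, a.value + b.value)


# The sequence from the description: add 1, delete, add 2.
add_1 = Fragment(epoch=0, value=1)
delete = Fragment(epoch=1, value=0)  # the delete opens epoch 1 with nothing in it
add_2 = Fragment(epoch=1, value=2)   # written after the delete, so it carries epoch 1

# Merge the three fragments in every possible order and collect the results.
results = {
    reduce(merge, order).value
    for order in permutations([add_1, delete, add_2])
}
print(results)
```

In this model every merge order yields the same value, because a fragment from an older epoch is always discarded regardless of when it is merged. The sketch says nothing about what happens when the assumption fails, for example when an add reaches a replica that has not yet seen the delete.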
> See details at
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201106.mbox/%3cbanlktikqcglsnwtt-9hvqpseoo7sf58...@mail.gmail.com%3E
> The goal is to make delete work, at least with consistent behavior (yes,
> in case of a long network partition the behavior is not ideal, but it is
> consistent with the definition of a logical clock), so that we could have
> expiring Counters.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira