Counters
I have a counter defined as a super column family:

create column family TestCounter
    with column_type = Super
    and default_validation_class = CounterColumnType;

After I increment/decrement counter columns, cassandra-cli shows the super column, column and key names as hex values. How do I get the values as strings (please refer to the highlighted ones below)? I believe some configuration needs to be specified in the counter column family definition (create column family TestCounter). I am inserting the key, super column name and columns as bytes using Hector 0.8.

[default@Keyspace1] list TestCounter;
Using default limit of 100
---
RowKey: 323031312d30352d32382d67657455736572496e666f
=> (super_column=3232,
     (counter=3431, value=4)
     (counter=3432, value=4)
     (counter=3435, value=4))
=> (super_column=546f74616c,
     (counter=546f74616c, value=12))
Re: expiring + counter column?
Sorry to beat the dead horse. I looked at the link referred from #2103: https://issues.apache.org/jira/browse/CASSANDRA-2101

I agree with the reasoning in #2101 that the ultimate issue is that deletes and counter adds are not commutative. Since by definition we can't achieve predictable behavior with deletes + counters, can we redefine the behavior of counter deletes so that we can always guarantee the declared behavior?

Specifically: *we define that once a counter column is deleted, you can never add to it again.* Attempts to add to a dead counter throw an exception, or all future adds are just ignored. I.e. a counter column has only one life, until all tombstones are purged from the system, after which it is possible for the counter to have a new incarnation.

Basically, instead of solving the problem raised in #2103, we declare openly that it's unsolvable (which is true) and make the code reflect this fact. I think this behavior would satisfy most use cases of counters. So instead of relying on advice to developers ("do not do updates for a period after deletes, otherwise it probably won't work"), we enforce this in the code.

The same logic can be carried over to expiring columns, since they are essentially automatically inserted deletes. That way #2103 could be solved.

I'm attaching an example below; you can refer to it if needed.

Thanks a lot
Yang

Example: for simplicity we assume there is only one column family and one column, so we omit the column name and CF name in our notation. Assume all counter columns have a delta value of 1; we only mark their ttl now. So c(123) means a counter column of ttl=1, adding a delta of 1. d(456) means a tombstone with ttl=456. Then we can have the following operations:

operation    result after operation
---------    ----------------------
c(1)         count=1
d(2)         count=null (counter not present)
c(3)         count=null (add on dead counter ignored)

If the 2 adds arrive out of order, we would still guarantee eventual consistency:

operation    result after operation
---------    ----------------------
c(1)         count=1
c(3)         count=2 (we have 2 adds, each with delta=1)
d(2)         count=null (deleted)

At the end of both scenarios, the result is guaranteed to be null. Note that in the second scenario, line 2 shows a snapshot where we have a state with count=2, which scenario 1 never sees. This is fine, since even regular columns can have this situation (just consider if the counter columns were inserts/overwrites instead).

On Fri, May 27, 2011 at 5:57 PM, Jonathan Ellis jbel...@gmail.com wrote:

> No. See comments to https://issues.apache.org/jira/browse/CASSANDRA-2103
>
> On Fri, May 27, 2011 at 7:29 PM, Yang tedd...@gmail.com wrote:
>> is this combination feature available, or on track? thanks Yang
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
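[Editor's note: a minimal sketch of the reconcile rule this proposal implies. This is hypothetical code, not Cassandra's actual reconcile; the Cell type and all names are invented for illustration.]

final class Cell {
    final long timestamp;     // client-supplied timestamp, as in the c()/d() notation above
    final long delta;         // counter delta; meaningless for tombstones
    final boolean tombstone;

    Cell(long timestamp, long delta, boolean tombstone) {
        this.timestamp = timestamp;
        this.delta = delta;
        this.tombstone = tombstone;
    }

    // Proposed "one life" rule: a tombstone wins over any add regardless of
    // timestamp order, so both arrival orders in the example above converge
    // to "deleted". Two live adds commute, so they can safely be summed.
    static Cell reconcile(Cell a, Cell b) {
        if (a.tombstone && b.tombstone)
            return a.timestamp >= b.timestamp ? a : b; // keep the younger tombstone
        if (a.tombstone)
            return a; // an add on a dead counter is silently ignored
        if (b.tombstone)
            return b;
        return new Cell(Math.max(a.timestamp, b.timestamp), a.delta + b.delta, false);
    }
}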
Re: expiring + counter column?
Errata: "so c(123) means a counter column of ttl=1" should read "so c(123) means a counter column of ttl=123".
Re: expiring + counter column?
Sorry: in the notation, instead of ttl I meant timestamp.
Re: Consistency Level throughput
You've not talked about how you are doing the tests (or I missed it).

With 6 nodes and RF 3, when you read at ONE there is a 50% chance of the read being served locally, in which case the request returns to the client very quickly, allowing X clients to make Y requests. At the higher CL levels the latency of each request is higher, so X clients cannot make Y requests. The throughput of a single client may change at different CL levels because it is making one request at a time; in general there is little impact on the throughput of the cluster as a whole.

How were you measuring the throughput? How many clients are you running?

Cheers

On 27 May 2011, at 16:35, Ryu Kobayashi wrote:

> My question is my throughput per case.
>
>> In general, cluster throughput = single node throughput * number of nodes / replication factor.
>
> Yes, I think so too. But what I really want to ask about is why the results don't show this. Could you look at the chart I made? http://goo.gl/mACQa
>
> 2011/5/27 Maki Watanabe watanabe.m...@gmail.com:
>> I assume your question is how CL affects throughput. In theory, I believe CL will not affect the throughput of the Cassandra system: at any CL, the coordinator node needs to submit write/read requests to all RF replicas specified for the KS. But CL does affect latency: a stronger CL causes larger latency. In the real world it will depend on the system configuration, the application design, the data, and the whole environment. However, if you found shorter latency with a stronger CL, there must be some reason to explain the behavior.
>> maki
>>
>> 2011/5/27 Ryu Kobayashi beter@gmail.com:
>>> Hi,
>>> Question about Consistency Level throughput.
>>> Environment: 6 nodes. Replication factor is 3.
>>> ONE and QUORUM showed no difference in throughput. ALL was just extremely slow: it had only half the throughput of ONE. ONE, TWO and THREE were similar results. Is there any difference between 2 nodes and 3 nodes?
>>> --
>>> beter@gmail.com twitter:@ryu_kobayashi
>> --
>> w3m
> --
> beter@gmail.com twitter:@ryu_kobayashi
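[Editor's note: to put numbers on the point about client latency vs. cluster throughput; illustrative figures only, not measurements from this thread. A client issuing one request at a time is bound by throughput = concurrency / latency. At 2 ms per ONE read it can do at most 1 / 0.002 = 500 requests/s; if QUORUM pushes its latency to 6 ms, the same client tops out at about 167 requests/s. The cluster's aggregate capacity is unchanged; the throughput is recovered by running more concurrent clients.]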
Re: Counters
Specify a comparator and sub_comparator for the CF; the CLI will then use these to format the values correctly.

Cheers
-----
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
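[Editor's note: for example, a sketch of the schema from the question with display types added, assuming the row keys, super column names and column names are all UTF-8 strings, as the hex dump suggests (e.g. 546f74616c is the UTF-8 for "Total"):]

create column family TestCounter
    with column_type = Super
    and comparator = UTF8Type
    and subcomparator = UTF8Type
    and key_validation_class = UTF8Type
    and default_validation_class = CounterColumnType;

[key_validation_class controls how the RowKey is displayed; if your CLI version lacks it, the CLI's assume command, e.g. "assume TestCounter keys as utf8;", should coerce the display without a schema change.]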
Re: expiring + counter column?
Without commenting on the other parts of the design, this part is not possible:

> attempts to add to a dead counter throws an exception

All write operations are "no look" operations (write to the log, update the memtables); we never look at the SSTables. It goes against the architecture of the write path to require a read from disk.

Cheers
-----
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
Re: Recommendation on how to organize CF
I often suggest people think about using something like JSON for data that looks relatively unchanging, or that is always worked on as a single entity, for a couple of reasons:

1. Cassandra does not need to know about every atomic piece of data in your model. Obviously there are some good application reasons to store things in columns, such as TTL, slice ranges, etc. Blobbing data was generally a bad thing to do in an RDBMS, but IMHO it's a valid option in Cassandra.

2. For every column value you store in Cassandra you also store the column name, timestamp and some other bytes. This is the price you pay for a schema-free DB, so there can be unexpected storage (and network) bloat if you are storing lots of small values in lots of columns. Whether you consider this expensive has to do with how much you like running ALTER TABLE statements.

3. IMHO there is little difference between code written to detect that a Cassandra row does not contain a column, and code written to detect that a JSON dict does not contain a key, because it was created before the last code release. Adding attributes to your entity is still a code-only change, and you only need to update old data if your business problem requires it.

There are also a number of reasons not to do it:

1. It does not pass your smell test.
2. You have multiple agents updating the entity with no-look writes.
3. You want to pull back parts of the entity, do slices, use TTL, secondary indexes, etc.
4. You work cross-platform, use Brisk/Hadoop, use Hive/Pig, and it's a pain for everyone.

I agree it's not for every situation, and it probably makes sense to start coding without it to begin with. But I think it is worth considering in some cases.

Hope that helps.

-----
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 26 May 2011, at 02:57, openvictor Open wrote:

> Thanks Aaron,
>
> Sorry I didn't see your message sooner. So the CF Messages, using UTF8Type, holds information such as: who has the right to read, is it possible to answer to this list, etc. There are two kinds of keys, which begin with message:uuid and messagelist:uuid. A column of message:uuid is, for example, sender or rawtext. A column of messagelist:uuid is, for example, creator or participants.
>
> MessagesTime (message_time) is the sorting mechanism, meaning that when I query message_time I get messages or messagelists in the order they were sent. There are 2 kinds of keys:
>
> messagebox:someone : each column has for its Value the uuid of a list inside the messagebox of someone, and for its Name the uuid of the last message in the corresponding messagelist. It gives me a sorting mechanism based on the last message received.
>
> messagelist:uuid : each column has for its Name the UUID of a message; the Value doesn't really matter.
>
> About your suggestion: it is a very good solution, but there is one thing I don't really like with serialization: it blocks evolution. Let's say I would like to add one field to a message: I am obliged to make a tool to deserialize, add the information, reserialize all the fields, and insert. Even if I serialize with JSON, it looks like evolution (which is why I chose Cassandra) is a little bit broken. If I am wrong, please tell me so. However I will explore this very interesting possibility for another project with tags, which is not really subject to dramatic evolutions.
>
> At the moment I don't really complain about speed, and it is not really time critical (after all, who cares if the messagebox loads in 250 ms instead of 200 ms). At the moment I get the messages with two batch Cassandra calls, so I think this is satisfying.
>
> Thanks again, the JSON serialization looks like a very interesting possibility.
>
> Victor
>
> 2011/5/19 aaron morton aa...@thelastpickle.com
>
>> I'm a bit confused by your examples. I think you are saying...
>>
>> - Standard CF called Messages using the UTF8Type for column comparisons, used to store the individual messages. Row key is the message UUID. Not sure what the columns are.
>> - Standard CF called MessageTime using TimeUUIDType for column comparisons, used to store collections of messages. Row key is messagelist:message_list_uuid for a message list, and messagebox:user_name:mbox_name for a message box. Not sure what the columns are.
>>
>> The best model is going to be the one that supports your read requests and the volume of data you are expecting. One way to go is to denormalise to support very fast read paths. You could store the entire message in one column, using something like JSON to serialise it. Then:
>>
>> - MessageIndexes standard CF to store the full messages in context; there are three different types of rows:
>> * keys with user_name store all messages for a user; column name is the message TimeUUID and value is the message structure
>> * keys with
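[Editor's note: to make the blob option concrete, a message entity like the one Victor describes could be serialised into a single column value roughly like this. This is a hypothetical layout; sender and rawtext come from the thread, the other field names are invented for illustration.]

{
  "sender": "victor",
  "rawtext": "hello world",
  "readers": ["alice", "bob"],
  "can_reply": true
}

[Adding an attribute later just means new keys appear in newly written blobs; old blobs simply lack them, which is the situation point 3 above argues is no harder to handle than a missing column.]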
Re: PHP CQL Driver
Sorry, that was a typo; the query I had in my test case reads as follows:

$query = "CREATE COLUMNFAMILY smoke (KEY text PRIMARY KEY, monkey text) WITH comparator = text AND default_validation = text";

Thanks for your response; I still have the same issue. It seems the Thrift PHP interface exceptions aren't always very descriptive.

Code snippet:

try {
    echo "Executing create column query <br/>";
    //$query = "CREATE KEYSPACE southafrica WITH strategy_options:replication_factor = 1 AND strategy_class = 'SimpleStrategy'";
    $query = "CREATE COLUMNFAMILY smoke (KEY text PRIMARY KEY, monkey text) WITH comparator = text AND default_validation = text";
    $result = $cassandraClient->execute_cql_query($query, cassandra_Compression::NONE);
    echo "|" . print_r($result) . "|" . "<br>";
} catch (cassandra_InvalidRequestException $exrs) {
    echo "COLUMNFAMILY error occurred -- <br>" . $exrs->getTraceAsString() . "<br>";
} catch (Exception $ex) {
    echo $ex->getTraceAsString();
    throw $ex;
}

CREATE KEYSPACE southafrica works perfectly, USE southafrica works, but CREATE COLUMNFAMILY :( makes my heart ache...

On Sun, May 29, 2011 at 4:27 AM, Eric Evans eev...@rackspace.com wrote:

> On Thu, 2011-05-26 at 20:51 +0200, Kwasi Gyasi - Agyei wrote:
>> CREATE COLUMNFAMILY magic (KEY text PRIMARY KEY, monkey ) WITH comparator = text AND default_validation = text
>
> That's not a valid query. If monkey is a column definition, then it needs a type. If you're trying to name the key, don't do that (at least not yet). Try instead:
>
> CREATE COLUMNFAMILY magic (KEY text PRIMARY KEY, monkey text) WITH comparator = text AND default_validation = text
>
> --
> Eric Evans
> eev...@rackspace.com

--
*4Things* Multimedia and Communication | Property | Entertainment
Kwasi Owusu Gyasi - Agyei
*cell* (+27) (0) 76 466 4488
*website* www.4things.co.za
*email* kwasi.gyasiag...@4things.co.za
*skype* kwasi.gyasiagyei
*role* Developer.Designer.Software Architect
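[Editor's note: for reference, the three statements from this thread in the order a client would issue them; each string is passed to execute_cql_query as in the snippet above. A sketch, assuming the CQL version shipped with Cassandra 0.8.]

CREATE KEYSPACE southafrica WITH strategy_class = 'SimpleStrategy'
    AND strategy_options:replication_factor = 1;
USE southafrica;
CREATE COLUMNFAMILY smoke (KEY text PRIMARY KEY, monkey text)
    WITH comparator = text AND default_validation = text;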
Re: expiring + counter column?
Yeah, then maybe we can make that a silent omission: less desirable, but still better than unpredictable behavior. (This is not that bad: currently you can't know whether a write really reached a quorum, i.e. became effective, anyway.)

Regarding "we never look at SSTables": I think right now counter adds do require a read on SSTables, although asynchronously.

StorageProxy:

private static void applyCounterMutation(final IMutation mutation,
        final Multimap<InetAddress, InetAddress> hintedEndpoints,
        final IWriteResponseHandler responseHandler,
        final String localDataCenter,
        final ConsistencyLevel consistency_level,
        boolean executeOnMutationStage)
{
    ...
    sendToHintedEndpoints(cm.makeReplicationMutation(), hintedEndpoints,
            responseHandler, localDataCenter, false, consistency_level);
}

CounterMutation.java:

public RowMutation makeReplicationMutation() throws IOException
{
    Table table = Table.open(readCommand.table);
    Row row = readCommand.getRow(table);
    ...
}

I think the getRow() line above does what the .pdf design doc in the JIRA described: replication to the other replicas (non-leaders) replicates only the **sum** that I own, not the individual delta that I just received.

Actually I'm not quite understanding why this approach was chosen, since it makes each write into a read-then-write (when getReplicateOnWrite()), which can be slow. I'm still trying to understand that.

Thanks
Yang
Re: expiring + counter column?
The comment around line 448 in StorageProxy:

// We do the replication on another stage because it involves a read (see CM.makeReplicationMutation)
// and we want to avoid blocking too much the MUTATION stage

The read happens on another stage; it is not part of the mutation. And the test before that checks shouldReplicateOnWrite for the CFs involved in the mutation, which defaults to false.

See also the comments for StorageProxy.mutateCounter(), and this comment which I *think* is still valid:
https://issues.apache.org/jira/browse/CASSANDRA-1909?focusedCommentId=12976727&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12976727

Cheers
-----
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
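[Editor's note: for anyone who wants the replicate-on-write behaviour discussed here, it is exposed as a per column family attribute. A hedged cassandra-cli sketch, using the TestCounter CF from the earlier thread as an example; attribute name as of 0.8, check your version:]

update column family TestCounter with replicate_on_write = true;

[With it off, a counter write stays a pure log/memtable operation; with it on, the replica performs the read described above on a separate stage and forwards its summed count to the other replicas.]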