[ https://issues.apache.org/jira/browse/CASSANDRA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915309#action_12915309 ]
Sylvain Lebresne commented on CASSANDRA-1546:
---------------------------------------------

While answering your comment I realized that I had forgotten something, so I updated the patch.

{quote}
Does the lead replica have to iterate over all SSTables and get the latest value of the counter before applying the decr/incr mutation? If so, the read path can be a performance bottleneck. But we can leverage some tricks: only the counter columns in the latest SSTable are valid, and those in older SSTables can be safely ignored. So a frequently updated counter column can reside in the memtable, and the local read-modify-write operation only brings a negligible performance loss. The counter update path is almost as fast as the normal column update path.
{quote}

During a write, after the increment has been applied locally, there is a read of one column (the one corresponding to the local count). It is this value that is sent for replication (so it includes the freshly written update). This read is a normal read, so it hits as many sstables as need be, if that's what you mean. But only one column is read.

One way to make this read fast is to use the row cache on the counter CF. It is true, however, that because of the marker columns the row may become fairly large for high-volume counters (even though the row is never read in its entirety). You can tune the TTL of the marker columns to keep that manageable (the TTL on a marker can be pretty small, on the order of a minute or so). As I said, you can also skip marker columns if you're ready to accept the potential drawbacks, in which case the counter row will be really small and a very good candidate for the row cache. I don't know if that is what you were proposing?

Lastly, note that at CL.ONE and without marker columns, the counter update path will be as fast as for a normal column, at least as far as clients are concerned, because on the leader replica we write first, then read and replicate.
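To make the write path described above concrete, here is a minimal toy sketch in Python. It is not the code from the patch: the `Replica` class, the `leader_write` function, and the per-column version number used to ignore stale replication messages are all assumptions made for illustration. It only shows the order of operations on the leader: apply the increment locally, read back the single local-count column, and replicate that value.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica (hypothetical). Maps (counter, node_id) -> (version, count)."""
    node_id: str
    columns: dict = field(default_factory=dict)

    def apply_local_increment(self, counter, delta):
        # Step 1: apply the increment to this node's own partial count.
        key = (counter, self.node_id)
        version, count = self.columns.get(key, (0, 0))
        self.columns[key] = (version + 1, count + delta)

    def read_local_column(self, counter):
        # Step 2: read exactly one column, the local count of this node.
        return self.columns.get((counter, self.node_id), (0, 0))

    def receive_replicated(self, counter, leader_id, version, count):
        # Assumption: a newer version of the leader's column replaces an
        # older one, so replaying a message has no effect.
        key = (counter, leader_id)
        if version > self.columns.get(key, (0, 0))[0]:
            self.columns[key] = (version, count)


def leader_write(leader, followers, counter, delta):
    leader.apply_local_increment(counter, delta)          # write
    version, count = leader.read_local_column(counter)    # read one column
    for follower in followers:                            # replicate the value
        follower.receive_replicated(counter, leader.node_id, version, count)
    return count


a, b = Replica("A"), Replica("B")
leader_write(a, [b], "hits", 1)
leader_write(a, [b], "hits", 2)
print(b.columns[("hits", "A")])  # (2, 3)
```

Because the replicated value is the leader's running local count rather than the delta, a follower in this sketch can receive the same message twice without double counting.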
{quote}
I have no idea about the details of the removal before incr/decr problem. But a quick solution could be to let the deletion operation snapshot the current value of the counter column and write it to another column. Then just let the read path merge these columns, including the different counter columns and the deletion snapshot column.
{quote}

Ok, the problem is the following: suppose you issue one increment (+1), then you remove the counter, then you increment again (+1). Say the leader replica is always the same one, but it receives the two increments first. It will 'merge' those two increments, and we'll end up with one column whose count is 2 and whose timestamp is that of the last increment. Then it receives the delete. But as far as the leader is concerned, this delete is obsolete and will be discarded. Even if we were somehow able to detect that the delete should have deleted something, how could we know which parts of the now-merged count should be kept?

So basically, remove works if you don't reuse the counter afterwards :) Or if you reuse it only after a sufficient time has elapsed. Otherwise, it may work or it may not :(

Even though this is really unfortunate, I don't see it as a blocker, since people can always reset the counter by reading its value v and then inserting -v. And I'm sure we can come up with something smarter later.

> (Yet another) approach to counting
> ----------------------------------
>
>                 Key: CASSANDRA-1546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1546
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 0.7.0
>
>         Attachments: 0001-Remove-IClock-from-internals.patch, 0002-Counters.patch, 0003-Generated-thrift-files-changes.patch
>
>
> This could be described as a mix between CASSANDRA-1072 without clocks and
> CASSANDRA-1421.
> More details in the comment below.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
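The remove-then-increment problem from the comment can be replayed with a small Python sketch. This is a simplified model, not the patch's code: `Column`, `merge_increment`, `apply_delete`, `replay`, and `reset_by_negation` are hypothetical names, timestamps are plain integers, and no tombstone is kept after a delete. It shows that the same three client operations give a different result when the delete reaches the leader after both increments, and that the read-v-then-insert-minus-v workaround brings the count back to zero.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    count: int
    timestamp: int


def merge_increment(existing: Optional[Column], delta: int, ts: int) -> Column:
    # Increments are merged into one column that keeps the latest timestamp.
    if existing is None:
        return Column(delta, ts)
    return Column(existing.count + delta, max(existing.timestamp, ts))


def apply_delete(existing: Optional[Column], ts: int) -> Optional[Column]:
    # A delete older than the column's timestamp looks obsolete and is dropped.
    if existing is not None and ts < existing.timestamp:
        return existing
    return None


def replay(events) -> Optional[Column]:
    # events: ("incr", delta, timestamp) or ("delete", 0, timestamp),
    # listed in the order in which the leader receives them.
    col = None
    for kind, delta, ts in events:
        col = merge_increment(col, delta, ts) if kind == "incr" else apply_delete(col, ts)
    return col


def reset_by_negation(col: Column, ts: int) -> Column:
    # Workaround from the comment: read the value v, then insert -v.
    return merge_increment(col, -col.count, ts)


in_order = replay([("incr", 1, 1), ("delete", 0, 2), ("incr", 1, 3)])
reordered = replay([("incr", 1, 1), ("incr", 1, 3), ("delete", 0, 2)])
print(in_order.count)                           # 1
print(reordered.count)                          # 2
print(reset_by_negation(reordered, 4).count)    # 0
```

In the reordered case the merged column has count 2 and timestamp 3, so the delete at timestamp 2 is discarded, and nothing in the column records which part of the count predates the delete.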