[ https://issues.apache.org/jira/browse/CASSANDRA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915309#action_12915309 ]

Sylvain Lebresne commented on CASSANDRA-1546:
---------------------------------------------

Realized while answering your comment that I had forgotten something, so I
updated the patch.

{quote}
Does the lead replica have to iterate over all SSTables and get the latest
value of the counter before applying the decr/incr mutation? If so, the read
path can be a performance bottleneck. But we can leverage some tricks: only the
counter columns in the latest SSTable are valid, and those in older SSTables
can be ignored safely.

So the frequently updated counter column can reside in the memtable, and the
local read-modify-write operation only brings a negligible performance loss.
The counter update path is almost as fast as the normal column update path.
{quote}

During a write, after the increment has been applied locally, one column is
read (the one corresponding to the local count). It is this value that is sent
for replication, so it includes the freshly written update. This read is a
normal read, so it hits as many sstables as needed, if that's what you mean.
But only one column is read.
One way to make this read fast is to use the row cache on the counter CF. It is
true, however, that because of the marker columns the row may become fairly
large with high-volume counters (even though the row is never read entirely).
You can play with the TTL of the marker columns to keep that manageable, though
(the TTL on the markers can be pretty small, on the order of a minute or so).
As said, you can also choose not to use marker columns if you're ready to
accept the potential drawbacks, in which case the counter row will be really
small and a very good candidate for the row cache. I don't know if that is what
you were proposing?
Lastly, note that at CL.ONE and without marker columns, the counter update path
will be as fast as a normal column write, at least as far as clients are
concerned, because on the leader replica we do the write first, and only then
the read and the replication.
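To make the write path above concrete, here is a toy sketch assuming a heavily simplified model: each replica owns one "shard" column holding its own partial count, there are no marker columns, TTLs or sstables, and a per-shard version number stands in for whatever reconciliation the actual patch uses. All names (`LeaderWritePath`, `Replica`, `Shard`, `increment`) are hypothetical, not Cassandra classes. What it illustrates is the order of operations on the leader: write locally, read back the one local column, then replicate that value rather than the delta.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of the leader write path; not Cassandra's actual code.
public class LeaderWritePath {

    /** One replica's partial count, plus a version so newer values win on followers (assumption). */
    static final class Shard {
        final long count;
        final long version;
        Shard(long count, long version) { this.count = count; this.version = version; }
    }

    static final class Replica {
        final String id;
        final Map<String, Shard> shards = new HashMap<>(); // ownerId -> that owner's shard

        Replica(String id) { this.id = id; }

        /** Leader side: 1. apply the delta to the local shard, 2. read that single column back. */
        Shard applyLocalIncrement(long delta) {
            Shard old = shards.getOrDefault(id, new Shard(0, 0));
            shards.put(id, new Shard(old.count + delta, old.version + 1));
            return shards.get(id);
        }

        /** Follower side: the received value already includes all earlier increments, so it replaces older ones. */
        void receiveShard(String ownerId, Shard incoming) {
            Shard current = shards.get(ownerId);
            if (current == null || incoming.version > current.version) {
                shards.put(ownerId, incoming);
            }
        }

        /** Client-visible value: the sum of every replica's shard. */
        long total() {
            long sum = 0;
            for (Shard s : shards.values()) sum += s.count;
            return sum;
        }
    }

    /** 3. Replicate the freshly read local count (not the delta) to the other replicas. */
    static void increment(Replica leader, List<Replica> others, long delta) {
        Shard local = leader.applyLocalIncrement(delta);
        for (Replica r : others) r.receiveShard(leader.id, local);
    }

    public static void main(String[] args) {
        Replica a = new Replica("A");
        Replica b = new Replica("B");
        Replica c = new Replica("C");
        increment(a, List.of(b, c), 5);  // A leads these two writes
        increment(a, List.of(b, c), -2);
        increment(b, List.of(a, c), 3);  // B leads this one
        System.out.println(a.total() + " " + b.total() + " " + c.total()); // prints "6 6 6"
    }
}
```

Because followers receive the leader's full local count, a lost or duplicated replication message only delays convergence instead of double-counting, which is why the sketch sends the read value rather than the increment itself.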
 
{quote}
I don't know the details of the remove-before-incr/decr problem. But a quick
solution could be to let the deletion operation snapshot the current value of
the counter column and write it in another column, then let the read path
merge these columns, including the different counter columns and the deletion
snapshot column.
{quote}

Ok, the problem is the following: suppose you issue one increment (+1), then
you remove the counter, then you increment again (+1).
Say the leader replica is always the same one, but it receives the two
increments first. It will 'merge' those two increments, and we'll end up with
one column whose count is 2 and whose timestamp is that of the last increment.
Then it receives the delete. But as far as the leader is concerned, this delete
is obsolete (its timestamp is older than the merged column's) and will be
discarded. Even if we were somehow able to detect that the delete should have
deleted something, how could we know which parts of the now-merged count
should be kept?
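The scenario can be reproduced with an equally small, hypothetical model: a counter column is just a (count, timestamp) pair, merging increments sums the counts and keeps the latest timestamp, and a delete only removes data older than its own timestamp. `RemoveRace` and its methods are illustrative names, not part of the patch.

```java
// Hypothetical minimal model of the remove-then-increment race; not Cassandra's actual code.
public class RemoveRace {

    static final class CounterColumn {
        final long count;
        final long timestamp;
        CounterColumn(long count, long timestamp) { this.count = count; this.timestamp = timestamp; }
    }

    /** Leader merges increments into one column: counts add up, the latest timestamp is kept. */
    static CounterColumn applyIncrement(CounterColumn col, long delta, long ts) {
        if (col == null) return new CounterColumn(delta, ts);
        return new CounterColumn(col.count + delta, Math.max(col.timestamp, ts));
    }

    /** A delete only wins over older data; otherwise it is considered obsolete and dropped. */
    static CounterColumn applyDelete(CounterColumn col, long deleteTs) {
        if (col != null && col.timestamp < deleteTs) return null; // column removed
        return col;                                              // delete discarded
    }

    public static void main(String[] args) {
        // Client order: +1 at t=1, delete at t=2, +1 at t=3, so the intended final count is 1.
        // Leader arrival order: both increments first, then the delete.
        CounterColumn col = null;
        col = applyIncrement(col, 1, 1);
        col = applyIncrement(col, 1, 3); // merged: count=2, timestamp=3
        col = applyDelete(col, 2);       // 2 < 3, so the delete is discarded
        System.out.println(col.count);   // prints 2, not the intended 1
    }
}
```

Once the two increments are folded into a single column, nothing in it records that only the second +1 happened after the delete, which is the "which parts should be kept" problem.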

So basically, remove works if you don't reuse the counter afterwards :) or if
you only reuse it after a sufficient time has elapsed. Otherwise, it may work
or it may not :(

Even though this is really unfortunate, I don't see it as a blocker, since
people can always reset the counter by reading its value v and then inserting
-v. Besides, I'm sure we can come up with something smarter.
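A minimal sketch of that reset workaround, written against a hypothetical `Counter` interface (not a real Cassandra client API) with an in-memory stand-in implementation:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of "read v, then insert -v"; Counter is not a real client API.
public class CounterReset {

    /** Hypothetical client view of a counter: read its value, or add a (possibly negative) delta. */
    interface Counter {
        long get();
        void add(long delta);
    }

    /** In-memory stand-in used only for this example. */
    static final class InMemoryCounter implements Counter {
        private final AtomicLong value = new AtomicLong();
        public long get() { return value.get(); }
        public void add(long delta) { value.addAndGet(delta); }
    }

    /** Reset by writing the negated current value instead of removing the counter. */
    static long reset(Counter counter) {
        long v = counter.get();
        counter.add(-v);
        return v;
    }

    public static void main(String[] args) {
        Counter c = new InMemoryCounter();
        c.add(5);
        c.add(2);
        reset(c);
        System.out.println(c.get()); // prints 0
    }
}
```

The read and the -v write are two separate operations, so any increment that lands between them is kept in the result rather than being reset.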

> (Yet another) approach to counting
> ----------------------------------
>
>                 Key: CASSANDRA-1546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1546
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 0.7.0
>
>         Attachments: 0001-Remove-IClock-from-internals.patch, 
> 0002-Counters.patch, 0003-Generated-thrift-files-changes.patch
>
>
> This could be described as a mix between CASSANDRA-1072 without clocks and 
> CASSANDRA-1421.
> More details in the comment below.

-- 
This message is automatically generated by JIRA.