[ 
https://issues.apache.org/jira/browse/CASSANDRA-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577055#comment-13577055
 ] 

Srdjan Mitrovic edited comment on CASSANDRA-4775 at 2/12/13 10:39 PM:
----------------------------------------------------------------------

bq. Not sure we'd want to support avg (since it requires extra information to 
be stored, as you point out)
If we record every incr operation, we will have that extra info (until 
compaction :( )

Here is a way to make idempotent counters work and still support all these 
features.
1. Create a CF with columns replayID, counterName, value, cnt and optional 
columns customField1, customField2, ... 
(Use the random partitioner on replayID or, if we want to be sure the key is 
unique, a CompositeType replayID:counterName.)
2. Create a secondary index on counterName. Because secondary indexes are 
distributed, each node can use it to find sum(value) over its own rows. 
3. On compaction we delete the old replayID rows, find the total of value*cnt 
and sum(cnt), and store a new row (replayID, counterName, total, new cnt).
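
To make the three steps concrete, here is a minimal in-memory sketch in 
Python. It is not Cassandra code: CounterStore, Row and the is_total marker 
are hypothetical names introduced only for illustration. The marker is an 
assumption added so that a compacted row's value is read as an 
already-multiplied total rather than as value*cnt again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    counter_name: str
    value: int              # per-increment amount, or the folded total if is_total
    cnt: int                # number of increments this row represents
    is_total: bool = False  # hypothetical marker for rows written by compaction

    def contribution(self) -> int:
        # A compacted row already holds sum(value*cnt); a raw row does not.
        return self.value if self.is_total else self.value * self.cnt


class CounterStore:
    """Toy stand-in for the proposed CF, keyed by (replayID, counterName)."""

    def __init__(self):
        self.rows = {}

    def incr(self, counter_name, replay_id, value, cnt=1):
        # Replaying the same replay_id rewrites the same row, so a retry
        # does not change the result (idempotent).
        self.rows[(replay_id, counter_name)] = Row(counter_name, value, cnt)

    def _rows_for(self, counter_name):
        # Stands in for the secondary-index lookup on counterName.
        return [r for r in self.rows.values() if r.counter_name == counter_name]

    def total(self, counter_name):
        return sum(r.contribution() for r in self._rows_for(counter_name))

    def count(self, counter_name):
        return sum(r.cnt for r in self._rows_for(counter_name))

    def compact(self, counter_name, new_replay_id):
        # Step 3: fold all rows of one counter into a single row.
        total = self.total(counter_name)
        cnt = self.count(counter_name)
        for key in [k for k in self.rows if k[1] == counter_name]:
            del self.rows[key]
        self.rows[(new_replay_id, counter_name)] = Row(
            counter_name, total, cnt, is_total=True
        )
```

In this sketch a retry that arrives after its replayID has been compacted 
away would be counted a second time; a real design would need some rule for 
that case.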

The increment operation can take a count (this affects avg). For example, 
incr(counters, myCounter, replayId, 3, 5) increments the counter by 15 but is 
stored as value 3, cnt 5, so it affects the average differently than 
incrementing by value 15, count 1.
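
A small worked example of that difference, as a Python sketch. The average 
helper and the (value, cnt) tuples are assumptions for illustration only; 
avg is taken to be sum(value*cnt) / sum(cnt).

```python
def average(rows):
    """rows is a list of (value, cnt) pairs, one per recorded increment."""
    total = sum(value * cnt for value, cnt in rows)
    count = sum(cnt for _, cnt in rows)
    return total / count


# Both increments add 15 to the counter total ...
avg_five_threes = average([(3, 5)])    # 15 / 5
avg_one_fifteen = average([(15, 1)])   # 15 / 1

# ... but they move the average differently.
print(avg_five_threes, avg_one_fifteen)  # 3.0 15.0
```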

We can create custom fields for some reduce(Iterable<Column>) so that we can 
support min, max, AND/OR/XOR... For example, on compaction we would store the 
reduced max in that custom field.
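
A rough sketch of such a reduce step, in Python rather than the Java 
signature above. REDUCERS and fold_custom_field are hypothetical names; the 
point is only that each of these operations is associative and commutative, 
so compaction can fold rows in any order and the folded value can be folded 
again later.

```python
from functools import reduce
import operator

# Hypothetical registry of reducers a custom field could be declared with.
REDUCERS = {
    "min": min,
    "max": max,
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
}


def fold_custom_field(values, reducer_name):
    """Fold the custom-field values of all rows being compacted into one."""
    return reduce(REDUCERS[reducer_name], values)


# On compaction, the reduced max would be stored in the custom field.
compacted_max = fold_custom_field([3, 7, 5], "max")
print(compacted_max)  # 7
```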

It would be ideal if a secondary index could also store the values of the 
columns so that we can read counters in one go on each node. There is another 
JIRA issue for this. Once that issue is resolved, we can keep only the 
secondary index and drop the original CF; we just pretend it exists :)

I guess clients could implement this approach themselves if we had a pluggable 
compaction strategy, but it would still be much easier if secondary indexes 
could also store other column values, not only keys.

                
                  
> Counters 2.0
> ------------
>
>                 Key: CASSANDRA-4775
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4775
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Arya Goudarzi
>              Labels: counters
>             Fix For: 2.0
>
>
> The existing partitioned counters remain a source of frustration for most 
> users almost two years after being introduced.  The remaining problems are 
> inherent in the design, not something that can be fixed given enough 
> time/eyeballs.
> Ideally a solution would give us
> - similar performance
> - fewer special cases in the code
> - potential for a retry mechanism
