[ https://issues.apache.org/jira/browse/CASSANDRA-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905673#action_12905673 ]

Chris Goffinet commented on CASSANDRA-1421:
-------------------------------------------

I have to agree with Kevin on this as well. Digg is in exactly the same 
position, needing perf/scalability, and we can afford to drop some counts in a 
failure. The compromise by Johan Oskarsson on CASSANDRA-1072 seems like a 
reasonable solution IMHO.

> An eventually consistent approach to counting
> ---------------------------------------------
>
>                 Key: CASSANDRA-1421
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1421
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>             Fix For: 0.7.0
>
>
> Counters may be implemented as multiple rows in a column family; that is, 
> each counter will have a configurable shard parameter, so a shard factor of 
> 128 means 128 rows per counter.
> An increment will be a (uuid, count) name/value tuple.  The row shard will 
> be uuid % shardfactor.  Timestamp is ignored.  This could be implemented with 
> the existing Thrift write API, or we could add a special-case method for it.  
> Either is fine; the main advantage of the former is that it lets increments 
> be included in batch mutations.
> (Decrements we get for free as simply negative values.)
> Each node will be responsible for aggregating *the rows replicated to it* 
> after GCGraceSeconds have elapsed.  Count aggregation will be a scheduled 
> task on each machine.  This will require a per-shard mutex against both 
> writes and reads.
> This will not have the conflict resolution problem of CASSANDRA-580, or the 
> write fragility of CASSANDRA-1072.  Normal CL will apply on both read and 
> write.  Write idempotency is preserved.  I expect writes will be faster than 
> either, since no reads are required at all on the write path.  Reads will be 
> slower, but the read overhead can be reduced by lowering GCGraceSeconds to 
> below your repair frequency if you are okay with the durability tradeoff 
> there (it will not be worse than CASSANDRA-1072, for instance).  More disk 
> space will be used by this approach, but that is the cheapest resource we 
> have.
> Special-case code required will be much less than for either the 580 or 1072 
> approach -- primarily some code in StorageProxy to combine the uuid slices 
> with their aggregation columns and sum them across all the shards, the local 
> aggregation code, and minor changes to the read/write path to add the mutex 
> against aggregation.
>  
> We could also get rid of the Clock change and go back to i64 timestamps; if 
> we're not going to use Clocks for increments I don't think they have much 
> raison d'être.  (Those of you just joining us, see 
> http://pl.atyp.us/wordpress/?p=2601 for background.)  The CASSANDRA-1072 
> approach doesn't use Clocks either, or rather, it uses Clocks but not a 
> byte[] value, which really means the Clock is unnecessary.
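
For illustration, a minimal Java sketch of the write-side sharding described 
above; the actual Thrift insert is elided, and all names here 
(ShardedCounterWrite, rowKeyFor, the "counterName:shard" key layout) are 
hypothetical, not from the ticket:

    import java.util.UUID;

    /**
     * Illustrative sketch only (not Cassandra code): each increment is a
     * (uuid, count) column written into row shard = uuid % shardfactor.
     */
    public class ShardedCounterWrite {
        private final int shardFactor; // e.g. 128 -> 128 rows per counter

        public ShardedCounterWrite(int shardFactor) {
            this.shardFactor = shardFactor;
        }

        /** Pick the shard row for an increment uuid. */
        public int shardFor(UUID incrementId) {
            // floorMod keeps the shard non-negative even when the
            // uuid's low 64 bits are negative as a signed long.
            return (int) Math.floorMod(incrementId.getLeastSignificantBits(),
                                       (long) shardFactor);
        }

        /** Row key for the shard holding this increment. */
        public String rowKeyFor(String counterName, UUID incrementId) {
            return counterName + ":" + shardFor(incrementId);
        }

        public static void main(String[] args) {
            ShardedCounterWrite writer = new ShardedCounterWrite(128);
            UUID id = UUID.randomUUID();
            long delta = 1; // a decrement would simply be negative
            // A real client would issue a normal Thrift insert of
            // column (name=id, value=delta) into this row, so the
            // increment can also ride along in a batch mutation.
            System.out.printf("insert (%s, %d) into row %s%n",
                              id, delta, writer.rowKeyFor("hits", id));
        }
    }

Note that no read happens anywhere on this path, which is where the expected 
write-speed advantage comes from.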
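
A similar sketch of the per-node aggregation task with its per-shard mutex, 
under the same hypothetical naming; the slice/merge/delete steps are left as 
comments since they depend on storage internals:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.locks.ReentrantLock;

    /** Illustrative sketch: fold expired increments into a subtotal. */
    public class ShardAggregator {
        private final int shardFactor;
        private final ReentrantLock[] shardLocks; // mutex vs reads/writes

        public ShardAggregator(int shardFactor) {
            this.shardFactor = shardFactor;
            this.shardLocks = new ReentrantLock[shardFactor];
            for (int i = 0; i < shardFactor; i++) {
                shardLocks[i] = new ReentrantLock();
            }
        }

        /** Replace increment columns older than GCGraceSeconds with a
         *  single aggregated subtotal column for this shard. */
        void aggregateShard(int shard) {
            shardLocks[shard].lock();
            try {
                // 1. slice the (uuid, count) columns past GCGraceSeconds
                // 2. add their sum into the shard's subtotal column
                // 3. delete the folded increment columns
            } finally {
                shardLocks[shard].unlock();
            }
        }

        /** Run aggregation periodically, as a scheduled local task. */
        public void start(long periodSeconds) {
            ScheduledExecutorService exec =
                    Executors.newSingleThreadScheduledExecutor();
            exec.scheduleWithFixedDelay(() -> {
                for (int shard = 0; shard < shardFactor; shard++) {
                    aggregateShard(shard);
                }
            }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        }
    }

The mutex ensures a concurrent reader sees either the pending increment 
columns or the folded subtotal, never both or neither.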
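
And a sketch of the read-side combine that StorageProxy would do, summing each 
shard's aggregated subtotal with its not-yet-aggregated increments; Shard and 
ShardSource are hypothetical stand-ins for the slice reads:

    import java.util.List;

    /** Illustrative sketch of summing a counter across its shards. */
    public class ShardedCounterRead {
        /** One shard row: aggregated subtotal plus pending increments. */
        record Shard(long aggregated, List<Long> pendingIncrements) {}

        /** Hypothetical stand-in for a slice read of one shard row. */
        interface ShardSource {
            Shard fetchShard(String counterName, int shard);
        }

        public static long readCount(ShardSource source, String counterName,
                                     int shardFactor) {
            long total = 0;
            for (int shard = 0; shard < shardFactor; shard++) {
                Shard s = source.fetchShard(counterName, shard);
                total += s.aggregated(); // previously folded subtotal
                for (long delta : s.pendingIncrements()) {
                    total += delta;      // not-yet-aggregated increments
                }
            }
            return total;
        }
    }

Lowering GCGraceSeconds shrinks the pending-increments lists and hence the 
read cost, which is the durability tradeoff mentioned above.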

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
