[ https://issues.apache.org/jira/browse/CASSANDRA-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904772#action_12904772 ]

Johan Oskarsson commented on CASSANDRA-1421:
--------------------------------------------

I'd have to agree with Sylvain: read performance and resource usage would 
suffer when many increments are made to the same counter. Looking at how our 
current production load would behave under this approach, the amount of data 
that would have to be read back for each counter is substantial.

Estimates show that under our current load, with default settings, a row could 
grow to contain a few megabytes of data, and that load is expected to grow 
further along with our application.
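
To put a rough number on that (these figures are illustrative assumptions of 
mine, not measurements): with the default GCGraceSeconds of 864000 (10 days), 
roughly one increment per second landing on a given shard row, and each 
(uuid, count) column costing on the order of 40 bytes once per-column overhead 
is included, the un-aggregated data sitting in that row would be about

    864,000 columns * ~40 bytes ≈ 35 MB per shard row

and even a tenth of that rate still leaves a few megabytes to read back per 
shard row on every counter read.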

Turning down GCGraceSeconds to a level where the reads are reasonably sized 
would make even a short node downtime an operational issue, since the node 
would then need to be rebuilt, per 
http://wiki.apache.org/cassandra/DistributedDeletes

> An eventually consistent approach to counting
> ---------------------------------------------
>
>                 Key: CASSANDRA-1421
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1421
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>             Fix For: 0.7.0
>
>
> Counters may be implemented as multiple rows in a column family; that is, 
> counters will have a configurable shard parameter; a shard factor of 128 
> would have 128 rows.
> An increment will be a (uuid, count) name, value tuple.  The row shard will 
> be uuid % shardfactor.  Timestamp is ignored.  This could be implemented w/ 
> the existing Thrift write api, or we could add a special case method for it.  
> Either is fine; the main advantage of the former is it lets increments be 
> included in batch mutations.
> (Decrements we get for free as simply negative values.)
> Each node will be responsible for aggregating *the rows replicated to it* 
> after GCGraceSeconds have elapsed.  Count aggregation will be a scheduled 
> task on each machine.  This will require a mutex for each shard vs both 
> writes and reads.
> This will not have the conflict resolution problem of CASSANDRA-580, or the 
> write fragility of CASSANDRA-1072.  Normal CL will apply on both read and 
> write.  Write idempotency is preserved.  I expect writes will be faster than 
> either, since no reads are required at all on the write path.  Reads will be 
> slower, but the read overhead can be reduced by lowering GCGraceSeconds to 
> below your repair frequency if you are okay with the durability tradeoff 
> there (it will not be worse than CASSANDRA-1072, for instance).  More disk 
> space will be used by this approach, but that is the cheapest resource we 
> have.
> Special case code required will be much less than either the 580 or 1072 
> approach -- primarily some code in StorageProxy to combine the uuid slices 
> with their aggregation columns and sum them for all the shards, the local 
> aggregation code, and minor changes to read/write path to add the mutex vs 
> aggregation.
>  
> We could also get rid of the Clock change and go back to i64 timestamps; if 
> we're not going to use Clocks for increments I don't think they have much 
> raison d'être.  (Those of you just joining us, see 
> http://pl.atyp.us/wordpress/?p=2601 for background.)  The CASSANDRA-1072 
> approach doesn't use Clocks either, or rather, it uses Clocks but not a 
> byte[] value, which really means the Clock is unnecessary.
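
For anyone skimming the quoted design above, here is a minimal Java sketch of 
the shard selection and read-time summation it describes. The class and method 
names, the row-key encoding, and the map-shaped read result are illustrative 
assumptions on my part, not code from any actual patch:

    import java.util.Map;
    import java.util.UUID;

    // Illustrative sketch only: one logical counter is spread across
    // shardFactor rows, each increment is a (uuid, count) column, and a
    // read sums every column across all shard rows.
    public class ShardedCounterSketch {
        private final int shardFactor;   // e.g. 128 rows per logical counter

        public ShardedCounterSketch(int shardFactor) {
            this.shardFactor = shardFactor;
        }

        // An increment is a (uuid, count) column; its row shard is uuid % shardfactor.
        public String shardRowKey(String counterName, UUID incrementId) {
            long shard = Math.floorMod(incrementId.getLeastSignificantBits(),
                                       (long) shardFactor);
            return counterName + ":" + shard;
        }

        // Read path: given a slice of every shard row (including any columns the
        // scheduled aggregation task has already folded together), sum the counts.
        public long readCount(Map<String, Map<UUID, Long>> slicesByShardRow) {
            long total = 0;
            for (Map<UUID, Long> columns : slicesByShardRow.values()) {
                for (long count : columns.values()) {
                    total += count;      // decrements are simply negative counts
                }
            }
            return total;
        }
    }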

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
