[ https://issues.apache.org/jira/browse/CASSANDRA-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905673#action_12905673 ]
Chris Goffinet commented on CASSANDRA-1421:
-------------------------------------------

I have to agree with Kevin as well on this. Digg is in the exact same position, needing perf/scalability. We can afford to drop some counts in a failure. The compromise by Johan Oskarsson on 1072 seems like a reasonable solution IMHO.

> An eventually consistent approach to counting
> ---------------------------------------------
>
>                 Key: CASSANDRA-1421
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1421
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>             Fix For: 0.7.0
>
>
> Counters may be implemented as multiple rows in a column family; that is,
> counters will have a configurable shard parameter; a shard factor of 128
> would have 128 rows.
> An increment will be a (uuid, count) name, value tuple. The row shard will
> be uuid % shardfactor. Timestamp is ignored. This could be implemented w/
> the existing Thrift write API, or we could add a special case method for it.
> Either is fine; the main advantage of the former is it lets increments be
> included in batch mutations.
> (Decrements we get for free as simply negative values.)
> Each node will be responsible for aggregating *the rows replicated to it*
> after GCGraceSeconds have elapsed. Count aggregation will be a scheduled
> task on each machine. This will require a mutex for each shard vs both
> writes and reads.
> This will not have the conflict resolution problem of CASSANDRA-580, or the
> write fragility of CASSANDRA-1072. Normal CL will apply on both read and
> write. Write idempotency is preserved. I expect writes will be faster than
> either, since no reads are required at all on the write path. Reads will be
> slower, but the read overhead can be reduced by lowering GCGraceSeconds to
> below your repair frequency if you are okay with the durability tradeoff
> there (it will not be worse than CASSANDRA-1072, for instance). More disk
> space will be used by this approach, but that is the cheapest resource we
> have.
> Special case code required will be much less than either the 580 or 1072
> approach -- primarily some code in StorageProxy to combine the uuid slices
> with their aggregation columns and sum them for all the shards, the local
> aggregation code, and minor changes to read/write path to add the mutex vs
> aggregation.
>
> We could also get rid of the Clock change and go back to i64 timestamps; if
> we're not going to use Clocks for increments I don't think they have much
> raison d'être. (Those of you just joining us, see
> http://pl.atyp.us/wordpress/?p=2601 for background.) The CASSANDRA-1072
> approach doesn't use Clocks either, or rather, it uses Clocks but not a
> byte[] value, which really means the Clock is unnecessary.
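
For readers following the quoted proposal, here is a rough in-memory sketch of the sharded counter it describes: increments stored as (uuid, count) columns, the shard chosen as uuid % shardfactor, a local aggregation step, and a read that sums aggregates plus unaggregated columns. This is a toy model in Python, not Cassandra code. The class and method names (ShardedCounter, shard_for, aggregate) are made up for illustration, and the sketch leaves out replication, consistency levels, the GCGraceSeconds delay, and the per-shard mutex.

```python
import uuid
from collections import defaultdict


class ShardedCounter:
    """Toy, single-process model of the proposed counter (hypothetical names)."""

    def __init__(self, shard_factor=128):
        self.shard_factor = shard_factor
        # shard index -> {increment uuid -> signed delta}; one dict per "row"
        self.pending = defaultdict(dict)
        # shard index -> sum of deltas already folded in by aggregation
        self.aggregated = defaultdict(int)

    def shard_for(self, increment_id):
        # "The row shard will be uuid % shardfactor."
        return increment_id.int % self.shard_factor

    def increment(self, delta, increment_id=None):
        # A write is a blind insert of a (uuid, count) column; nothing is read.
        # Replaying the same uuid overwrites the same column, so a retried
        # write is not counted twice. Decrements are just negative deltas.
        if increment_id is None:
            increment_id = uuid.uuid4()
        self.pending[self.shard_for(increment_id)][increment_id] = delta
        return increment_id

    def aggregate(self, shard):
        # Stand-in for the scheduled aggregation task. The proposal runs this
        # only after GCGraceSeconds; this sketch has no notion of time, so a
        # retry arriving after aggregation would be counted again here.
        columns = self.pending.pop(shard, {})
        self.aggregated[shard] += sum(columns.values())

    def read(self):
        # Reads sum every shard's aggregate plus its unaggregated columns.
        total = sum(self.aggregated.values())
        total += sum(sum(cols.values()) for cols in self.pending.values())
        return total


counter = ShardedCounter(shard_factor=4)
first = counter.increment(5)
counter.increment(5, increment_id=first)  # retried write, same uuid
counter.increment(-2)                     # decrement
print(counter.read())
```

The read path touches every shard, which is where the "reads will be slower" cost in the description comes from; aggregation shrinks the number of columns a read has to sum but not the number of shards.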