[ https://issues.apache.org/jira/browse/CASSANDRA-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871690#comment-13871690 ]
Aleksey Yeschenko commented on CASSANDRA-6504: ---------------------------------------------- Pushed v2 to https://github.com/iamaleksey/cassandra/commits/6504-v2 Just the difference between (trunk with merge in 6505-v2) and 6504-v2: https://github.com/iamaleksey/cassandra/compare/6505-trunk...6504-v2 I am writing more unit tests now (for counter cache and counter mutation), but that shouldn't be blocking the review (the tests will be there before 6504 gets properly committed). bq. I don't think we need the CounterId renewal stuff on cleanup anymore. So 1) we can remove it from Cleanup and remove a bunch of stuff from CounterId Done. We *could* also move the counter id to system.local, since we don't care about history anymore, but I'm keeping it as is for now. bq. 2) this means a node shouldn't change it's counterId at all anymore. So, since the counter cache stores only old local shards, we can skip storing the counterId in each cache key (we'd want to save the counterId at the start of the cache file or something just to assert it hasn't changed at reload time "just in case" but that should be enough). True. Done. Also moved counter cache loading to SS.initServer(), after the potential counter id renewal (the only place where it can happen) and commit log replay. Minor changes not originating from the review: - CFPropDefs (CQL2 and CQL3) to inline KW_REPLICATEONWRITE for obsoleteKeywords - CassandraServer.doInsert() to use MIN(mutations' timeouts) instead of MAX - addition of the missing counter cache nodetool stuff CounterContext stuff from v1 of 6505 that didn't get into 6505-v2, but did get into 6404-v2 branch: - reuse of writeElement() for copyTo() - will go away with 6506 - changes to diff() - will go away with 6506 - inlining of IContext Not changed, or done differently: bq. Nit: in the yaml, it'd be more consistent to keep counter_cache_size_in_mb commented out to mean "default" (rather than adding it but with an empty value). We could then simplify the comment to something like "Default to min(2.5% of the heap, 50MB). Set to 0 to disable." It's blank, but not commented out, for consistency with key_cache_size_in_mb and row_cache_size_in_mb. Should comment out all of them or leave as is. bq. Given that it's a rather hot path, might be worth using a 2 values long[] instead of a Pair<Long, Long>? Created a named class instead - ClockAndCount, that's also reused by the counter cache (used in place of the old CounterCacheEntry) instead. And switched currentValues to an array from an ArrayList, since generics are no longer involved. All the other issues and nits have been fixed. Sorry for doing it in a rebase. If v3 ever happens, it's going to be a separate commit on top of the v2 branch. > counters++ > ---------- > > Key: CASSANDRA-6504 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6504 > Project: Cassandra > Issue Type: Improvement > Reporter: Aleksey Yeschenko > Assignee: Aleksey Yeschenko > Fix For: 2.1 > > > Continuing CASSANDRA-4775 here. > We are changing counter write path to explicitly > lock-read-modify-unlock-replicate, thus getting rid of the previously used > 'local' (deltas) and 'remote' shards distinction. Unfortunately, we can't > simply start using 'remote' shards exclusively, since shard merge rules > prioritize the 'local' shards. Which is why we are introducing the third > shard type - 'global', the only shard type to be used in 2.1+. > The updated merge rules are going to look like this: > global + global = keep the shard with the highest logical clock > global + local or remote = keep the global one > local + local = sum counts (and logical clock) > local + remote = keep the local one > remote + remote = keep the shard with highest logical clock > This is required for backward compatibility with pre-2.1 counters. To make > 2.0-2.1 live upgrade possible, 'global' shard merge logic will have to be > back ported to 2.0. 2.0 will not produce them, but will be able to understand > the global shards coming from the 2.1 nodes during the live upgrade. See > CASSANDRA-6505. > Other changes introduced in this issue: > 1. replicate_on_write is gone. From now on we only avoid replication at RF 1. > 2. REPLICATE_ON_WRITE stage is gone > 3. counter mutations are running in their own COUNTER_MUTATION stage now > 4. counter mutations have a separate counter_write_request_timeout setting > 5. mergeAndRemoveOldShards() code is gone, for now, until/unless a better > solution is found > 6. we only replicate the fresh global shard now, not the complete > (potentially quite large) counter context > 7. to help with concurrency and reduce lock contention, we cache node's > global shards in a new counter cache ({cf id, partition key, cell name} -> > {count, clock}). The cache is only used by counter writes, to help with 'hot' > counters being simultaneously updated. > Improvements to be handled by separate JIRA issues: > 1. Split counter context into separate cells - one shard per cell. See > CASSANDRA-6506. This goes into either 2.1 or 3.0. > Potential improvements still being debated: > 1. Coalesce the mutations in COUNTER_MUTATION stage if they share the same > partition key, and apply them together, to improve the locking situation when > updating different counter cells in one partition. See CASSANDRA-6508. Will > to into 2.1 or 3.0, if deemed beneficial. -- This message was sent by Atlassian JIRA (v6.1.5#6160)