[ https://issues.apache.org/jira/browse/CASSANDRA-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880280#action_12880280 ]
Kelvin Kakugawa commented on CASSANDRA-1072:
--------------------------------------------

{noformat}
context-based clocks

interface extensions to cassandra.thrift:
  replace timestamp w/ Clock()
  Clock:
    optional long timestamp
    optional byte[] context

data structure code changes:
  db.ColumnFamilyType + db.ClockType enums
    db.ColumnFamilyType: Super / Standard
    db.ClockType: Timestamp / IncrementCounter
    applied to all IColumnContainer sub-classes (CF / SC)
    checked to determine switches in code
  db.context package
    IContext: context creation + manipulation
    AbstractReconciler: context-based clock reconciliation
    IncrementCounterContext
      context structure (current):
        {timestamp of last update + [(node id, count), ...]}
      compare(): timestamp-based compare (of last update) -- highest
      diff(): tuple-based comparison
        greater than: has at least every node and each count is larger (than comparison context)
  db.IClock
    concrete *Clock representations
    encapsulates db.context.IContext functionality
    current sub-classes:
      TimestampClock
      IncrementCounterClock
    where the ClockType knows which contextManager (db.context.IContext) to use
  db.IColumn
    timestamp replaced w/ IClock
    markedForDeleteAt replaced w/ IClock

algorithm code changes:
1) on insert
  a) thrift.CassandraServer : doInsert(...)
     thrift.ThriftValidation : validateClock(Clock)
       takes a thrift Clock and creates the appropriate IClock impl
  b) service.StorageProxy : mutateBlocking(...)
     db.RowMutation : updateClocks()
       iterates through all CFs w/in RM
       for any context-based CF type, creates appropriate context structure
       i) counter
          looks at value being inserted, then creates appropriate context
          e.g. {timestamp + [(replica node id, value as long in bytes)]}
  c) local / remote insert
     db.Table : apply()
     CF.addColumn()
       inserts into CSLM (ConcurrentSkipListMap) of columns_
       if null returned, then success and exit
       else:
         save delta (the associated count for the XClock being inserted)
         pull old Column
         use Reconciler to collapse saved delta Column w/ old Column
     counter clocks:
       e.g. for incremental counters
       i) aggregate this replica's counts
       ii) take max of every other replica's counts

2) read
  CL.ONE read: just pull from the first replica that answers
  read repair (used by QUORUM and, in the background, ONE):
    check step:
      read from each replica
      blockFor QUORUM # of replicas
        where one replica is randomly chosen to be non-digest
      check results in service.ReadResponseResolver : resolve()
        calculate digest for non-digest CF against all digests received
        if they don't match: then kick off repair step
    repair step:
      read non-digest from every replica
      blockFor QUORUM # of replicas
      fix results in service.RRR : resolve() + two other methods
        i) assemble all versions of the CF from replicas received
        ii) create a "resolved" CF via CF.resolve()
            CF.resolve(other CF)
              CF.addAll(other CF)
                calls CF.addColumn() for each IColumn in the other CF
        iii) for each version received, create a repair version to be sent to that replica
             repairCF = reconciledCF.diff(versionCF)
             if null, skip
             call: repairCF.cleanNodeCounts(replica to repair)
               wipes out all the counts for the given replica in every *CounterClock in the CF
             otherwise, send RM w/ repairCF under read-repair verb

3) compaction
  uses same CF.addColumn() code path to aggregate Columns across SSTs
  nothing special

4) AES
  uses a modified compaction iterator
    service.AntiEntropyService : doAESCompaction()
  that applies the same code path from read-repair:
    XCounterClock : cleanNodeCounts(InetAddress replica)
  so that the IClock contexts being created to repair the remote replicas do not send over the counts for that given replica
{noformat}

> Increment counters
> ------------------
>
>                 Key: CASSANDRA-1072
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1072
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Johan Oskarsson
>            Assignee: Kelvin Kakugawa
>         Attachments: CASSANDRA-1072.patch, CASSANDRA-1072.patch
>
>
> Break out the increment counters out of CASSANDRA-580. Classes are shared
> between the two features but without the plain version vector code the
> changeset becomes smaller and more manageable.
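The insert-path reconciliation for increment counters in step 1c of the design (aggregate this replica's counts, take the max of every other replica's counts) can be sketched roughly as below. This is a simplified, hypothetical model, not code from the CASSANDRA-1072 patch: the class name `CounterContextSketch`, string node ids, and a `Map` in place of the serialized `byte[]` context are assumptions for illustration. Keeping the later of the two timestamps is also an assumption, chosen to match the "timestamp of last update" field.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of an IncrementCounterContext-style context:
// a last-update timestamp plus one count per replica (node id).
// Assumption: node ids are plain strings; the real context is a serialized byte[].
final class CounterContextSketch {
    final long timestamp;            // timestamp of last update
    final Map<String, Long> counts;  // node id -> count contributed by that node

    CounterContextSketch(long timestamp, Map<String, Long> counts) {
        this.timestamp = timestamp;
        this.counts = new TreeMap<>(counts); // defensive, sorted copy
    }

    // The counter value a reader would see: the sum of every node's count.
    long total() {
        long sum = 0;
        for (long c : counts.values()) {
            sum += c;
        }
        return sum;
    }

    // Collapse an incoming context into the existing one on replica `localNode`:
    //   i)  counts for localNode are aggregated (added together);
    //   ii) counts for every other node take the max of the two sides.
    static CounterContextSketch reconcile(String localNode,
                                          CounterContextSketch existing,
                                          CounterContextSketch incoming) {
        Map<String, Long> merged = new TreeMap<>(existing.counts);
        for (Map.Entry<String, Long> e : incoming.counts.entrySet()) {
            if (e.getKey().equals(localNode)) {
                merged.merge(e.getKey(), e.getValue(), Long::sum);
            } else {
                merged.merge(e.getKey(), e.getValue(), Long::max);
            }
        }
        // Assumption: the reconciled context keeps the later last-update timestamp.
        return new CounterContextSketch(Math.max(existing.timestamp, incoming.timestamp), merged);
    }
}
```

For example, on replica A, reconciling an existing context {A: 5, B: 3} with an incoming delta {A: 2, B: 1} gives {A: 7, B: 3}: A's own counts are added while B's are maxed, so the counter reads 10. Using max for other replicas keeps repeated delivery of the same remote context idempotent.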
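Step 2 (iii) of read repair, building a repair version per replica and stripping that replica's own counts with `cleanNodeCounts()`, might look like the following sketch. The stripping matters because, per step 1c, a replica adds incoming counts for its own node id instead of taking the max, so echoing its own count back would inflate it. The class `RepairSketch`, the string node ids, and the rule that a version needs no repair when it already holds every node at an equal or higher count are assumptions standing in for the patch's `diff()` semantics.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of building a per-replica read-repair context.
// Contexts are modeled as node id -> count maps; timestamps are omitted.
final class RepairSketch {

    // True if `version` has every node in `reconciled` with a count at least as large.
    // Assumption: such a version needs no repair (stands in for diff() returning null).
    static boolean covers(Map<String, Long> version, Map<String, Long> reconciled) {
        for (Map.Entry<String, Long> e : reconciled.entrySet()) {
            Long have = version.get(e.getKey());
            if (have == null || have < e.getValue()) {
                return false;
            }
        }
        return true;
    }

    // Mirrors the idea of cleanNodeCounts(replica): drop the target replica's own
    // counts, since that replica aggregates (adds) counts for its own node id.
    static Map<String, Long> cleanNodeCounts(Map<String, Long> context, String replica) {
        Map<String, Long> cleaned = new TreeMap<>(context);
        cleaned.remove(replica);
        return cleaned;
    }

    // Returns the context to send to `replica`, or null if nothing needs repairing.
    static Map<String, Long> repairFor(String replica,
                                       Map<String, Long> reconciled,
                                       Map<String, Long> version) {
        if (covers(version, reconciled)) {
            return null; // replica already up to date: skip
        }
        Map<String, Long> cleaned = cleanNodeCounts(reconciled, replica);
        return cleaned.isEmpty() ? null : cleaned; // nothing left to send: skip
    }
}
```

For instance, if replica B returned {A: 5, B: 3} while the reconciled context is {A: 7, B: 3, C: 4}, B is sent {A: 7, C: 4}; because B takes the max for other nodes' counts, it converges to the reconciled state without touching its own count.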
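The initial counter context that `db.RowMutation : updateClocks()` creates in step 1b (i), {timestamp + [(replica node id, value as long in bytes)]}, could be serialized along these lines. The byte layout (an 8-byte timestamp followed by 4-byte node id / 8-byte count tuples) is an assumption: it takes the node id to be an IPv4 address, as hinted by `cleanNodeCounts(InetAddress replica)`, and `CounterContextCodec` with its helpers are hypothetical names.

```java
import java.nio.ByteBuffer;

// Hypothetical byte layout for an increment-counter context:
//   [8-byte timestamp][4-byte node id][8-byte count][4-byte node id][8-byte count]...
// Assumption: node ids are IPv4 addresses (4 bytes); ByteBuffer defaults to big-endian.
final class CounterContextCodec {
    static final int TIMESTAMP_BYTES = 8;
    static final int NODE_ID_BYTES = 4;
    static final int COUNT_BYTES = 8;
    static final int ENTRY_BYTES = NODE_ID_BYTES + COUNT_BYTES;

    // Build the context for a fresh insert: one (local node id, inserted value) tuple.
    static byte[] create(long timestamp, byte[] localNodeId, long value) {
        if (localNodeId.length != NODE_ID_BYTES) {
            throw new IllegalArgumentException("expected a 4-byte IPv4 node id");
        }
        return ByteBuffer.allocate(TIMESTAMP_BYTES + ENTRY_BYTES)
                .putLong(timestamp)
                .put(localNodeId)
                .putLong(value)
                .array();
    }

    // Timestamp of last update, stored at the head of the context.
    static long timestamp(byte[] context) {
        return ByteBuffer.wrap(context).getLong(0);
    }

    // Number of (node id, count) tuples following the timestamp.
    static int entryCount(byte[] context) {
        return (context.length - TIMESTAMP_BYTES) / ENTRY_BYTES;
    }

    // Count stored in the i-th (node id, count) tuple.
    static long countAt(byte[] context, int i) {
        return ByteBuffer.wrap(context).getLong(TIMESTAMP_BYTES + i * ENTRY_BYTES + NODE_ID_BYTES);
    }
}
```

A fixed-width layout like this keeps every tuple at a computable offset, so a reconciler can walk or rewrite counts in place without a separate index.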