[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249212#comment-15249212 ]
Joshua McKenzie commented on CASSANDRA-8844:
--------------------------------------------

I am impressed (and bothered) at how much I missed the forest for the trees on that one - I refactored out the {{CommitLogReplayer}} behavior quite a while before adding the segment/offset skipping logic in the CommitLogReader for CDC, and it never clicked that I was just duplicating the existing CommitLogReplayer globalPosition skip. I now better understand where the confusion in our discussion (and your reading of the code) stemmed from.

Pushed a commit that does the following:
* Moved {{CommitLogReplayer}} skip logic into {{CommitLogReader}}
* Unified on minPosition in {{CommitLogReader}} rather than the old startPosition
* Removed superfluous interface methods
* Tidied up and commented various read* methods in CommitLogReader
* Commented CommitLogSegment.nextId to clarify that we rely on it for correct ordering between multiple CLSM
* Revised the static initializer in CommitLogSegment to take the CDC log location into account when determining idBase
* Added a comment in CommitLog reinforcing the need for the above

The fact that none of us caught the idBase determination in CommitLogSegment's init makes me wary, and I agree with you that this needs further testing. Where are we with that [~mambocab]?
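For anyone following along, the unified skip behaves roughly like the sketch below. This is a hypothetical simplification for illustration only, not the actual CommitLogReader code; the {{Position}} class and method names here are placeholders:

```java
public class SkipSketch {
    // Placeholder for a commit log position: which segment, and how far into it.
    // (Stands in for CommitLogSegmentPosition / the proposed CommitLogPosition.)
    static final class Position {
        final long segmentId;
        final int offset;
        Position(long segmentId, int offset) { this.segmentId = segmentId; this.offset = offset; }
    }

    // Segments strictly older than minPosition's segment are skipped entirely.
    static boolean skipSegment(long segmentId, Position min) {
        return segmentId < min.segmentId;
    }

    // Within the segment containing minPosition, reading resumes at its offset;
    // later segments are read from the beginning.
    static int startOffset(long segmentId, Position min) {
        return segmentId == min.segmentId ? min.offset : 0;
    }

    public static void main(String[] args) {
        Position min = new Position(7, 4096);
        System.out.println(skipSegment(6, min)); // older segment: skipped
        System.out.println(startOffset(7, min)); // resume mid-segment
        System.out.println(startOffset(8, min)); // newer segment: read fully
    }
}
```

This is also why segment id ordering across multiple CLSM matters: the whole-segment skip is only correct if ids are monotonic.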
Regarding the DirectorySizeCalculator: while I much prefer the elegance of your one-liner,
# I like to avoid changing code that's battle-tested and working during an unrelated refactor,
# it's a micro-optimization in a part of the code that's not critical path, where the delta will be on the order of microseconds for the average case (though it's a large simplification and reduction in code as well, so I'd do it for that alone), and
# the benchmarking results of testing that on both win10 and linux had some surprises in store:

{noformat}
Windows, skylake, SSD:
DirectorySizeCalculator
   [java] Result: 31.061 ±(99.9%) 0.287 ms/op [Average]
   [java]   Statistics: (min, avg, max) = (30.861, 31.061, 33.028), stdev = 0.430
   [java]   Confidence interval (99.9%): [30.774, 31.349]
One liner:
   [java] Result: 116.941 ±(99.9%) 1.238 ms/op [Average]
   [java]   Statistics: (min, avg, max) = (115.163, 116.941, 124.950), stdev = 1.854
   [java]   Confidence interval (99.9%): [115.703, 118.179]

Linux, haswell, SSD:
DirectorySizeCalculator
   [java] Result: 76.765 ±(99.9%) 0.876 ms/op [Average]
   [java]   Statistics: (min, avg, max) = (75.586, 76.765, 81.744), stdev = 1.311
   [java]   Confidence interval (99.9%): [75.889, 77.641]
One liner:
   [java] Result: 57.608 ±(99.9%) 0.889 ms/op [Average]
   [java]   Statistics: (min, avg, max) = (56.365, 57.608, 61.697), stdev = 1.330
   [java]   Confidence interval (99.9%): [56.719, 58.497]
{noformat}

The one-liner is roughly 3.8x slower on Windows but faster on Linux; I think that makes a strong case for having a platform-independent implementation of this, and for doing it in a follow-up ticket.

I also haven't done anything about CommitLogSegmentPosition's name yet. I don't have really strong feelings on it but am leaning towards {{CommitLogPosition}}.

Re-ran CI since we've made quite a few minor tweaks/refactors throughout, and there's a small amount (14 failures) of house-cleaning left to do on the tests. I'll start digging into that tomorrow.
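For context, the kind of stream-based "one-liner" being benchmarked here would look something like the following. This is my assumption of its general shape, not the actual code from the patch:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class OneLinerSize {
    // Sum the sizes of all regular files under dir via a single stream pipeline.
    // Assumed shape of the benchmarked "one-liner", not Cassandra's actual code.
    static long sizeOf(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Tiny demo directory: two files of 5 and 6 bytes.
        Path tmp = Files.createTempDirectory("cdc-size");
        Files.write(tmp.resolve("a.log"), "hello".getBytes());
        Files.write(tmp.resolve("b.log"), "world!".getBytes());
        System.out.println(sizeOf(tmp)); // prints 11
    }
}
```

The per-file stat calls in the lambda are exactly where the platform differences above would come from, which is why the numbers diverge so sharply between Windows and Linux.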
> Change Data Capture (CDC)
> -------------------------
>
>                 Key: CASSANDRA-8844
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8844
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Coordination, Local Write-Read Paths
>            Reporter: Tupshin Harper
>            Assignee: Joshua McKenzie
>            Priority: Critical
>             Fix For: 3.x
>
>
> "In databases, change data capture (CDC) is a set of software design patterns used to determine (and track) the data that has changed so that action can be taken using the changed data. Also, change data capture (CDC) is an approach to data integration that is based on the identification, capture and delivery of the changes made to enterprise data sources."
> -Wikipedia
>
> As Cassandra is increasingly being used as the Source of Record (SoR) for mission-critical data in large enterprises, it is increasingly being called upon to act as the central hub of traffic and data flow to other systems. In order to try to address the general need, we (cc [~brianmhess]) propose implementing a simple data logging mechanism to enable per-table CDC patterns.
> h2. The goals:
> # Use CQL as the primary ingestion mechanism, in order to leverage its Consistency Level semantics, and in order to treat it as the single reliable/durable SoR for the data.
> # Provide a mechanism for implementing good and reliable (deliver-at-least-once, with possible mechanisms for deliver-exactly-once) continuous semi-realtime feeds of mutations going into a Cassandra cluster.
> # Eliminate the developmental and operational burden on users so that they don't have to do dual writes to other systems.
> # For users that are currently doing batch export from a Cassandra system, give them the opportunity to make that realtime with a minimum of coding.
> h2. The mechanism:
> We propose a durable logging mechanism that functions similarly to a commitlog, with the following nuances:
> - Takes place on every node, not just the coordinator, so RF copies are logged.
> - Separate log per table.
> - Per-table configuration. Only tables that are specified as CDC_LOG would do any logging.
> - Per DC. We are trying to keep the complexity to a minimum to make this an easy enhancement, but most likely use cases would prefer to only implement CDC logging in one (or a subset) of the DCs being replicated to.
> - In the critical path of ConsistencyLevel acknowledgment. Just as with the commitlog, failure to write to the CDC log should fail that node's write. If that means the requested consistency level was not met, then clients *should* experience UnavailableExceptions.
> - Be written in a row-centric manner such that it is easy for consumers to reconstitute rows atomically.
> - Written in a simple format designed to be consumed *directly* by daemons written in non-JVM languages.
> h2. Nice-to-haves
> I strongly suspect that the following features will be asked for, but I also believe that they can be deferred to a subsequent release, both to limit scope and to gauge actual interest:
> - Multiple logs per table. This would make it easy to have multiple "subscribers" to a single table's changes. A workaround would be to create a forking daemon listener, but that's not a great answer.
> - Log filtering. Being able to apply filters, including UDF-based filters, would make Cassandra a much more versatile feeder into other systems, and again, reduce complexity that would otherwise need to be built into the daemons.
> h2. Format and Consumption
> - Cassandra would only write to the CDC log, and never delete from it.
> - Cleaning up consumed logfiles would be the client daemon's responsibility.
> - Logfile size should probably be configurable.
> - Logfiles should be named with a predictable naming schema, making it trivial to process them in order.
> - Daemons should be able to checkpoint their work, and resume from where they left off. This means they would have to leave some file artifact in the CDC log's directory.
> - A sophisticated daemon should be able to be written that could:
> -- Catch up, in written order, even when it is multiple logfiles behind in processing
> -- Continuously "tail" the most recent logfile and get low-latency (ms?) access to the data as it is written.
> h2. Alternate approach
> In order to make consuming a change log easy and efficient to do with low latency, the following could supplement the approach outlined above:
> - Instead of writing to a logfile by default, Cassandra could expose a socket for a daemon to connect to, and from which it could pull each row.
> - Cassandra would have a limited buffer for storing rows; should the listener become backlogged, it would immediately spill to disk in that case, never incurring large in-memory costs.
> h2. Additional consumption possibility
> With all of the above still relevant:
> - Instead of (or in addition to) using the other logging mechanisms, use the CQL transport itself as a logger.
> - Extend the CQL protocol slightly so that rows of data can be returned to a listener that didn't explicitly make a query, but instead registered itself with Cassandra as a listener for a particular event type; in this case, the event type would be anything that would otherwise go to a CDC log.
> - If there is no listener for the event type associated with that log, or if that listener gets backlogged, the rows will again spill to persistent storage.
> h2. Possible Syntax
> {code:sql}
> CREATE TABLE ... WITH CDC LOG
> {code}
> Pros: No syntax extensions.
> Cons: Doesn't make it easy to capture the various permutations (I'm happy to be proven wrong) of per-DC logging. Also, the hypothetical multiple logs per table would break this.
> {code:sql}
> CREATE CDC_LOG mylog ON mytable WHERE MyUdf(mycol1, mycol2) = 5 WITH DCs={'dc1','dc3'}
> {code}
> Pros: Expressive, and allows for easy DDL management of all aspects of CDC.
> Cons: Syntax additions. Added complexity, partly for features that might not be implemented.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)