[ https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125701#comment-16125701 ]
Branimir Lambov commented on CASSANDRA-12148:
---------------------------------------------

LGTM

> Improve determinism of CDC data availability
> --------------------------------------------
>
>                 Key: CASSANDRA-12148
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Joshua McKenzie
>            Assignee: Joshua McKenzie
>             Fix For: 4.x
>
>
> The latency with which CDC data becomes available has a known limitation due
> to our reliance on CommitLogSegments being discarded to have the data
> available in cdc_raw: if a slowly written table co-habitates a
> CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until
> we hit either memory pressure on memtables or CommitLog limit pressure.
> Ultimately, this leaves a non-deterministic element to when data becomes
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file
> per commit log segment in cdc_raw. Clients tail this index file, delta their
> local last parsed offset on change, and parse the corresponding commit log
> segment using their last parsed offset as min
> * C* flags that index file with an offset and DONE when the file is flushed
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the
> last written CDC mutation in the segment if higher than the previously known
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as
> <segment_name>_cdc.idx containing the min offset and end offset fsynced to
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete both the
> segment in commitlog and in cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the
> segment in commitlog and update the <segment_name>_cdc.idx file w/end offset
> and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled
> mutations from a segment and write that to <segment_name>_cdc.idx on
> completion of segment replay. This should bridge the potential correctness
> gap of a node writing to a segment and then dying before it can write the
> <segment_name>_cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC
> mutation, track an offset of how far they've parsed, delta against the
> _cdc.idx file end offset, and use that as a determinant on when to parse new
> CDC data. Any existing clients written to the initial implementation of CDC
> need only add the <segment_name>_cdc.idx logic and checking for DONE marker
> to their code, so the burden on users to update to support this should be
> quite small for the benefit of having data available as soon as it's fsynced
> instead of at a non-deterministic time when potentially unrelated tables are
> flushed.
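
The client-side flow described above (skip to the first CDC mutation, track a last parsed offset, delta it against the idx end offset, clean up on DONE) could be sketched roughly as follows. This is only an illustration under stated assumptions: the ticket does not define the on-disk layout of <segment_name>_cdc.idx, so the three-line text format used here (min offset, end offset, optional DONE line) is hypothetical, as are the names CdcIndex, parse_cdc_index, next_range and can_clean_up.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CdcIndex:
    min_offset: int  # position of the first CDC mutation in the segment
    end_offset: int  # end of the last CDC mutation fsynced so far
    done: bool       # True once the segment has been discarded (DONE marker)


def parse_cdc_index(text: str) -> Optional[CdcIndex]:
    """Parse the ASSUMED idx layout: min offset, end offset, optional DONE line.

    Returns None when the content is missing or incomplete, for example when
    the file is read before the first fsync has written it.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        return None
    try:
        min_offset, end_offset = int(lines[0]), int(lines[1])
    except ValueError:
        return None
    done = len(lines) > 2 and lines[2] == "DONE"
    return CdcIndex(min_offset, end_offset, done)


def next_range(index: CdcIndex, last_parsed: int) -> Optional[Tuple[int, int]]:
    """Return the (start, end) byte range to parse next, or None if nothing is new."""
    # Never start before the first CDC mutation in the segment.
    start = max(last_parsed, index.min_offset)
    if index.end_offset > start:
        return (start, index.end_offset)
    return None


def can_clean_up(index: CdcIndex, last_parsed: int) -> bool:
    """A client may delete its hard link once the segment is DONE and fully parsed."""
    return index.done and last_parsed >= index.end_offset
```

A consumer would poll the idx file, pass its contents to parse_cdc_index, hand any range returned by next_range to its commit log parser, and then record the range end as its new last parsed offset. How the segment itself is parsed is out of scope for this sketch.
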
> Finally, we should look into extending the interface on CommitLogReader to be
> more friendly for realtime parsing, perhaps supporting taking a
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls,
> assuming the reader is at the start of a SyncSegment. Would probably also
> need to rewind to the start of the segment before returning so subsequent
> calls would respect this contract. This would skip needing to deserialize the
> descriptor and all completed SyncSegments to get to the root of the desired
> segment for parsing.
> One alternative we discussed offline - instead of just storing the highest
> seen CDC offset, we could instead store an offset per CDC mutation
> (potentially delta encoded) in the idx file to allow clients to seek and only
> parse the mutations with CDC enabled. My hunch is that the performance delta
> from doing so wouldn't justify the complexity given the SyncSegment
> deserialization and seeking restrictions in the compressed and encrypted
> cases as mentioned above.
> The only complication I can think of with the above design is uncompressed
> mmapped CommitLogSegments on Windows being undeletable, but it'd be pretty
> simple to disallow configuration of CDC w/uncompressed CommitLog on that
> environment.
> And as a final note: while the above might sound involved, it really
> shouldn't be a big change from where we are with v1 of CDC from a C*
> complexity nor code perspective, or from a client implementation perspective.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)