[jira] [Commented] (CASSANDRA-12148) Improve determinism of CDC data availability

2018-02-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360081#comment-16360081
 ] 

ASF GitHub Bot commented on CASSANDRA-12148:


Github user josh-mckenzie closed the pull request at:

https://github.com/apache/cassandra-dtest/pull/6


> Improve determinism of CDC data availability
> 
>
> Key: CASSANDRA-12148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
>Priority: Major
>  Labels: client-impacting, native_protocol
> Fix For: 4.0
>
>
> The latency with which CDC data becomes available has a known limitation due 
> to our reliance on CommitLogSegments being discarded to have the data 
> available in cdc_raw: if a slowly written table co-habitates a 
> CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until 
> we hit either memory pressure on memtables or CommitLog limit pressure. 
> Ultimately, this leaves a non-deterministic element to when data becomes 
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more 
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead 
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file 
> per commit log segment in cdc_raw. Clients tail this index file, delta their 
> local last parsed offset on change, and parse the corresponding commit log 
> segment using their last parsed offset as min
> * C* flags that index file with an offset and DONE when the file is flushed 
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled 
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the 
> last written CDC mutation in the segment if higher than the previously known 
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest 
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as 
> _cdc.idx containing the min offset and end offset fsynced to 
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete both the 
> segment in commitlog and in cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the 
> segment in commitlog and update the _cdc.idx file w/end offset 
> and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled 
> mutations from a segment and write that to _cdc.idx on 
> completion of segment replay. This should bridge the potential correctness 
> gap of a node writing to a segment and then dying before it can write the 
> _cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC 
> mutation, track an offset of how far they've parsed, delta against the 
> _cdc.idx file end offset, and use that as a determinant on when to parse new 
> CDC data. Any existing clients written to the initial implementation of CDC 
> need only add the _cdc.idx logic and checking for DONE marker 
> to their code, so the burden on users to update to support this should be 
> quite small for the benefit of having data available as soon as it's fsynced 
> instead of at a non-deterministic time when potentially unrelated tables are 
> flushed.
> Finally, we should look into extending the interface on CommitLogReader to be 
> more friendly for realtime parsing, perhaps supporting taking a 
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls, 
> assuming the reader is at the start of a SyncSegment. Would probably also 
> need to rewind to the start of the segment before returning so subsequent 
> calls would respect this contract. This would skip needing to deserialize the 
> descriptor and all completed SyncSegments to get to the root of the desired 
> segment for parsing.
> One alternative we discussed offline - instead of just storing the highest 
> seen CDC offset, we could instead store an offset per CDC mutation 
> (potentially delta encoded) in the idx file to allow clients to seek and only 
> parse the mutations with CDC enabled. My hunch is that the performance delta 
> from doing so wouldn't justify the complexity given the SyncSegment 
> deserialization and seeking restrictions in the compressed and encrypted 

[jira] [Commented] (CASSANDRA-12148) Improve determinism of CDC data availability

2017-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140375#comment-16140375
 ] 

ASF GitHub Bot commented on CASSANDRA-12148:


GitHub user josh-mckenzie opened a pull request:

https://github.com/apache/cassandra-dtest/pull/6

12148 style

See: https://issues.apache.org/jira/browse/CASSANDRA-12148

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/josh-mckenzie/apache-cassandra-dtest 
12148_style

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cassandra-dtest/pull/6.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6


commit eac28ac1a7e6e60ec5e4c362cbc496cc726819d6
Author: Philip Thompson 
Date:   2017-07-13T08:46:28Z

Push readme update

commit 316902f825fcb5459c9aea9e280f716048c42dbb
Author: Josh McKenzie 
Date:   2016-08-03T17:30:25Z

CASSANDRA-12148 cdc hard-link support




> Improve determinism of CDC data availability
> 
>
> Key: CASSANDRA-12148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
>  Labels: client-impacting, native_protocol
> Fix For: 4.0
>
>
> The latency with which CDC data becomes available has a known limitation due 
> to our reliance on CommitLogSegments being discarded to have the data 
> available in cdc_raw: if a slowly written table co-habitates a 
> CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until 
> we hit either memory pressure on memtables or CommitLog limit pressure. 
> Ultimately, this leaves a non-deterministic element to when data becomes 
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more 
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead 
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file 
> per commit log segment in cdc_raw. Clients tail this index file, delta their 
> local last parsed offset on change, and parse the corresponding commit log 
> segment using their last parsed offset as min
> * C* flags that index file with an offset and DONE when the file is flushed 
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled 
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the 
> last written CDC mutation in the segment if higher than the previously known 
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest 
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as 
> _cdc.idx containing the min offset and end offset fsynced to 
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete both the 
> segment in commitlog and in cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the 
> segment in commitlog and update the _cdc.idx file w/end offset 
> and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled 
> mutations from a segment and write that to _cdc.idx on 
> completion of segment replay. This should bridge the potential correctness 
> gap of a node writing to a segment and then dying before it can write the 
> _cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC 
> mutation, track an offset of how far they've parsed, delta against the 
> _cdc.idx file end offset, and use that as a determinant on when to parse new 
> CDC data. Any existing clients written to the initial implementation of CDC 
> need only add the _cdc.idx logic and checking for DONE marker 
> to their code, so the burden on users to update to support this should be 
> quite small for the benefit of having data available as soon as it's fsynced 
> instead of at a non-deterministic time when potentially unrelated tables are 
> flushed.
> Finally, we should look into extending the interface on CommitLogReader to be 
> more friendly for realtime parsing, perhaps supporting taking a 
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls, 
> assuming the reader is at the 

[jira] [Commented] (CASSANDRA-12148) Improve determinism of CDC data availability

2017-08-14 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125701#comment-16125701
 ] 

Branimir Lambov commented on CASSANDRA-12148:
-

LGTM

> Improve determinism of CDC data availability
> 
>
> Key: CASSANDRA-12148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
> Fix For: 4.x
>
>
> The latency with which CDC data becomes available has a known limitation due 
> to our reliance on CommitLogSegments being discarded to have the data 
> available in cdc_raw: if a slowly written table co-habitates a 
> CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until 
> we hit either memory pressure on memtables or CommitLog limit pressure. 
> Ultimately, this leaves a non-deterministic element to when data becomes 
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more 
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead 
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file 
> per commit log segment in cdc_raw. Clients tail this index file, delta their 
> local last parsed offset on change, and parse the corresponding commit log 
> segment using their last parsed offset as min
> * C* flags that index file with an offset and DONE when the file is flushed 
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled 
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the 
> last written CDC mutation in the segment if higher than the previously known 
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest 
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as 
> _cdc.idx containing the min offset and end offset fsynced to 
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete both the 
> segment in commitlog and in cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the 
> segment in commitlog and update the _cdc.idx file w/end offset 
> and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled 
> mutations from a segment and write that to _cdc.idx on 
> completion of segment replay. This should bridge the potential correctness 
> gap of a node writing to a segment and then dying before it can write the 
> _cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC 
> mutation, track an offset of how far they've parsed, delta against the 
> _cdc.idx file end offset, and use that as a determinant on when to parse new 
> CDC data. Any existing clients written to the initial implementation of CDC 
> need only add the _cdc.idx logic and checking for DONE marker 
> to their code, so the burden on users to update to support this should be 
> quite small for the benefit of having data available as soon as it's fsynced 
> instead of at a non-deterministic time when potentially unrelated tables are 
> flushed.
> Finally, we should look into extending the interface on CommitLogReader to be 
> more friendly for realtime parsing, perhaps supporting taking a 
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls, 
> assuming the reader is at the start of a SyncSegment. Would probably also 
> need to rewind to the start of the segment before returning so subsequent 
> calls would respect this contract. This would skip needing to deserialize the 
> descriptor and all completed SyncSegments to get to the root of the desired 
> segment for parsing.
> One alternative we discussed offline - instead of just storing the highest 
> seen CDC offset, we could instead store an offset per CDC mutation 
> (potentially delta encoded) in the idx file to allow clients to seek and only 
> parse the mutations with CDC enabled. My hunch is that the performance delta 
> from doing so wouldn't justify the complexity given the SyncSegment 
> deserialization and seeking restrictions in the compressed and encrypted 
> cases as mentioned above.
> The only complication I can think of with the above design is uncompressed 
> mmapped CommitLogSegments on Windows being undeletable, but it'd be pretty 
> s

[jira] [Commented] (CASSANDRA-12148) Improve determinism of CDC data availability

2017-07-20 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094740#comment-16094740
 ] 

Joshua McKenzie commented on CASSANDRA-12148:
-

Updated with some minor changes:
* Rebased to trunk
* Added replay logic tester
* Tweaked signature in CommitLogReader and added missing handling in one of the 
replay signatures in CommitLogReplayer
* Added CDCWriteException instead of using WriteTimeoutException

I'm ambivalent about that last bullet-point. The coordinator will still roll 
things up in a WriteFailureException so I'm not convinced it's worth the effort 
in the drivers to support a new exception type if we don't expose it to the 
client, however it irritated me so I added that.

[cassandra 
branch|https://github.com/apache/cassandra/compare/trunk...josh-mckenzie:12148_rebase_trunk]
[dtest 
branch|https://github.com/datastax/python-driver/compare/master...josh-mckenzie:12148]
[python driver 
branch|https://github.com/riptano/cassandra-dtest/compare/master...josh-mckenzie:12148_style]

testall, cdc-testing, and dtests look good on this branch. Only 1 failure on 
dtest and it's a known issue.

> Improve determinism of CDC data availability
> 
>
> Key: CASSANDRA-12148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
> Fix For: 4.x
>
>
> The latency with which CDC data becomes available has a known limitation due 
> to our reliance on CommitLogSegments being discarded to have the data 
> available in cdc_raw: if a slowly written table co-habitates a 
> CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until 
> we hit either memory pressure on memtables or CommitLog limit pressure. 
> Ultimately, this leaves a non-deterministic element to when data becomes 
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more 
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead 
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file 
> per commit log segment in cdc_raw. Clients tail this index file, delta their 
> local last parsed offset on change, and parse the corresponding commit log 
> segment using their last parsed offset as min
> * C* flags that index file with an offset and DONE when the file is flushed 
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled 
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the 
> last written CDC mutation in the segment if higher than the previously known 
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest 
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as 
> _cdc.idx containing the min offset and end offset fsynced to 
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete both the 
> segment in commitlog and in cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the 
> segment in commitlog and update the _cdc.idx file w/end offset 
> and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled 
> mutations from a segment and write that to _cdc.idx on 
> completion of segment replay. This should bridge the potential correctness 
> gap of a node writing to a segment and then dying before it can write the 
> _cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC 
> mutation, track an offset of how far they've parsed, delta against the 
> _cdc.idx file end offset, and use that as a determinant on when to parse new 
> CDC data. Any existing clients written to the initial implementation of CDC 
> need only add the _cdc.idx logic and checking for DONE marker 
> to their code, so the burden on users to update to support this should be 
> quite small for the benefit of having data available as soon as it's fsynced 
> instead of at a non-deterministic time when potentially unrelated tables are 
> flushed.
> Finally, we should look into extending the interface on CommitLogReader to be 
> more friendly for realtime parsing, perhaps supporting taking a 
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls, 
> assuming the

[jira] [Commented] (CASSANDRA-12148) Improve determinism of CDC data availability

2017-06-12 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16047231#comment-16047231
 ] 

Jay Zhuang commented on CASSANDRA-12148:


[~JoshuaMcKenzie] thanks. It would be great to have this feature in 4.0.

> Improve determinism of CDC data availability
> 
>
> Key: CASSANDRA-12148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
> Fix For: 4.x
>
>
> The latency with which CDC data becomes available has a known limitation due 
> to our reliance on CommitLogSegments being discarded to have the data 
> available in cdc_raw: if a slowly written table co-habitates a 
> CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until 
> we hit either memory pressure on memtables or CommitLog limit pressure. 
> Ultimately, this leaves a non-deterministic element to when data becomes 
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more 
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead 
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file 
> per commit log segment in cdc_raw. Clients tail this index file, delta their 
> local last parsed offset on change, and parse the corresponding commit log 
> segment using their last parsed offset as min
> * C* flags that index file with an offset and DONE when the file is flushed 
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled 
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the 
> last written CDC mutation in the segment if higher than the previously known 
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest 
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as 
> _cdc.idx containing the min offset and end offset fsynced to 
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete both the 
> segment in commitlog and in cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the 
> segment in commitlog and update the _cdc.idx file w/end offset 
> and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled 
> mutations from a segment and write that to _cdc.idx on 
> completion of segment replay. This should bridge the potential correctness 
> gap of a node writing to a segment and then dying before it can write the 
> _cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC 
> mutation, track an offset of how far they've parsed, delta against the 
> _cdc.idx file end offset, and use that as a determinant on when to parse new 
> CDC data. Any existing clients written to the initial implementation of CDC 
> need only add the _cdc.idx logic and checking for DONE marker 
> to their code, so the burden on users to update to support this should be 
> quite small for the benefit of having data available as soon as it's fsynced 
> instead of at a non-deterministic time when potentially unrelated tables are 
> flushed.
> Finally, we should look into extending the interface on CommitLogReader to be 
> more friendly for realtime parsing, perhaps supporting taking a 
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls, 
> assuming the reader is at the start of a SyncSegment. Would probably also 
> need to rewind to the start of the segment before returning so subsequent 
> calls would respect this contract. This would skip needing to deserialize the 
> descriptor and all completed SyncSegments to get to the root of the desired 
> segment for parsing.
> One alternative we discussed offline - instead of just storing the highest 
> seen CDC offset, we could instead store an offset per CDC mutation 
> (potentially delta encoded) in the idx file to allow clients to seek and only 
> parse the mutations with CDC enabled. My hunch is that the performance delta 
> from doing so wouldn't justify the complexity given the SyncSegment 
> deserialization and seeking restrictions in the compressed and encrypted 
> cases as mentioned above.
> The only complication I can think of with the above design is uncompressed 
> mmapped CommitLogSegm

[jira] [Commented] (CASSANDRA-12148) Improve determinism of CDC data availability

2017-06-12 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16047120#comment-16047120
 ] 

Joshua McKenzie commented on CASSANDRA-12148:
-

It's on a back-burner but not forgotten. Still hoping to get it in by 4.0.

> Improve determinism of CDC data availability
> 
>
> Key: CASSANDRA-12148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
> Fix For: 4.x
>
>
> The latency with which CDC data becomes available has a known limitation due 
> to our reliance on CommitLogSegments being discarded to have the data 
> available in cdc_raw: if a slowly written table co-habitates a 
> CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until 
> we hit either memory pressure on memtables or CommitLog limit pressure. 
> Ultimately, this leaves a non-deterministic element to when data becomes 
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more 
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead 
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file 
> per commit log segment in cdc_raw. Clients tail this index file, delta their 
> local last parsed offset on change, and parse the corresponding commit log 
> segment using their last parsed offset as min
> * C* flags that index file with an offset and DONE when the file is flushed 
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled 
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the 
> last written CDC mutation in the segment if higher than the previously known 
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest 
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as 
> _cdc.idx containing the min offset and end offset fsynced to 
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete both the 
> segment in commitlog and in cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the 
> segment in commitlog and update the _cdc.idx file w/end offset 
> and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled 
> mutations from a segment and write that to _cdc.idx on 
> completion of segment replay. This should bridge the potential correctness 
> gap of a node writing to a segment and then dying before it can write the 
> _cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC 
> mutation, track an offset of how far they've parsed, delta against the 
> _cdc.idx file end offset, and use that as a determinant on when to parse new 
> CDC data. Any existing clients written to the initial implementation of CDC 
> need only add the _cdc.idx logic and checking for DONE marker 
> to their code, so the burden on users to update to support this should be 
> quite small for the benefit of having data available as soon as it's fsynced 
> instead of at a non-deterministic time when potentially unrelated tables are 
> flushed.
> Finally, we should look into extending the interface on CommitLogReader to be 
> more friendly for realtime parsing, perhaps supporting taking a 
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls, 
> assuming the reader is at the start of a SyncSegment. Would probably also 
> need to rewind to the start of the segment before returning so subsequent 
> calls would respect this contract. This would skip needing to deserialize the 
> descriptor and all completed SyncSegments to get to the root of the desired 
> segment for parsing.
> One alternative we discussed offline - instead of just storing the highest 
> seen CDC offset, we could instead store an offset per CDC mutation 
> (potentially delta encoded) in the idx file to allow clients to seek and only 
> parse the mutations with CDC enabled. My hunch is that the performance delta 
> from doing so wouldn't justify the complexity given the SyncSegment 
> deserialization and seeking restrictions in the compressed and encrypted 
> cases as mentioned above.
> The only complication I can think of with the above design is uncompressed 
> mmapped C

[jira] [Commented] (CASSANDRA-12148) Improve determinism of CDC data availability

2017-06-08 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043611#comment-16043611
 ] 

Jay Zhuang commented on CASSANDRA-12148:


Hi [~JoshuaMcKenzie], any plan to commit this change to trunk?

> Improve determinism of CDC data availability
> 
>
> Key: CASSANDRA-12148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
>
> The latency with which CDC data becomes available has a known limitation due 
> to our reliance on CommitLogSegments being discarded to have the data 
> available in cdc_raw: if a slowly written table co-habitates a 
> CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until 
> we hit either memory pressure on memtables or CommitLog limit pressure. 
> Ultimately, this leaves a non-deterministic element to when data becomes 
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more 
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead 
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file 
> per commit log segment in cdc_raw. Clients tail this index file, delta their 
> local last parsed offset on change, and parse the corresponding commit log 
> segment using their last parsed offset as min
> * C* flags that index file with an offset and DONE when the file is flushed 
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled 
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the 
> last written CDC mutation in the segment if higher than the previously known 
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest 
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as 
> _cdc.idx containing the min offset and end offset fsynced to 
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete both the 
> segment in commitlog and in cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the 
> segment in commitlog and update the _cdc.idx file w/end offset 
> and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled 
> mutations from a segment and write that to _cdc.idx on 
> completion of segment replay. This should bridge the potential correctness 
> gap of a node writing to a segment and then dying before it can write the 
> _cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC 
> mutation, track an offset of how far they've parsed, delta against the 
> _cdc.idx file end offset, and use that as a determinant on when to parse new 
> CDC data. Any existing clients written to the initial implementation of CDC 
> need only add the _cdc.idx logic and checking for DONE marker 
> to their code, so the burden on users to update to support this should be 
> quite small for the benefit of having data available as soon as it's fsynced 
> instead of at a non-deterministic time when potentially unrelated tables are 
> flushed.
> Finally, we should look into extending the interface on CommitLogReader to be 
> more friendly for realtime parsing, perhaps supporting taking a 
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls, 
> assuming the reader is at the start of a SyncSegment. Would probably also 
> need to rewind to the start of the segment before returning so subsequent 
> calls would respect this contract. This would skip needing to deserialize the 
> descriptor and all completed SyncSegments to get to the root of the desired 
> segment for parsing.
> One alternative we discussed offline - instead of just storing the highest 
> seen CDC offset, we could instead store an offset per CDC mutation 
> (potentially delta encoded) in the idx file to allow clients to seek and only 
> parse the mutations with CDC enabled. My hunch is that the performance delta 
> from doing so wouldn't justify the complexity given the SyncSegment 
> deserialization and seeking restrictions in the compressed and encrypted 
> cases as mentioned above.
> The only complication I can think of with the above design is uncompressed 
> mmapped CommitLogSegments on Windows being undeletable, but 

[jira] [Commented] (CASSANDRA-12148) Improve determinism of CDC data availability

2016-11-03 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633675#comment-15633675
 ] 

Joshua McKenzie commented on CASSANDRA-12148:
-

Been pulled into other things for awhile, but updated the dtest PR a minute ago 
w/a new idea for a test plan. 
[Link|https://github.com/riptano/cassandra-dtest/pull/1332]

> Improve determinism of CDC data availability
> 
>
> Key: CASSANDRA-12148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
>
> The latency with which CDC data becomes available has a known limitation due 
> to our reliance on CommitLogSegments being discarded to have the data 
> available in cdc_raw: if a slowly written table co-habitates a 
> CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until 
> we hit either memory pressure on memtables or CommitLog limit pressure. 
> Ultimately, this leaves a non-deterministic element to when data becomes 
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more 
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead 
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file 
> per commit log segment in cdc_raw. Clients tail this index file, delta their 
> local last parsed offset on change, and parse the corresponding commit log 
> segment using their last parsed offset as min
> * C* flags that index file with an offset and DONE when the file is flushed 
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled 
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the 
> last written CDC mutation in the segment if higher than the previously known 
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest 
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as 
> _cdc.idx containing the min offset and end offset fsynced to 
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete both the 
> segment in commitlog and in cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the 
> segment in commitlog and update the _cdc.idx file w/end offset 
> and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled 
> mutations from a segment and write that to _cdc.idx on 
> completion of segment replay. This should bridge the potential correctness 
> gap of a node writing to a segment and then dying before it can write the 
> _cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC 
> mutation, track an offset of how far they've parsed, delta against the 
> _cdc.idx file end offset, and use that as a determinant on when to parse new 
> CDC data. Any existing clients written to the initial implementation of CDC 
> need only add the _cdc.idx logic and checking for DONE marker 
> to their code, so the burden on users to update to support this should be 
> quite small for the benefit of having data available as soon as it's fsynced 
> instead of at a non-deterministic time when potentially unrelated tables are 
> flushed.
> Finally, we should look into extending the interface on CommitLogReader to be 
> more friendly for realtime parsing, perhaps supporting taking a 
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls, 
> assuming the reader is at the start of a SyncSegment. Would probably also 
> need to rewind to the start of the segment before returning so subsequent 
> calls would respect this contract. This would skip needing to deserialize the 
> descriptor and all completed SyncSegments to get to the root of the desired 
> segment for parsing.
> One alternative we discussed offline - instead of just storing the highest 
> seen CDC offset, we could instead store an offset per CDC mutation 
> (potentially delta encoded) in the idx file to allow clients to seek and only 
> parse the mutations with CDC enabled. My hunch is that the performance delta 
> from doing so wouldn't justify the complexity given the SyncSegment 
> deserialization and seeking restrictions in the compressed and encrypted 
> cases as mentioned above.
> The only complicati

[jira] [Commented] (CASSANDRA-12148) Improve determinism of CDC data availability

2016-09-22 Thread Jim Witschey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513657#comment-15513657
 ] 

Jim Witschey commented on CASSANDRA-12148:
--

Seeing a couple failures in the new test. Test report:

http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/326/testReport/

We see [some CDC index files created in the destination 
node|http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/326/testReport/node_0_iter_001.cdc_test/TestCDC/test_cdc_data_available_in_cdc_raw/],
 and [a case where the CDC segments are, unexpectedly, different after writing 
non-CDC 
data|http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/326/testReport/node_0_iter_001.cdc_test/TestCDC/test_insertion_and_commitlog_behavior_after_reaching_cdc_total_space/].
 The second could be a bad expectation now that availability behavior has 
changed, I'm not sure.

> Improve determinism of CDC data availability
> 
>
> Key: CASSANDRA-12148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
>
> The latency with which CDC data becomes available has a known limitation due 
> to our reliance on CommitLogSegments being discarded to have the data 
> available in cdc_raw: if a slowly written table co-habitates a 
> CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until 
> we hit either memory pressure on memtables or CommitLog limit pressure. 
> Ultimately, this leaves a non-deterministic element to when data becomes 
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more 
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead 
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file 
> per commit log segment in cdc_raw. Clients tail this index file, delta their 
> local last parsed offset on change, and parse the corresponding commit log 
> segment using their last parsed offset as min
> * C* flags that index file with an offset and DONE when the file is flushed 
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled 
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the 
> last written CDC mutation in the segment if higher than the previously known 
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest 
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as 
> _cdc.idx containing the min offset and end offset fsynced to 
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete both the 
> segment in commitlog and in cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the 
> segment in commitlog and update the _cdc.idx file w/end offset 
> and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled 
> mutations from a segment and write that to _cdc.idx on 
> completion of segment replay. This should bridge the potential correctness 
> gap of a node writing to a segment and then dying before it can write the 
> _cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC 
> mutation, track an offset of how far they've parsed, delta against the 
> _cdc.idx file end offset, and use that as a determinant on when to parse new 
> CDC data. Any existing clients written to the initial implementation of CDC 
> need only add the _cdc.idx logic and checking for DONE marker 
> to their code, so the burden on users to update to support this should be 
> quite small for the benefit of having data available as soon as it's fsynced 
> instead of at a non-deterministic time when potentially unrelated tables are 
> flushed.
> Finally, we should look into extending the interface on CommitLogReader to be 
> more friendly for realtime parsing, perhaps supporting taking a 
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls, 
> assuming the reader is at the start of a SyncSegment. Would probably also 
> need to rewind to the start of the segment before returning so subsequent 
> calls would respect this contract. This would skip needing to deserialize the 
> desc

[jira] [Commented] (CASSANDRA-12148) Improve determinism of CDC data availability

2016-09-21 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510917#comment-15510917
 ] 

Joshua McKenzie commented on CASSANDRA-12148:
-

Had a little rebase-borkery that didn't show up until I realcleaned locally. 
Fixed and pushed, re-running CI now.

> Improve determinism of CDC data availability
> 
>
> Key: CASSANDRA-12148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
>
> The latency with which CDC data becomes available has a known limitation due 
> to our reliance on CommitLogSegments being discarded to have the data 
> available in cdc_raw: if a slowly written table co-habitates a 
> CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until 
> we hit either memory pressure on memtables or CommitLog limit pressure. 
> Ultimately, this leaves a non-deterministic element to when data becomes 
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more 
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead 
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file 
> per commit log segment in cdc_raw. Clients tail this index file, delta their 
> local last parsed offset on change, and parse the corresponding commit log 
> segment using their last parsed offset as min
> * C* flags that index file with an offset and DONE when the file is flushed 
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled 
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the 
> last written CDC mutation in the segment if higher than the previously known 
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest 
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as 
> _cdc.idx containing the min offset and end offset fsynced to 
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete both the 
> segment in commitlog and in cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the 
> segment in commitlog and update the _cdc.idx file w/end offset 
> and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled 
> mutations from a segment and write that to _cdc.idx on 
> completion of segment replay. This should bridge the potential correctness 
> gap of a node writing to a segment and then dying before it can write the 
> _cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC 
> mutation, track an offset of how far they've parsed, delta against the 
> _cdc.idx file end offset, and use that as a determinant on when to parse new 
> CDC data. Any existing clients written to the initial implementation of CDC 
> need only add the _cdc.idx logic and checking for DONE marker 
> to their code, so the burden on users to update to support this should be 
> quite small for the benefit of having data available as soon as it's fsynced 
> instead of at a non-deterministic time when potentially unrelated tables are 
> flushed.
> Finally, we should look into extending the interface on CommitLogReader to be 
> more friendly for realtime parsing, perhaps supporting taking a 
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls, 
> assuming the reader is at the start of a SyncSegment. Would probably also 
> need to rewind to the start of the segment before returning so subsequent 
> calls would respect this contract. This would skip needing to deserialize the 
> descriptor and all completed SyncSegments to get to the root of the desired 
> segment for parsing.
> One alternative we discussed offline - instead of just storing the highest 
> seen CDC offset, we could instead store an offset per CDC mutation 
> (potentially delta encoded) in the idx file to allow clients to seek and only 
> parse the mutations with CDC enabled. My hunch is that the performance delta 
> from doing so wouldn't justify the complexity given the SyncSegment 
> deserialization and seeking restrictions in the compressed and encrypted 
> cases as mentioned above.
> The only complication I can think of with the above design is uncompressed 

[jira] [Commented] (CASSANDRA-12148) Improve determinism of CDC data availability

2016-09-20 Thread Jim Witschey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507392#comment-15507392
 ] 

Jim Witschey commented on CASSANDRA-12148:
--

After review I've opened a [new 
PR|https://github.com/riptano/cassandra-dtest/pull/1332]. The new CI run 
mentioned above is 
[here|http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/324/].

> Improve determinism of CDC data availability
> 
>
> Key: CASSANDRA-12148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
>
> The latency with which CDC data becomes available has a known limitation due 
> to our reliance on CommitLogSegments being discarded to have the data 
> available in cdc_raw: if a slowly written table co-habitates a 
> CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until 
> we hit either memory pressure on memtables or CommitLog limit pressure. 
> Ultimately, this leaves a non-deterministic element to when data becomes 
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more 
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead 
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file 
> per commit log segment in cdc_raw. Clients tail this index file, delta their 
> local last parsed offset on change, and parse the corresponding commit log 
> segment using their last parsed offset as min
> * C* flags that index file with an offset and DONE when the file is flushed 
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled 
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the 
> last written CDC mutation in the segment if higher than the previously known 
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest 
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as 
> _cdc.idx containing the min offset and end offset fsynced to 
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete both the 
> segment in commitlog and in cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the 
> segment in commitlog and update the _cdc.idx file w/end offset 
> and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled 
> mutations from a segment and write that to _cdc.idx on 
> completion of segment replay. This should bridge the potential correctness 
> gap of a node writing to a segment and then dying before it can write the 
> _cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC 
> mutation, track an offset of how far they've parsed, delta against the 
> _cdc.idx file end offset, and use that as a determinant on when to parse new 
> CDC data. Any existing clients written to the initial implementation of CDC 
> need only add the _cdc.idx logic and checking for DONE marker 
> to their code, so the burden on users to update to support this should be 
> quite small for the benefit of having data available as soon as it's fsynced 
> instead of at a non-deterministic time when potentially unrelated tables are 
> flushed.
> Finally, we should look into extending the interface on CommitLogReader to be 
> more friendly for realtime parsing, perhaps supporting taking a 
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls, 
> assuming the reader is at the start of a SyncSegment. Would probably also 
> need to rewind to the start of the segment before returning so subsequent 
> calls would respect this contract. This would skip needing to deserialize the 
> descriptor and all completed SyncSegments to get to the root of the desired 
> segment for parsing.
> One alternative we discussed offline - instead of just storing the highest 
> seen CDC offset, we could instead store an offset per CDC mutation 
> (potentially delta encoded) in the idx file to allow clients to seek and only 
> parse the mutations with CDC enabled. My hunch is that the performance delta 
> from doing so wouldn't justify the complexity given the SyncSegment 
> deserialization and seeking restrictions in the compressed and encrypted 
> cas

[jira] [Commented] (CASSANDRA-12148) Improve determinism of CDC data availability

2016-09-20 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507376#comment-15507376
 ] 

Joshua McKenzie commented on CASSANDRA-12148:
-

Should have that CONTAINS error ironed out with [this 
commit|https://github.com/apache/cassandra/commit/92b5e18def18fde6a185da468be97ec5c0ef9406].
 Waiting on a multiplexed run of the new dtest and a new test run on CI after 
rebasing before commit.

> Improve determinism of CDC data availability
> 
>
> Key: CASSANDRA-12148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
>
> The latency with which CDC data becomes available has a known limitation due 
> to our reliance on CommitLogSegments being discarded to have the data 
> available in cdc_raw: if a slowly written table co-habitates a 
> CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until 
> we hit either memory pressure on memtables or CommitLog limit pressure. 
> Ultimately, this leaves a non-deterministic element to when data becomes 
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more 
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead 
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file 
> per commit log segment in cdc_raw. Clients tail this index file, delta their 
> local last parsed offset on change, and parse the corresponding commit log 
> segment using their last parsed offset as min
> * C* flags that index file with an offset and DONE when the file is flushed 
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled 
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the 
> last written CDC mutation in the segment if higher than the previously known 
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest 
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as 
> _cdc.idx containing the min offset and end offset fsynced to 
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete both the 
> segment in commitlog and in cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the 
> segment in commitlog and update the _cdc.idx file w/end offset 
> and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled 
> mutations from a segment and write that to _cdc.idx on 
> completion of segment replay. This should bridge the potential correctness 
> gap of a node writing to a segment and then dying before it can write the 
> _cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC 
> mutation, track an offset of how far they've parsed, delta against the 
> _cdc.idx file end offset, and use that as a determinant on when to parse new 
> CDC data. Any existing clients written to the initial implementation of CDC 
> need only add the _cdc.idx logic and checking for DONE marker 
> to their code, so the burden on users to update to support this should be 
> quite small for the benefit of having data available as soon as it's fsynced 
> instead of at a non-deterministic time when potentially unrelated tables are 
> flushed.
> Finally, we should look into extending the interface on CommitLogReader to be 
> more friendly for realtime parsing, perhaps supporting taking a 
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls, 
> assuming the reader is at the start of a SyncSegment. Would probably also 
> need to rewind to the start of the segment before returning so subsequent 
> calls would respect this contract. This would skip needing to deserialize the 
> descriptor and all completed SyncSegments to get to the root of the desired 
> segment for parsing.
> One alternative we discussed offline - instead of just storing the highest 
> seen CDC offset, we could instead store an offset per CDC mutation 
> (potentially delta encoded) in the idx file to allow clients to seek and only 
> parse the mutations with CDC enabled. My hunch is that the performance delta 
> from doing so wouldn't justify the complexity given the SyncSegment 
> deserialization and seeking restrictions in 

[jira] [Commented] (CASSANDRA-12148) Improve determinism of CDC data availability

2016-08-16 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423065#comment-15423065
 ] 

Joshua McKenzie commented on CASSANDRA-12148:
-

[dtest PR|https://github.com/riptano/cassandra-dtest/pull/1244] opened.

Testing uncovered an invalid state transition from CONTAINS that I'll need to 
track down before I move to commit this. dtest PR is LooseVersion >= 3.10 so 
it's good to merge if it passes review.

> Improve determinism of CDC data availability
> 
>
> Key: CASSANDRA-12148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
>
> The latency with which CDC data becomes available has a known limitation due 
> to our reliance on CommitLogSegments being discarded to have the data 
> available in cdc_raw: if a slowly written table co-habitates a 
> CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until 
> we hit either memory pressure on memtables or CommitLog limit pressure. 
> Ultimately, this leaves a non-deterministic element to when data becomes 
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more 
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead 
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file 
> per commit log segment in cdc_raw. Clients tail this index file, delta their 
> local last parsed offset on change, and parse the corresponding commit log 
> segment using their last parsed offset as min
> * C* flags that index file with an offset and DONE when the file is flushed 
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled 
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the 
> last written CDC mutation in the segment if higher than the previously known 
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest 
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as 
> _cdc.idx containing the min offset and end offset fsynced to 
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete both the 
> segment in commitlog and in cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the 
> segment in commitlog and update the _cdc.idx file w/end offset 
> and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled 
> mutations from a segment and write that to _cdc.idx on 
> completion of segment replay. This should bridge the potential correctness 
> gap of a node writing to a segment and then dying before it can write the 
> _cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC 
> mutation, track an offset of how far they've parsed, delta against the 
> _cdc.idx file end offset, and use that as a determinant on when to parse new 
> CDC data. Any existing clients written to the initial implementation of CDC 
> need only add the _cdc.idx logic and checking for DONE marker 
> to their code, so the burden on users to update to support this should be 
> quite small for the benefit of having data available as soon as it's fsynced 
> instead of at a non-deterministic time when potentially unrelated tables are 
> flushed.
> Finally, we should look into extending the interface on CommitLogReader to be 
> more friendly for realtime parsing, perhaps supporting taking a 
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls, 
> assuming the reader is at the start of a SyncSegment. Would probably also 
> need to rewind to the start of the segment before returning so subsequent 
> calls would respect this contract. This would skip needing to deserialize the 
> descriptor and all completed SyncSegments to get to the root of the desired 
> segment for parsing.
> One alternative we discussed offline - instead of just storing the highest 
> seen CDC offset, we could instead store an offset per CDC mutation 
> (potentially delta encoded) in the idx file to allow clients to seek and only 
> parse the mutations with CDC enabled. My hunch is that the performance delta 
> from doing so wouldn't justify the complexity given the SyncSegment 
> deserialization and seeki

[jira] [Commented] (CASSANDRA-12148) Improve determinism of CDC data availability

2016-07-26 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393786#comment-15393786
 ] 

Joshua McKenzie commented on CASSANDRA-12148:
-

Updated doc and javadoc. I'll hold off on commit until I have a dtest that 
confirms replay links and creates the index file as we expect.

Thanks for the quick review.

> Improve determinism of CDC data availability
> 
>
> Key: CASSANDRA-12148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
>
> The latency with which CDC data becomes available has a known limitation due 
> to our reliance on CommitLogSegments being discarded to have the data 
> available in cdc_raw: if a slowly written table co-habitates a 
> CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until 
> we hit either memory pressure on memtables or CommitLog limit pressure. 
> Ultimately, this leaves a non-deterministic element to when data becomes 
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more 
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead 
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file 
> per commit log segment in cdc_raw. Clients tail this index file, delta their 
> local last parsed offset on change, and parse the corresponding commit log 
> segment using their last parsed offset as min
> * C* flags that index file with an offset and DONE when the file is flushed 
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled 
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the 
> last written CDC mutation in the segment if higher than the previously known 
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest 
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as 
> _cdc.idx containing the min offset and end offset fsynced to 
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete both the 
> segment in commitlog and in cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the 
> segment in commitlog and update the _cdc.idx file w/end offset 
> and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled 
> mutations from a segment and write that to _cdc.idx on 
> completion of segment replay. This should bridge the potential correctness 
> gap of a node writing to a segment and then dying before it can write the 
> _cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC 
> mutation, track an offset of how far they've parsed, delta against the 
> _cdc.idx file end offset, and use that as a determinant on when to parse new 
> CDC data. Any existing clients written to the initial implementation of CDC 
> need only add the _cdc.idx logic and checking for DONE marker 
> to their code, so the burden on users to update to support this should be 
> quite small for the benefit of having data available as soon as it's fsynced 
> instead of at a non-deterministic time when potentially unrelated tables are 
> flushed.
> Finally, we should look into extending the interface on CommitLogReader to be 
> more friendly for realtime parsing, perhaps supporting taking a 
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls, 
> assuming the reader is at the start of a SyncSegment. Would probably also 
> need to rewind to the start of the segment before returning so subsequent 
> calls would respect this contract. This would skip needing to deserialize the 
> descriptor and all completed SyncSegments to get to the root of the desired 
> segment for parsing.
> One alternative we discussed offline - instead of just storing the highest 
> seen CDC offset, we could instead store an offset per CDC mutation 
> (potentially delta encoded) in the idx file to allow clients to seek and only 
> parse the mutations with CDC enabled. My hunch is that the performance delta 
> from doing so wouldn't justify the complexity given the SyncSegment 
> deserialization and seeking restrictions in the compressed and encrypted 
> cases as mentioned above.
> The only complication 

[jira] [Commented] (CASSANDRA-12148) Improve determinism of CDC data availability

2016-07-26 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393728#comment-15393728
 ] 

Branimir Lambov commented on CASSANDRA-12148:
-

bq. parse CDC data that's never actually persisted to disk

Thanks, that's the answer I was looking for. Can you state this explicitly 
somewhere in the documentation and perhaps as {{writeCDCIndexFile}} JavaDoc?

> Improve determinism of CDC data availability
> 
>
> Key: CASSANDRA-12148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
>
> The latency with which CDC data becomes available has a known limitation due 
> to our reliance on CommitLogSegments being discarded to have the data 
> available in cdc_raw: if a slowly written table co-habitates a 
> CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until 
> we hit either memory pressure on memtables or CommitLog limit pressure. 
> Ultimately, this leaves a non-deterministic element to when data becomes 
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more 
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead 
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file 
> per commit log segment in cdc_raw. Clients tail this index file, delta their 
> local last parsed offset on change, and parse the corresponding commit log 
> segment using their last parsed offset as min
> * C* flags that index file with an offset and DONE when the file is flushed 
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled 
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the 
> last written CDC mutation in the segment if higher than the previously known 
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest 
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as 
> _cdc.idx containing the min offset and end offset fsynced to 
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete both the 
> segment in commitlog and in cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the 
> segment in commitlog and update the _cdc.idx file w/end offset 
> and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled 
> mutations from a segment and write that to _cdc.idx on 
> completion of segment replay. This should bridge the potential correctness 
> gap of a node writing to a segment and then dying before it can write the 
> _cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC 
> mutation, track an offset of how far they've parsed, delta against the 
> _cdc.idx file end offset, and use that as a determinant on when to parse new 
> CDC data. Any existing clients written to the initial implementation of CDC 
> need only add the _cdc.idx logic and checking for DONE marker 
> to their code, so the burden on users to update to support this should be 
> quite small for the benefit of having data available as soon as it's fsynced 
> instead of at a non-deterministic time when potentially unrelated tables are 
> flushed.
> Finally, we should look into extending the interface on CommitLogReader to be 
> more friendly for realtime parsing, perhaps supporting taking a 
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls, 
> assuming the reader is at the start of a SyncSegment. Would probably also 
> need to rewind to the start of the segment before returning so subsequent 
> calls would respect this contract. This would skip needing to deserialize the 
> descriptor and all completed SyncSegments to get to the root of the desired 
> segment for parsing.
> One alternative we discussed offline - instead of just storing the highest 
> seen CDC offset, we could instead store an offset per CDC mutation 
> (potentially delta encoded) in the idx file to allow clients to seek and only 
> parse the mutations with CDC enabled. My hunch is that the performance delta 
> from doing so wouldn't justify the complexity given the SyncSegment 
> deserialization and seeking restrictions in the compressed and encrypted 
> cases 

[jira] [Commented] (CASSANDRA-12148) Improve determinism of CDC data availability

2016-07-26 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393714#comment-15393714
 ] 

Joshua McKenzie commented on CASSANDRA-12148:
-

I kept the index file around with the understanding that file contents can 
reflect in kernel buffers before they're fsynced and thus be available to the 
consumer before being durable. If the consumer relies on the reflected contents 
and length of the file with a memory-mapped handle, for instance, they could 
theoretically read ahead (in the case of high data production rates) and parse 
CDC data that's never actually persisted to disk if the node goes down. At 
least that's my understanding - correct me if I'm wrong.

As for continuing with a realtime parse from the last point in a segment: I 
forgot that was explicitly mentioned on this ticket. It's a cleanly modular / 
separate change to this one so I was planning on creating another ticket to 
deal with that.


> Improve determinism of CDC data availability
> 
>
> Key: CASSANDRA-12148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
>
> The latency with which CDC data becomes available has a known limitation due 
> to our reliance on CommitLogSegments being discarded to have the data 
> available in cdc_raw: if a slowly written table co-habitates a 
> CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until 
> we hit either memory pressure on memtables or CommitLog limit pressure. 
> Ultimately, this leaves a non-deterministic element to when data becomes 
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more 
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead 
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file 
> per commit log segment in cdc_raw. Clients tail this index file, delta their 
> local last parsed offset on change, and parse the corresponding commit log 
> segment using their last parsed offset as min
> * C* flags that index file with an offset and DONE when the file is flushed 
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled 
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the 
> last written CDC mutation in the segment if higher than the previously known 
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest 
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as 
> _cdc.idx containing the min offset and end offset fsynced to 
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete both the 
> segment in commitlog and in cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the 
> segment in commitlog and update the _cdc.idx file w/end offset 
> and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled 
> mutations from a segment and write that to _cdc.idx on 
> completion of segment replay. This should bridge the potential correctness 
> gap of a node writing to a segment and then dying before it can write the 
> _cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC 
> mutation, track an offset of how far they've parsed, delta against the 
> _cdc.idx file end offset, and use that as a determinant on when to parse new 
> CDC data. Any existing clients written to the initial implementation of CDC 
> need only add the _cdc.idx logic and checking for DONE marker 
> to their code, so the burden on users to update to support this should be 
> quite small for the benefit of having data available as soon as it's fsynced 
> instead of at a non-deterministic time when potentially unrelated tables are 
> flushed.
> Finally, we should look into extending the interface on CommitLogReader to be 
> more friendly for realtime parsing, perhaps supporting taking a 
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls, 
> assuming the reader is at the start of a SyncSegment. Would probably also 
> need to rewind to the start of the segment before returning so subsequent 
> calls would respect this contract. This would skip needing to deserialize the 
>

[jira] [Commented] (CASSANDRA-12148) Improve determinism of CDC data availability

2016-07-26 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393342#comment-15393342
 ] 

Branimir Lambov commented on CASSANDRA-12148:
-

The code looks good.

One question is nagging me though -- does the index file actually add anything? 
Reading up until this position is the same as reading until
- the section ending before the current end of file (compressed log)
- the end-of-file section marker with 0 length (uncompressed log)

It's the continuing from the last point read that is more interesting, and the 
patch doesn't do anything to solve it. Do we have a proof-of-concept of that?

> Improve determinism of CDC data availability
> 
>
> Key: CASSANDRA-12148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
>
> The latency with which CDC data becomes available has a known limitation due 
> to our reliance on CommitLogSegments being discarded to have the data 
> available in cdc_raw: if a slowly written table co-habitates a 
> CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until 
> we hit either memory pressure on memtables or CommitLog limit pressure. 
> Ultimately, this leaves a non-deterministic element to when data becomes 
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more 
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead 
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file 
> per commit log segment in cdc_raw. Clients tail this index file, delta their 
> local last parsed offset on change, and parse the corresponding commit log 
> segment using their last parsed offset as min
> * C* flags that index file with an offset and DONE when the file is flushed 
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled 
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the 
> last written CDC mutation in the segment if higher than the previously known 
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest 
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as 
> _cdc.idx containing the min offset and end offset fsynced to 
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete both the 
> segment in commitlog and in cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the 
> segment in commitlog and update the _cdc.idx file w/end offset 
> and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled 
> mutations from a segment and write that to _cdc.idx on 
> completion of segment replay. This should bridge the potential correctness 
> gap of a node writing to a segment and then dying before it can write the 
> _cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC 
> mutation, track an offset of how far they've parsed, delta against the 
> _cdc.idx file end offset, and use that as a determinant on when to parse new 
> CDC data. Any existing clients written to the initial implementation of CDC 
> need only add the _cdc.idx logic and checking for DONE marker 
> to their code, so the burden on users to update to support this should be 
> quite small for the benefit of having data available as soon as it's fsynced 
> instead of at a non-deterministic time when potentially unrelated tables are 
> flushed.
> Finally, we should look into extending the interface on CommitLogReader to be 
> more friendly for realtime parsing, perhaps supporting taking a 
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls, 
> assuming the reader is at the start of a SyncSegment. Would probably also 
> need to rewind to the start of the segment before returning so subsequent 
> calls would respect this contract. This would skip needing to deserialize the 
> descriptor and all completed SyncSegments to get to the root of the desired 
> segment for parsing.
> One alternative we discussed offline - instead of just storing the highest 
> seen CDC offset, we could instead store an offset per CDC mutation 
> (potentially delta encoded) in the idx file to allow clients to seek a