[jira] [Commented] (CASSANDRA-7168) Add repair aware consistency levels
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505783#comment-14505783 ] sankalp kohli commented on CASSANDRA-7168: -- cc [~krummas] I agree with [~slebresne] that we first need to make sure the last repair time is consistent across replicas (CASSANDRA-9143). There is a lot of overlap between this ticket and CASSANDRA-6434, but I chose this ticket to comment on since there is a lot of discussion here :). CASSANDRA-6434 will only drop tombstones from the repaired data. The problem with this is that if the repair time could not be sent to one replica (the CASSANDRA-9143 failure case), that replica will not drop tombstones for data which the other replicas will. During a normal read or a repair-consistency read, the replica which did not get the repair time will then include some tombstones which the other replicas won't, because it has a different view of what is repaired and what is not. This will cause digest mismatches, leading to a spike in latency. We also cannot use Benedict's approach of finding the last common repair time, since replicas which are ahead would already have compacted their tombstones, leading to the same digest-mismatch problem. I think we need to do CASSANDRA-9143 and also only drop tombstones when we are sure all replicas have that repair time. Also, the moment when replicas learn that a given set of sstables is repaired, and therefore stop including tombstones from them in reads and start dropping those tombstones when eligible, is not going to be the same across replicas. This will cause digest mismatches during that window, which is not ideal. I have not yet thought through how this could be avoided.
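The mismatch scenario above can be simulated with a toy model (plain Python only; nothing here is Cassandra code, and the `digest`/`last_repair_time` names are invented for illustration):

```python
import hashlib

# Toy model: tombstone purging is gated on each replica's view of the last
# repair time, so a replica that missed the repairedAt update keeps returning
# tombstones the others have purged, and its digest diverges.

def digest(tombstone_times, last_repair_time):
    # A replica purges tombstones covered by repair and digests the rest.
    visible = sorted(t for t in tombstone_times if t > last_repair_time)
    return hashlib.md5(repr(visible).encode()).hexdigest()

tombstones = [50, 150]                              # tombstone write times
in_sync = digest(tombstones, last_repair_time=100)  # purged the t=50 tombstone
behind = digest(tombstones, last_repair_time=0)     # missed the repair message
mismatch = in_sync != behind  # True: digest mismatch forces a full data read
```

Note the asymmetry the comment points out: the in-sync replica has already physically purged its tombstone, so falling back to the oldest common repair time cannot make the two digests agree again.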
> Add repair aware consistency levels > --- > > Key: CASSANDRA-7168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7168 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: T Jake Luciani > Labels: performance > Fix For: 3.1 > > > With CASSANDRA-5351 and CASSANDRA-2424 I think there is an opportunity to > avoid a lot of extra disk I/O when running queries with higher consistency > levels. > Since repaired data is by definition consistent and we know which sstables > are repaired, we can optimize the read path by having a REPAIRED_QUORUM which > breaks reads into two phases: > > 1) Read from one replica the result from the repaired sstables. > 2) Read from a quorum only the un-repaired data. > For the node performing 1) we can pipeline the call so it's a single hop. > In the long run (assuming data is repaired regularly) we will end up with > much closer to CL.ONE performance while maintaining consistency. > Some things to figure out: > - If repairs fail on some nodes we can have a situation where we don't have > a consistent repaired state across the replicas. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
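The two-phase read proposed in the description can be sketched as a toy model (illustrative Python, not Cassandra's actual read path; the cell layout and all function names are invented):

```python
# Cells are {column: (value, timestamp)}; merging is last-write-wins.

def reconcile(*results):
    merged = {}
    for cells in results:
        for col, (value, ts) in cells.items():
            if col not in merged or ts > merged[col][1]:
                merged[col] = (value, ts)
    return merged

def repaired_quorum_read(replicas, quorum_size):
    # Phase 1: repaired data from a single replica. Repaired data is by
    # definition identical on every replica, so one read suffices.
    repaired = replicas[0]["repaired"]
    # Phase 2: only the (typically small) unrepaired remainder from a quorum.
    unrepaired = [r["unrepaired"] for r in replicas[:quorum_size]]
    return reconcile(repaired, *unrepaired)

replicas = [
    {"repaired": {"a": (1, 10)}, "unrepaired": {"b": (2, 20)}},
    {"repaired": {"a": (1, 10)}, "unrepaired": {"a": (3, 30), "b": (2, 20)}},
    {"repaired": {"a": (1, 10)}, "unrepaired": {}},
]
result = repaired_quorum_read(replicas, quorum_size=2)
```

The newer unrepaired write to "a" seen at the quorum overrides the repaired value, which is the consistency argument in the description: correctness only requires the quorum phase to cover everything repair has not yet made identical.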
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505750#comment-14505750 ] Jonathan Ellis commented on CASSANDRA-7168: --- It's been there forever in what I'm convinced was a premature optimization in the first place. So sure, let's double-check, but I don't think the bar for proof should be unreasonably high.
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504983#comment-14504983 ] Aleksey Yeschenko commented on CASSANDRA-7168: -- The 3.0 comment was a bit premature, I take that bit back. Still, should benchmark it once we get some spare time. Think 3.6-3.6 timeframe.
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504981#comment-14504981 ] Sylvain Lebresne commented on CASSANDRA-7168: - bq. But in the meantime I do think we should drop digest reads. I'm not necessarily against that, though I agree with Aleksey that it's worth doing serious benchmarking before affirming that it "really doesn't buy us much", since it's been there pretty much forever. bq. Do we have any spare resources to do the testing prior to 3.0 release? I know we're always impatient to remove stuff, but since this particular ticket won't make 3.0, I would suggest leaving digests for now and saving whatever benchmarking resources we have for 3.0. Just an opinion, though.
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504915#comment-14504915 ] Aleksey Yeschenko commented on CASSANDRA-7168: -- There has been quite some momentum behind dropping digests for good, and I want that to happen too. That said, we should probably do some heavy benchmarking first to see that getting rid of them won't introduce a regression. Do we have any spare resources to do the testing prior to the 3.0 release?
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504910#comment-14504910 ] Jonathan Ellis commented on CASSANDRA-7168: --- I'm fine with adding this as opt-in. But in the meantime I do think we should drop digest reads. It's an optimization that really doesn't buy us much, at the cost of a lot of complexity (that I'm still not entirely sure is bug-free), as well as significantly complicating this new optimization (which has much more potential to actually save work).
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504539#comment-14504539 ] Sylvain Lebresne commented on CASSANDRA-7168: - bq. What makes you uncomfortable about relying on repair time? The problem is: what if we screw up the repair time (even temporarily)? Or what if for some reason an sstable gets deleted from one node without the user realizing right away? (You could argue that in that case we can already break CL guarantees, and that's true, but this would make it a lot worse, since in practice reading from all replicas does give us reasonably good protection against this.) The fact that we'll be reading only one node (for the repaired data at least) makes it a lot easier, imo, to screw up consistency guarantees than if we actually read the data on every node (even if just to send digests). In a way, data/digest reads are a bit brute-force, but that's what makes them a pretty reliable mechanism. Relying too heavily on the repair time feels fragile in comparison, and being fragile when it comes to consistency guarantees makes me uncomfortable. bq. What would make you more comfortable? I'm not sure. I would probably like to see this added as an opt-in feature first (ideally with some granularity, either per-query or per-table) so we can slowly build some confidence that our handling of the repair time is solid and that we have fail-safes around that mechanism for when things go badly.
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503164#comment-14503164 ] Jonathan Ellis commented on CASSANDRA-7168: --- What makes you uncomfortable about relying on repair time? What would make you more comfortable?
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502981#comment-14502981 ] Sylvain Lebresne commented on CASSANDRA-7168: - bq. I'd rather just apply this as an optimization to all CL > ONE, replacing the data/digest split that is almost certainly less useful. I'll admit that I find this a bit scary. This means relying on the repaired time in a way that I personally don't yet feel very comfortable with. At the very least, I think we should preserve the option to do full data queries. And as much as I understand the willingness to simplify the code, I would personally be a lot more comfortable if this lived alongside the existing mechanism at first. I can agree, however, that having a specific CL is weird, since this really applies to pretty much all CLs. I'd be fine with adding a flag in the native protocol to allow or disallow the feature, for instance.
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389957#comment-14389957 ] prmg commented on CASSANDRA-7168: - [~tjake] I'm giving this ticket a try for learning purposes. So far I have calculated maxPartitionRepairTime on the coordinator, sent it to the replicas via the MessagingService on the ReadCommand, and skipped sstables with repairedAt <= maxPartitionRepairTime in the CollationController. One part of your description was not clear to me: bq. We will also need to include tombstones in the results of the non-repaired column family result since they need to be merged with the repaired result. Is that tombstone inclusion already done by the normal flow of the CollationController, or is it necessary to add some post-processing after the repaired sstables with repairedAt <= maxPartitionRepairTime are skipped? It would be great if you could clarify that a bit for me. Thanks!
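As a toy illustration of why the un-repaired result has to carry its tombstones into the merge (plain Python, not the CollationController; `None` stands in for a tombstone here):

```python
def reconcile(*results):
    # Last-write-wins merge of {column: (value, timestamp)} maps; a value of
    # None models a tombstone and must win over older live cells.
    merged = {}
    for cells in results:
        for col, (value, ts) in cells.items():
            if col not in merged or ts > merged[col][1]:
                merged[col] = (value, ts)
    return merged

repaired = {"a": ("old", 10)}    # repaired phase still sees the old value
unrepaired = {"a": (None, 20)}   # the deletion has not been repaired yet

kept = reconcile(repaired, unrepaired)   # tombstone included in the merge
dropped = reconcile(repaired, {})        # tombstone stripped before the merge
```

With the tombstone included, the column correctly reads as deleted; with it stripped, the already-repaired value resurrects, which is why the non-repaired phase cannot purge tombstones before merging.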
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360422#comment-14360422 ] T Jake Luciani commented on CASSANDRA-7168: --- bq. Do we actually need to add a special ConsistencyLevel? There is nothing requiring it, but it seems pragmatic before making it the default (or at least providing a way to opt out). This might make a good option once/if we get to CASSANDRA-8119. There might be cases where we want the old behavior that I haven't thought of yet...
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359581#comment-14359581 ] Jonathan Ellis commented on CASSANDRA-7168: --- Do we actually need to add a special ConsistencyLevel? I'd rather just apply this as an optimization to all CL > ONE, replacing the data/digest split that is almost certainly less useful.
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255743#comment-14255743 ] T Jake Luciani commented on CASSANDRA-7168: --- The CollationController class is what merges the rows from the iterators.
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255281#comment-14255281 ] Victor Anjos commented on CASSANDRA-7168: - I've got a getMaxRepairedTime function and some functions to find repaired and unrepaired sstables in DataTracker. My next step will be to combine those into a row (maybe using RowIteratorFactory?) and check for digest mismatches against what's built out of there. I'm having some trouble figuring out where/how to build a proper RowIteratorFactory to yield me a row to then check for digest mismatches. A little help would be awesome here so that I can finish the work I've started.
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207287#comment-14207287 ] T Jake Luciani commented on CASSANDRA-7168: --- To re-summarize this ticket: the goal is to improve the performance of queries that require consistency by using the repaired data to cut the amount of remote data to check at quorum. Initially, let's only try to perform this optimization when the coordinator is a partition replica. I think the following would be a good way to start:
* Add a REPAIRED_QUORUM level
* Change StorageProxy.read to allow a special code path for REPAIRED_QUORUM that will:
** Identify the max repairedAt time for the sstables that cover the partition
** Pass the max repairedAt time to the ReadCommand and MessagingService
** Execute the repaired-only read locally
** Merge the results
For the actual reads we will need to change the CollationController to take the max repairedAt time and ignore repaired sstables with repairedAt > the passed one. We will also need to include tombstones in the result of the non-repaired column family, since they need to be merged with the repaired result.
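The coordinator-side bookkeeping in the steps above might look roughly like this (a sketch with invented structures; Cassandra's sstable metadata and read path are of course much richer):

```python
UNREPAIRED = 0  # a repairedAt of 0 marks an unrepaired sstable

def max_repaired_at(sstables):
    # Max repairedAt across the sstables covering the partition, computed on
    # the coordinator and shipped to replicas with the read command.
    return max((s["repaired_at"] for s in sstables), default=UNREPAIRED)

def split_for_read(sstables, bound):
    # Repaired phase: sstables repaired at or before the coordinator's bound.
    repaired = [s for s in sstables
                if UNREPAIRED < s["repaired_at"] <= bound]
    # Everything else (unrepaired, or repaired after the bound) goes to the
    # quorum phase so that no write can be missed.
    rest = [s for s in sstables
            if s["repaired_at"] == UNREPAIRED or s["repaired_at"] > bound]
    return repaired, rest

coordinator_tables = [{"repaired_at": 100}, {"repaired_at": 0}]
bound = max_repaired_at(coordinator_tables)
replica_tables = [{"repaired_at": 100}, {"repaired_at": 120},
                  {"repaired_at": 0}]
repaired, rest = split_for_read(replica_tables, bound)
```

Note how a replica sstable repaired after the coordinator's bound (120 here) falls into the quorum phase, which is the conservative choice when replicas disagree on repair state.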
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13991658#comment-13991658 ] Sylvain Lebresne commented on CASSANDRA-7168: - I hope we'll get aggregations for 3.0, and it might well be that this will provide a good boost in that case. But I wouldn't mind getting aggregation first, and then trying this to see if it actually helps, rather than doing it first on the assumption that it might help later.
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13991180#comment-13991180 ] T Jake Luciani commented on CASSANDRA-7168: --- I think aggregations for 3.0 is reasonable.
[jira] [Commented] (CASSANDRA-7168) Add repair aware consistency levels
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13991065#comment-13991065 ] Benedict commented on CASSANDRA-7168: - It seems like a pretty reasonable idea: the coordinator (if an owner of the range) could simply issue a special quorum request for all data since the local repairedAt time (which would include all deleted columns as-is, instead of filtering them), which would rule out any races on repair overlaps. However, aggregations are most likely some way off, so I'm not sure this buys us much in the near term?
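The "all data since the local repairedAt time" request above could be sketched roughly as follows. This is a hypothetical illustration only, not Cassandra's internal API: `Cell`, `isTombstone`, and `since` are invented names. The key point is that deletions are returned unfiltered, because the coordinator needs them to override repaired data.

```java
import java.util.ArrayList;
import java.util.List;

// Rough sketch of a replica answering an "everything since repairedAt"
// request. Cell/isTombstone/since are hypothetical names for illustration;
// this is not Cassandra's actual read path.
public class SinceRepairedAt {

    // A cell with a write timestamp; a null value marks a tombstone.
    public static final class Cell {
        public final String name;
        public final Long value;       // null => tombstone
        public final long timestamp;   // write time
        public Cell(String name, Long value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
        public boolean isTombstone() { return value == null; }
    }

    // Return every cell written after repairedAt, tombstones included:
    // the coordinator needs the deletions as-is so they can suppress
    // repaired data during the merge, rather than being filtered out.
    public static List<Cell> since(List<Cell> localCells, long repairedAt) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : localCells) {
            if (c.timestamp > repairedAt) {
                out.add(c);
            }
        }
        return out;
    }
}
```

Because the cutoff is the coordinator's own repairedAt time, any data a concurrent repair has since promoted on another replica is still covered by the un-repaired phase, which is what rules out the race on repair overlaps.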
[jira] [Commented] (CASSANDRA-7168) Add repair aware consistency levels
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990779#comment-13990779 ] T Jake Luciani commented on CASSANDRA-7168: --- bq. Just read at ONE and save yourself the trouble. This ticket is a first step in a larger picture. The larger issue I'm trying to solve is allowing fast and consistent aggregations of data in a partition (CASSANDRA-4914). The win here is for rows with millions of points. If we have this CL logic we can perform the aggregation of repaired data locally on one replica (which would require it to be the coordinator of the request). This node would page in the non-repaired data and apply it to the local aggregation as needed. For large partitions, the alternative is to do this at CL.ONE (which defeats the purpose of an aggregation) or to page across the row using quorum reads (slow).
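The paged aggregation described above might look roughly like this sketch. Everything here is illustrative and assumed, not the CASSANDRA-4914 design: `Cell` and `aggregateSum` are invented names, and SUM stands in for an arbitrary aggregate. The replica aggregates its repaired cells, then folds in un-repaired cells paged from a quorum, letting a newer write replace the repaired value instead of double-counting it.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a coordinating replica computing a SUM by
// starting from its locally repaired cells, then overlaying un-repaired
// cells paged in at quorum. Cell and aggregateSum are invented names,
// not Cassandra APIs.
public class RepairedAggregation {

    public static final class Cell {
        public final String name;
        public final long value;
        public final long timestamp;
        public Cell(String name, long value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Phase 1: take repaired cells (consistent by definition, so readable
    // from this one replica). Phase 2: overlay un-repaired cells fetched
    // at quorum; last-write-wins per cell, exactly like an sstable merge.
    public static long aggregateSum(Iterable<Cell> repaired, Iterable<Cell> unrepaired) {
        Map<String, Cell> merged = new HashMap<>();
        for (Cell c : repaired) {
            merged.put(c.name, c);
        }
        for (Cell c : unrepaired) {
            Cell prev = merged.get(c.name);
            if (prev == null || c.timestamp > prev.timestamp) {
                merged.put(c.name, c); // newer un-repaired write supersedes
            }
        }
        long sum = 0;
        for (Cell c : merged.values()) {
            sum += c.value;
        }
        return sum;
    }
}
```

In a real implementation the un-repaired cells would arrive in pages and be folded in incrementally rather than materialized into one map, but the reconciliation rule is the same.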
[jira] [Commented] (CASSANDRA-7168) Add repair aware consistency levels
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990237#comment-13990237 ] Jonathan Ellis commented on CASSANDRA-7168: --- Clever idea, but you have correctness issues if you overlap with completing repairs: replica A serves up repaired data that doesn't include the most recent value, while replicas B and C finished their repair in time, so they don't serve up the most recent value either, since they are only asked for un-repaired data. Just read at ONE and save yourself the trouble.
[jira] [Commented] (CASSANDRA-7168) Add repair aware consistency levels
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990190#comment-13990190 ] Brandon Williams commented on CASSANDRA-7168: - Sounds like REPAIRED_QUORUM isn't going to be much of a win over QUORUM, then, since we still have to check multiple nodes, but I guess we'll see.
[jira] [Commented] (CASSANDRA-7168) Add repair aware consistency levels
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990185#comment-13990185 ] T Jake Luciani commented on CASSANDRA-7168: --- 1) only gives you part of the result; it is only correct as of the last repair time. 2), on the other hand, is all the latest data, including what is in the memtable. The idea is that the coordinator takes the result of 1) and merges in 2) to get the complete, consistent view of the result. So there could be a case where a row existed in the repaired data but was tombstoned by a subsequent write in the memtable. In that case the coordinator would see the tombstone and the result would not include the cell from the repaired result, the same way we deal with merging multiple sstables together.
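The tombstone case described above reduces to the usual newest-timestamp-wins reconciliation from sstable merging. A minimal sketch, with `Cell` and `reconcile` as hypothetical names: if the un-repaired quorum read returns a tombstone newer than the repaired cell, the tombstone wins and the coordinator drops the cell from the final result.

```java
// Sketch of the per-cell reconciliation described above, using the same
// newest-write-wins rule as an sstable merge. Cell and reconcile are
// hypothetical names for illustration, not Cassandra internals.
public class RepairedMerge {

    public static final class Cell {
        public final Long value;      // null => tombstone
        public final long timestamp;
        public Cell(Long value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
        public boolean isTombstone() { return value == null; }
    }

    // Merge a cell from the repaired result with the corresponding cell
    // from the un-repaired quorum read: the newer write wins. If a
    // tombstone wins, the coordinator omits the cell from the final
    // result entirely.
    public static Cell reconcile(Cell repaired, Cell unrepaired) {
        if (repaired == null) return unrepaired;
        if (unrepaired == null) return repaired;
        return unrepaired.timestamp >= repaired.timestamp ? unrepaired : repaired;
    }
}
```

Ties go to the un-repaired side here, mirroring the intuition that the un-repaired phase carries the freshest writes; a real implementation would also need Cassandra's full tie-breaking rules for equal timestamps.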
[jira] [Commented] (CASSANDRA-7168) Add repair aware consistency levels
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990177#comment-13990177 ] Brandon Williams commented on CASSANDRA-7168: - I guess I don't understand how 1) can ever happen, unless the query arrives exactly when the repair ends.
[jira] [Commented] (CASSANDRA-7168) Add repair aware consistency levels
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990176#comment-13990176 ] T Jake Luciani commented on CASSANDRA-7168: --- Right, this is what 2) is for. The coordinator would collate the result.
[jira] [Commented] (CASSANDRA-7168) Add repair aware consistency levels
[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990175#comment-13990175 ] Brandon Williams commented on CASSANDRA-7168: - bq. 1) Read from one replica the result from the repaired sstables. ISTM if you do this, you can't guarantee the data hasn't been updated after the repair on some other node, since the repair time will always be in the past.