[jira] [Commented] (CASSANDRA-15459) Short read protection doesn't work on group-by queries
[ https://issues.apache.org/jira/browse/CASSANDRA-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172320#comment-17172320 ] Andres de la Peña commented on CASSANDRA-15459: --- Thanks for the review. Committed to 3.11 as [86a9261fd94707283e4ce149f88756099e22fcb6|https://github.com/apache/cassandra/commit/86a9261fd94707283e4ce149f88756099e22fcb6] and merged up to [trunk|https://github.com/apache/cassandra/commit/ae51f0fd12820707927803fdbe63581f33111d4b]. > Short read protection doesn't work on group-by queries > -- > > Key: CASSANDRA-15459 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15459 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Coordination >Reporter: ZhaoYang >Assignee: Andres de la Peña >Priority: Normal > Labels: correctness > Fix For: 3.11.x, 4.0-beta > > Time Spent: 2h 10m > Remaining Estimate: 0h > > [DTest to > reproduce|https://github.com/apache/cassandra-dtest/compare/master...jasonstack:srp_group_by_trunk?expand=1]: > it affects all versions.. > {code} > In a two-node cluster with RF = 2 > Execute only on Node1: > * Insert pk=1 and ck=1 with timestamp 9 > * Delete pk=0 and ck=0 with timestamp 10 > * Insert pk=2 and ck=2 with timestamp 9 > Execute only on Node2: > * Delete pk=1 and ck=1 with timestamp 10 > * Insert pk=0 and ck=0 with timestamp 9 > * Delete pk=2 and ck=2 with timestamp 10 > Query: "SELECT pk, c FROM %s GROUP BY pk LIMIT 1" > * Expect no live data, but got [0, 0] > {code} > Note: for group-by queries, SRP should use "group counted" to calculate > limits used for SRP query, rather than "row counted". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15459) Short read protection doesn't work on group-by queries
[ https://issues.apache.org/jira/browse/CASSANDRA-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171769#comment-17171769 ] Caleb Rackliffe commented on CASSANDRA-15459: - [~adelapena] I've created CASSANDRA-16030 to follow what happens to {{test_node_cannot_join_as_hibernating_node_without_replace_address}}. > Short read protection doesn't work on group-by queries > -- > > Key: CASSANDRA-15459 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15459 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Coordination >Reporter: ZhaoYang >Assignee: Andres de la Peña >Priority: Normal > Labels: correctness > Fix For: 3.11.x, 4.0-beta > > Time Spent: 1h 50m > Remaining Estimate: 0h > > [DTest to > reproduce|https://github.com/apache/cassandra-dtest/compare/master...jasonstack:srp_group_by_trunk?expand=1]: > it affects all versions.. > {code} > In a two-node cluster with RF = 2 > Execute only on Node1: > * Insert pk=1 and ck=1 with timestamp 9 > * Delete pk=0 and ck=0 with timestamp 10 > * Insert pk=2 and ck=2 with timestamp 9 > Execute only on Node2: > * Delete pk=1 and ck=1 with timestamp 10 > * Insert pk=0 and ck=0 with timestamp 9 > * Delete pk=2 and ck=2 with timestamp 10 > Query: "SELECT pk, c FROM %s GROUP BY pk LIMIT 1" > * Expect no live data, but got [0, 0] > {code} > Note: for group-by queries, SRP should use "group counted" to calculate > limits used for SRP query, rather than "row counted". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15459) Short read protection doesn't work on group-by queries
[ https://issues.apache.org/jira/browse/CASSANDRA-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171739#comment-17171739 ] Andres de la Peña commented on CASSANDRA-15459: --- It seems that {{test_node_cannot_join_as_hibernating_node_without_replace_address}} has failed in the two last CI rounds for trunk ([259|https://ci-cassandra.apache.org/job/Cassandra-trunk/259/testReport/dtest-novnode.bootstrap_test/TestBootstrap/test_node_cannot_join_as_hibernating_node_without_replace_address/] and [258|https://ci-cassandra.apache.org/job/Cassandra-trunk/258/testReport/dtest-novnode.bootstrap_test/TestBootstrap/test_node_cannot_join_as_hibernating_node_without_replace_address/]), and doesn't seem related with the changes. > Short read protection doesn't work on group-by queries > -- > > Key: CASSANDRA-15459 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15459 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Coordination >Reporter: ZhaoYang >Assignee: Andres de la Peña >Priority: Normal > Labels: correctness > Fix For: 3.11.x, 4.0-beta > > Time Spent: 1h 50m > Remaining Estimate: 0h > > [DTest to > reproduce|https://github.com/apache/cassandra-dtest/compare/master...jasonstack:srp_group_by_trunk?expand=1]: > it affects all versions.. > {code} > In a two-node cluster with RF = 2 > Execute only on Node1: > * Insert pk=1 and ck=1 with timestamp 9 > * Delete pk=0 and ck=0 with timestamp 10 > * Insert pk=2 and ck=2 with timestamp 9 > Execute only on Node2: > * Delete pk=1 and ck=1 with timestamp 10 > * Insert pk=0 and ck=0 with timestamp 9 > * Delete pk=2 and ck=2 with timestamp 10 > Query: "SELECT pk, c FROM %s GROUP BY pk LIMIT 1" > * Expect no live data, but got [0, 0] > {code} > Note: for group-by queries, SRP should use "group counted" to calculate > limits used for SRP query, rather than "row counted". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15459) Short read protection doesn't work on group-by queries
[ https://issues.apache.org/jira/browse/CASSANDRA-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171734#comment-17171734 ] Caleb Rackliffe commented on CASSANDRA-15459: - Also linked the run with what look like unrelated failures in {{TestRepair}} to CASSANDRA-15986. > Short read protection doesn't work on group-by queries > -- > > Key: CASSANDRA-15459 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15459 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Coordination >Reporter: ZhaoYang >Assignee: Andres de la Peña >Priority: Normal > Labels: correctness > Fix For: 3.11.x, 4.0-beta > > Time Spent: 1h 50m > Remaining Estimate: 0h > > [DTest to > reproduce|https://github.com/apache/cassandra-dtest/compare/master...jasonstack:srp_group_by_trunk?expand=1]: > it affects all versions.. > {code} > In a two-node cluster with RF = 2 > Execute only on Node1: > * Insert pk=1 and ck=1 with timestamp 9 > * Delete pk=0 and ck=0 with timestamp 10 > * Insert pk=2 and ck=2 with timestamp 9 > Execute only on Node2: > * Delete pk=1 and ck=1 with timestamp 10 > * Insert pk=0 and ck=0 with timestamp 9 > * Delete pk=2 and ck=2 with timestamp 10 > Query: "SELECT pk, c FROM %s GROUP BY pk LIMIT 1" > * Expect no live data, but got [0, 0] > {code} > Note: for group-by queries, SRP should use "group counted" to calculate > limits used for SRP query, rather than "row counted". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15459) Short read protection doesn't work on group-by queries
[ https://issues.apache.org/jira/browse/CASSANDRA-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171731#comment-17171731 ] Caleb Rackliffe commented on CASSANDRA-15459: - I haven't seen the trunk failure of [TestBootstrap::test_node_cannot_join_as_hibernating_node_without_replace_address|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/3/label=cassandra,split=16/testReport/junit/dtest.bootstrap_test/TestBootstrap/test_node_cannot_join_as_hibernating_node_without_replace_address/] before, but it may just be flaky test that hasn't been reported yet. Otherwise things look clean. > Short read protection doesn't work on group-by queries > -- > > Key: CASSANDRA-15459 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15459 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Coordination >Reporter: ZhaoYang >Assignee: Andres de la Peña >Priority: Normal > Labels: correctness > Fix For: 3.11.x, 4.0-beta > > Time Spent: 1h 50m > Remaining Estimate: 0h > > [DTest to > reproduce|https://github.com/apache/cassandra-dtest/compare/master...jasonstack:srp_group_by_trunk?expand=1]: > it affects all versions.. > {code} > In a two-node cluster with RF = 2 > Execute only on Node1: > * Insert pk=1 and ck=1 with timestamp 9 > * Delete pk=0 and ck=0 with timestamp 10 > * Insert pk=2 and ck=2 with timestamp 9 > Execute only on Node2: > * Delete pk=1 and ck=1 with timestamp 10 > * Insert pk=0 and ck=0 with timestamp 9 > * Delete pk=2 and ck=2 with timestamp 10 > Query: "SELECT pk, c FROM %s GROUP BY pk LIMIT 1" > * Expect no live data, but got [0, 0] > {code} > Note: for group-by queries, SRP should use "group counted" to calculate > limits used for SRP query, rather than "row counted". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15459) Short read protection doesn't work on group-by queries
[ https://issues.apache.org/jira/browse/CASSANDRA-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171713#comment-17171713 ] Andres de la Peña commented on CASSANDRA-15459: --- New CI round: ||PR||utest||dtest||CircleCI j8||CircleCI j11|| |[3.11 |https://github.com/apache/cassandra/pull/635]|[206|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-test/206/]|[2|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/2/]|[link|https://app.circleci.com/pipelines/github/adelapena/cassandra/77/workflows/e0030e7d-bdc3-4584-b82d-4eea55bbc599]|-| |[trunk|https://github.com/apache/cassandra/pull/636]|[207|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-test/207/]|[3|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/3/]|[link|https://app.circleci.com/pipelines/github/adelapena/cassandra/79/workflows/1fa1cbcf-820d-438f-97bd-91e20a50baba]|[link|https://app.circleci.com/pipelines/github/adelapena/cassandra/79/workflows/a841da8f-d565-4608-85c6-1432733faeb3]| > Short read protection doesn't work on group-by queries > -- > > Key: CASSANDRA-15459 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15459 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Coordination >Reporter: ZhaoYang >Assignee: Andres de la Peña >Priority: Normal > Labels: correctness > Fix For: 3.11.x, 4.0-beta > > Time Spent: 1h 50m > Remaining Estimate: 0h > > [DTest to > reproduce|https://github.com/apache/cassandra-dtest/compare/master...jasonstack:srp_group_by_trunk?expand=1]: > it affects all versions.. > {code} > In a two-node cluster with RF = 2 > Execute only on Node1: > * Insert pk=1 and ck=1 with timestamp 9 > * Delete pk=0 and ck=0 with timestamp 10 > * Insert pk=2 and ck=2 with timestamp 9 > Execute only on Node2: > * Delete pk=1 and ck=1 with timestamp 10 > * Insert pk=0 and ck=0 with timestamp 9 > * Delete pk=2 and ck=2 with timestamp 10 > Query: "SELECT pk, c FROM %s GROUP BY pk LIMIT 1" > * Expect no live data, but got [0, 0] > {code} > Note: for group-by queries, SRP should use "group counted" to calculate > limits used for SRP query, rather than "row counted". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15459) Short read protection doesn't work on group-by queries
[ https://issues.apache.org/jira/browse/CASSANDRA-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171622#comment-17171622 ] Caleb Rackliffe commented on CASSANDRA-15459: - +1 on both 3.11 and trunk PRs (and the decision to discard the Python dtest PR) Especially w/ the trunk patch, there are some nice simplifications/readability improvements. Might want to run the unit tests one more time to account for CASSANDRA-15872. > Short read protection doesn't work on group-by queries > -- > > Key: CASSANDRA-15459 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15459 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Coordination >Reporter: ZhaoYang >Assignee: Andres de la Peña >Priority: Normal > Labels: correctness > Fix For: 3.11.x, 4.0-beta > > Time Spent: 1h 50m > Remaining Estimate: 0h > > [DTest to > reproduce|https://github.com/apache/cassandra-dtest/compare/master...jasonstack:srp_group_by_trunk?expand=1]: > it affects all versions.. > {code} > In a two-node cluster with RF = 2 > Execute only on Node1: > * Insert pk=1 and ck=1 with timestamp 9 > * Delete pk=0 and ck=0 with timestamp 10 > * Insert pk=2 and ck=2 with timestamp 9 > Execute only on Node2: > * Delete pk=1 and ck=1 with timestamp 10 > * Insert pk=0 and ck=0 with timestamp 9 > * Delete pk=2 and ck=2 with timestamp 10 > Query: "SELECT pk, c FROM %s GROUP BY pk LIMIT 1" > * Expect no live data, but got [0, 0] > {code} > Note: for group-by queries, SRP should use "group counted" to calculate > limits used for SRP query, rather than "row counted". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15459) Short read protection doesn't work on group-by queries
[ https://issues.apache.org/jira/browse/CASSANDRA-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170976#comment-17170976 ] Caleb Rackliffe commented on CASSANDRA-15459: - [~adelapena] What's your opinion on whether the existing Python dtest would be better as an in-JVM dtest? One thing that might be nice in this case is that we could leave the trunk/4.0 version of the test in a simpler state, due to not needing the Byteman injection to avoid blocking read-repair there. Also, I think in-jvm tests would be able to run considerably faster, given they don't need to stop and start nodes. (i.e. They can write only to specific nodes w/ `executeInternal()`.) > Short read protection doesn't work on group-by queries > -- > > Key: CASSANDRA-15459 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15459 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Coordination >Reporter: ZhaoYang >Assignee: Andres de la Peña >Priority: Normal > Labels: correctness > Fix For: 3.11.x, 4.0-beta > > Time Spent: 0.5h > Remaining Estimate: 0h > > [DTest to > reproduce|https://github.com/apache/cassandra-dtest/compare/master...jasonstack:srp_group_by_trunk?expand=1]: > it affects all versions.. > {code} > In a two-node cluster with RF = 2 > Execute only on Node1: > * Insert pk=1 and ck=1 with timestamp 9 > * Delete pk=0 and ck=0 with timestamp 10 > * Insert pk=2 and ck=2 with timestamp 9 > Execute only on Node2: > * Delete pk=1 and ck=1 with timestamp 10 > * Insert pk=0 and ck=0 with timestamp 9 > * Delete pk=2 and ck=2 with timestamp 10 > Query: "SELECT pk, c FROM %s GROUP BY pk LIMIT 1" > * Expect no live data, but got [0, 0] > {code} > Note: for group-by queries, SRP should use "group counted" to calculate > limits used for SRP query, rather than "row counted". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15459) Short read protection doesn't work on group-by queries
[ https://issues.apache.org/jira/browse/CASSANDRA-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138356#comment-17138356 ] Andres de la Peña commented on CASSANDRA-15459: --- I think that the way "row counted" is used to calculate limits is not the cause of the problem here. The comments about this usage in [{{ShortReadPartitionsProtection#counted}}|https://github.com/apache/cassandra/blob/4f50a6712ada5c4298ec860836015ea15049cbda/src/java/org/apache/cassandra/service/DataResolver.java#L762-L768] and [{{ShortReadPartitionsProtection.ShortReadRowsProtection#countedInCurrentPartition}}|https://github.com/apache/cassandra/blob/4f50a6712ada5c4298ec860836015ea15049cbda/src/java/org/apache/cassandra/service/DataResolver.java#L907-L913] do not seem to describe what the method is actually doing. These comments where added during CASSANDRA-10707, replacing the original comments added during CASSANDRA-13794 ([here|https://github.com/apache/cassandra/commit/2bae4ca907ac4d2ab53c899e5cf5c9e4de631f52]). I'm restoring the original description of those methods. It seems to me that the cause of the error is [here|https://github.com/apache/cassandra/blob/4f50a6712ada5c4298ec860836015ea15049cbda/src/java/org/apache/cassandra/db/filter/DataLimits.java#L966]. Implementations of {{DataLimit.Counter}} are meant to both count results and also to limit them, being a {{StoppingTransformation}}. The method {{DataLimit.Counter#onlyCount}} allows to disable their result-limiting behaviour, so they only count results without transforming them. The counter [{{singleResultCounter}}|https://github.com/apache/cassandra/blob/4f50a6712ada5c4298ec860836015ea15049cbda/src/java/org/apache/cassandra/service/DataResolver.java#L630-L631] in short read protection uses this read-only behaviour, so it counts the replica results without truncating them, in case more replica results are needed after reconciliation. However, the method {{GroupByAwareCounter#applyToRow}} unconditionally returns a {{null}} row in case the read partition has more rows than the specified by the limit, which can violate the count-only behaviour. Something similar happens in {{GroupByAwareCounter#applyToStatic}}. The proposed PR simply takes into account the {{Counter.enforceLimits}} to prevent this filtering. The dtest PR just adds the excellent test provided by [~jasonstack] in the description, with a minimal change to disable read-repair in 3.11 with Byteman, because we don't have the {{NONE}} repair strategy available in that version. I'm also excluding 3.0 because {{GROUP BY}} was added in 3.10. CI results: ||branch||ci-cassandra utest||ci-cassandra dtest||CircleCI j8||CircleCI j11|| |[3.11|https://github.com/apache/cassandra/pull/635]|[126|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-test/126/]|[163|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/163/]|[link|https://app.circleci.com/pipelines/github/adelapena/cassandra/53/workflows/ce4d2cad-a811-43af-a215-b4ea71260d0e]|| |[trunk|https://github.com/apache/cassandra/pull/636]|[127|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-test/127/]|[164|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/163/]|[link|https://app.circleci.com/pipelines/github/adelapena/cassandra/55/workflows/9f73dadc-a963-43b6-8792-ea5e0c0e17c8]|[link|https://app.circleci.com/pipelines/github/adelapena/cassandra/55/workflows/56a025b0-4c18-4eae-9f77-2c261b1d2cc5]| CC [~blerer] > Short read protection doesn't work on group-by queries > -- > > Key: CASSANDRA-15459 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15459 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Coordination >Reporter: ZhaoYang >Assignee: Andres de la Peña >Priority: Normal > Labels: correctness > Fix For: 3.11.7, 4.0-beta > > Time Spent: 0.5h > Remaining Estimate: 0h > > [DTest to > reproduce|https://github.com/apache/cassandra-dtest/compare/master...jasonstack:srp_group_by_trunk?expand=1]: > it affects all versions.. > {code} > In a two-node cluster with RF = 2 > Execute only on Node1: > * Insert pk=1 and ck=1 with timestamp 9 > * Delete pk=0 and ck=0 with timestamp 10 > * Insert pk=2 and ck=2 with timestamp 9 > Execute only on Node2: > * Delete pk=1 and ck=1 with timestamp 10 > * Insert pk=0 and ck=0 with timestamp 9 > * Delete pk=2 and ck=2 with timestamp 10 > Query: "SELECT pk, c FROM %s GROUP BY pk LIMIT 1" > * Expect no live data, but got [0, 0] > {code} > Note: for group-by queries, SRP should use "group counted" to calculate > limits used for SRP query, rather than "row counted". -- This message was sent by Atlassian Jira (v8.3.4#803005) --