[jira] [Commented] (CASSANDRA-12420) Duplicated Key in IN clause with a small fetch size will run forever
[ https://issues.apache.org/jira/browse/CASSANDRA-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15554934#comment-15554934 ]

ZhaoYang commented on CASSANDRA-12420:

Hi Tyler, I have updated the dtest according to your new specification. Do you think it is needed?
https://github.com/riptano/cassandra-dtest/pull/1199

> Duplicated Key in IN clause with a small fetch size will run forever
>
>                 Key: CASSANDRA-12420
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12420
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>         Environment: cassandra 2.1.14, driver 2.1.7.1
>            Reporter: ZhaoYang
>            Assignee: Tyler Hobbs
>              Labels: doc-impacting
>             Fix For: 2.1.16
>
>         Attachments: CASSANDRA-12420.patch
>
>
> This can be easily reproduced when the fetch size is smaller than the actual
> number of rows.
> The table has 2 partition key columns, 1 clustering column, and 1 regular column.
>     Select select = QueryBuilder.select().from("ks", "cf");
>     select.where().and(QueryBuilder.eq("a", 1));
>     select.where().and(QueryBuilder.in("b", Arrays.asList(1, 1, 1)));
>     select.setFetchSize(5);
> For now we deduplicate the IN values on the client side, but it would be
> better to fix this inside Cassandra.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
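The client-side workaround mentioned in the issue description (removing duplicate IN values before the query is built) might look like the minimal Java sketch below. `InValues` and `distinctPreservingOrder` are hypothetical names and are not part of the DataStax driver; the sketch only assumes the caller then passes the deduplicated list to `QueryBuilder.in`, as the reporter's snippet does.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical client-side helper: drop duplicate IN values before building the query. */
final class InValues {
    private InValues() {}

    /** Returns the distinct values in first-seen order; LinkedHashSet keeps insertion order. */
    static <T> List<T> distinctPreservingOrder(List<T> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }
}
```

Used with the reporter's snippet, this would become `select.where().and(QueryBuilder.in("b", InValues.distinctPreservingOrder(Arrays.asList(1, 1, 1))));`. A `LinkedHashSet` is used so that deduplication does not also reorder the IN list.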
[jira] [Commented] (CASSANDRA-12420) Duplicated Key in IN clause with a small fetch size will run forever
[ https://issues.apache.org/jira/browse/CASSANDRA-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15533182#comment-15533182 ]

ZhaoYang commented on CASSANDRA-12420:

Thanks for the fix.
[jira] [Commented] (CASSANDRA-12420) Duplicated Key in IN clause with a small fetch size will run forever
[ https://issues.apache.org/jira/browse/CASSANDRA-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522751#comment-15522751 ]

Benjamin Lerer commented on CASSANDRA-12420:

Two minor nits:
* The {{HAS_LOGGED_WARNING_FOR_IN_RESTRICTION_WITH_DUPLICATES}} static variable in {{SelectStatement}} is not used anymore and can be removed.
* The {{assertRowsIgnoringOrder}}, {{assertRowsIgnoringOrderAndExtra}} and {{assertRowsIgnoringOrderInternal}} methods in {{CQLTester}} are not used and should be removed from the patch.
[jira] [Commented] (CASSANDRA-12420) Duplicated Key in IN clause with a small fetch size will run forever
[ https://issues.apache.org/jira/browse/CASSANDRA-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437534#comment-15437534 ]

Tyler Hobbs commented on CASSANDRA-12420:

Patch and pending CI runs:
||branch||testall||dtest||
|[CASSANDRA-12420-2.1|https://github.com/thobbs/cassandra/tree/CASSANDRA-12420-2.1]|[testall|http://cassci.datastax.com/view/Dev/view/thobbs/job/thobbs-CASSANDRA-12420-2.1-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/thobbs/job/thobbs-CASSANDRA-12420-2.1-dtest]|

I've tried to match the 2.2+ behavior by returning partitions in the order of the sorted partition keys instead of the order of the IN values.
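The 2.2+ behavior described in Tyler's comment (each partition returned once, in sorted partition-key order rather than in the order the IN values were written) can be illustrated with a small Java sketch. The comparator is a placeholder for whatever ordering the server applies to partition keys; the sketch does not reproduce Cassandra's actual key comparison, and `PartitionOrderSketch` / `distinctSorted` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical illustration of "each partition once, in key order" for IN queries. */
final class PartitionOrderSketch {
    private PartitionOrderSketch() {}

    /**
     * Collapses duplicate IN values and returns the keys in comparator order instead of
     * the order written in the query. The comparator stands in for the server's
     * partition-key ordering, which this sketch does not model.
     */
    static <K> List<K> distinctSorted(Collection<K> inValues, Comparator<? super K> keyOrder) {
        // TreeSet both sorts and drops keys that compare equal.
        TreeSet<K> sorted = new TreeSet<>(keyOrder);
        sorted.addAll(inValues);
        return new ArrayList<>(sorted);
    }
}
```

With natural integer ordering, IN values `(3, 1, 3, 2)` map to partitions `1, 2, 3`, each read once. Note that a `TreeSet` treats two keys as duplicates when the comparator returns 0, so the comparator must be consistent with key equality.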
[jira] [Commented] (CASSANDRA-12420) Duplicated Key in IN clause with a small fetch size will run forever
[ https://issues.apache.org/jira/browse/CASSANDRA-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430317#comment-15430317 ]

Benjamin Lerer commented on CASSANDRA-12420:

[~thobbs] I agree with your analysis and I am +1 on changing the behavior.
[jira] [Commented] (CASSANDRA-12420) Duplicated Key in IN clause with a small fetch size will run forever
[ https://issues.apache.org/jira/browse/CASSANDRA-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428621#comment-15428621 ]

Tyler Hobbs commented on CASSANDRA-12420:

You are correct that this wouldn't break compatibility with other 2.1.x nodes, but it could cause problems during upgrades to 3.x. In 3.x, another optional field (remainingInPartition) has already been added to the end of the PagingState serialization format. Deserialization of the PagingState is only conditional on the native protocol version, not on the Cassandra version of other nodes in the cluster, so we can't safely introduce this change.

In CASSANDRA-6706, the decision was made to continue returning duplicate results in 2.1 when there are duplicate {{IN}} values, in order to not make a (potentially) breaking change in a bugfix release. However, this ticket represents a pretty big motivation to change that even in 2.1. So, I'm thinking that we should go ahead and make 2.1 behave like 2.2 and 3.x and not return duplicate results in order to avoid this.

[~blerer] do you agree with the above?
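To make the upgrade concern concrete, here is a simplified Java sketch of a PagingState-like byte layout: a short-length-prefixed partition key, a short-length-prefixed cell name, an `int remaining`, and then one optional trailing `int`. The layout is only loosely modeled on the 2.1 format and is an assumption, as are all class and method names. A reader that, like the 3.x format described above, interprets any trailing bytes as its own optional field has no way to tell that a patched 2.1 node wrote those bytes as a pager index.

```java
import java.nio.ByteBuffer;

/** Hypothetical, simplified PagingState-like layout with one optional trailing int. */
final class PagingStateLayoutSketch {
    private PagingStateLayoutSketch() {}

    /**
     * Writes [short len][partitionKey][short len][cellName][int remaining] and, if
     * trailing is non-null, one extra int at the end.
     */
    static ByteBuffer serialize(byte[] partitionKey, byte[] cellName, int remaining, Integer trailing) {
        int size = 2 + partitionKey.length + 2 + cellName.length + 4 + (trailing == null ? 0 : 4);
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putShort((short) partitionKey.length).put(partitionKey);
        out.putShort((short) cellName.length).put(cellName);
        out.putInt(remaining);
        if (trailing != null) {
            out.putInt(trailing);
        }
        out.flip();
        return out;
    }

    /**
     * Reads the fixed fields, then treats any leftover bytes as "its" optional field.
     * Nothing in the bytes records which optional field the writer actually meant.
     */
    static int readTrailingField(ByteBuffer serialized, int defaultIfAbsent) {
        ByteBuffer in = serialized.duplicate(); // leave the caller's position untouched
        skipShortLengthBytes(in);               // partition key
        skipShortLengthBytes(in);               // cell name
        in.getInt();                            // remaining
        return in.remaining() >= 4 ? in.getInt() : defaultIfAbsent;
    }

    private static void skipShortLengthBytes(ByteBuffer in) {
        int length = in.getShort() & 0xFFFF;
        in.position(in.position() + length);
    }
}
```

In this sketch, `readTrailingField(serialize(key, cell, 100, 2), Integer.MAX_VALUE)` returns 2: a pager index written by the hypothetical 2.1 writer is read back as a different field. Because the bytes carry no field tag of their own, the only thing a reader could branch on is the native protocol version, which is the constraint the comment above points out.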
[jira] [Commented] (CASSANDRA-12420) Duplicated Key in IN clause with a small fetch size will run forever
[ https://issues.apache.org/jira/browse/CASSANDRA-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427703#comment-15427703 ]

ZhaoYang commented on CASSANDRA-12420:

The patch adds the current pager index to the PagingState. I think the fix won't break compatibility within 2.1.x.
[jira] [Commented] (CASSANDRA-12420) Duplicated Key in IN clause with a small fetch size will run forever
[ https://issues.apache.org/jira/browse/CASSANDRA-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427108#comment-15427108 ]

Tyler Hobbs commented on CASSANDRA-12420:

I've confirmed this is reproducible in 2.1 with the following:

{noformat}
cqlsh> create keyspace ks1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1' };
cqlsh> use ks1;
cqlsh:ks1> create table foo (a int, b int, c int, d int, PRIMARY KEY ((a, b), c));
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 0, 0);
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 1, 1);
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 2, 2);
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 3, 3);
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 4, 4);
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 5, 5);
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 6, 6);
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 7, 7);
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 8, 8);
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 9, 9);
cqlsh:ks1> PAGING 5;
cqlsh:ks1> SELECT * FROM foo WHERE a = 1 AND b IN (1, 1, 1);

 a | b | c | d
---+---+---+---
 1 | 1 | 0 | 0
 1 | 1 | 1 | 1
 1 | 1 | 2 | 2
 1 | 1 | 3 | 3
 1 | 1 | 4 | 4

---MORE---

 a | b | c | d
---+---+---+---
 1 | 1 | 5 | 5
 1 | 1 | 6 | 6
 1 | 1 | 7 | 7
 1 | 1 | 8 | 8
 1 | 1 | 9 | 9

---MORE---

 a | b | c | d
---+---+---+---
 1 | 1 | 0 | 0
 1 | 1 | 1 | 1
 1 | 1 | 2 | 2
 1 | 1 | 3 | 3
 1 | 1 | 4 | 4

---MORE---

 a | b | c | d
---+---+---+---
 1 | 1 | 5 | 5
 1 | 1 | 6 | 6
 1 | 1 | 7 | 7
 1 | 1 | 8 | 8
 1 | 1 | 9 | 9

... (repeats endlessly)
{noformat}

This does not reproduce in 2.2.

This is somewhat different from CASSANDRA-8276, which was just complaining about duplicate result rows when duplicate {{IN}} values are used. The real problem here is that the paged results never end.
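The endless paging reproduced above, and why storing the pager index in the paging state (ZhaoYang's proposal) would avoid it, can be shown with a simplified Java model. It assumes that a 2.1-style multi-partition pager resumes by scanning the IN list for the first sub-pager whose key equals the partition key saved in the paging state; with duplicate keys that scan always lands on the first duplicate. Everything here (class names, the in-memory "partitions", the page logic) is hypothetical and ignores replicas, limits and real storage.

```java
import java.util.List;

/** Hypothetical, simplified model of paging through "b IN (...)", one sub-pager per IN value. */
final class InPagingSketch {
    private InPagingSketch() {}

    /** Saved paging state: last partition key read, last clustering value, sub-pager index. */
    static final class State {
        final int partitionKey;
        final int lastClustering;
        final int pagerIndex;

        State(int partitionKey, int lastClustering, int pagerIndex) {
            this.partitionKey = partitionKey;
            this.lastClustering = lastClustering;
            this.pagerIndex = pagerIndex;
        }
    }

    /** Result of one fetch: rows returned and the state to resume from (null = finished). */
    static final class Page {
        final int rowCount;
        final State next;

        Page(int rowCount, State next) {
            this.rowCount = rowCount;
            this.next = next;
        }
    }

    /**
     * Fetches one page. Each partition is assumed to hold clustering values
     * 0..rowsPerPartition-1. With resumeByIndex == false the resume point is the first
     * IN entry whose key equals the saved partition key (the assumed 2.1-style lookup).
     */
    static Page fetchPage(List<Integer> inKeys, int rowsPerPartition, int pageSize,
                          State state, boolean resumeByIndex) {
        int i = 0;
        int startAfter = -1; // resume after this clustering value within sub-pager i
        if (state != null) {
            if (resumeByIndex) {
                i = state.pagerIndex;
            } else {
                while (i < inKeys.size() && !inKeys.get(i).equals(state.partitionKey)) {
                    i++;
                }
            }
            startAfter = state.lastClustering;
        }
        int count = 0;
        State last = null;
        for (; i < inKeys.size() && count < pageSize; i++) {
            int key = inKeys.get(i);
            for (int c = startAfter + 1; c < rowsPerPartition && count < pageSize; c++) {
                count++;
                last = new State(key, c, i);
            }
            startAfter = -1; // the next sub-pager starts at the beginning of its partition
        }
        // A full page may have more rows behind it; a short page ends the query.
        return new Page(count, count == pageSize ? last : null);
    }

    /** Pages until done; returns the total row count, or -1 if still paging after maxPages. */
    static int countRows(List<Integer> inKeys, int rowsPerPartition, int pageSize,
                         boolean resumeByIndex, int maxPages) {
        State state = null;
        int total = 0;
        for (int p = 0; p < maxPages; p++) {
            Page page = fetchPage(inKeys, rowsPerPartition, pageSize, state, resumeByIndex);
            total += page.rowCount;
            if (page.next == null) {
                return total;
            }
            state = page.next;
        }
        return -1;
    }
}
```

With IN values `(1, 1, 1)`, ten rows per partition and a page size of 5 (the cqlsh example above), key-based resumption alternates between rows 0-4 and 5-9 forever, so `countRows(..., false, 20)` gives up and returns -1. Index-based resumption terminates after 30 rows, duplicates included, as 2.1 returned them before this ticket. With deduplicated IN values, which is the behavior the thread settled on, key-based resumption also terminates after 10 rows.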
[jira] [Commented] (CASSANDRA-12420) Duplicated Key in IN clause with a small fetch size will run forever
[ https://issues.apache.org/jira/browse/CASSANDRA-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415273#comment-15415273 ]

ZhaoYang commented on CASSANDRA-12420:

[~thobbs] Hi, I understand that fixing this bug requires changing QueryState, which may be a breaking change. However, this bug can cause the application server to run out of memory (OOM), so we would like it to be fixed in 2.1.x.

dtest: https://github.com/riptano/cassandra-dtest/pull/1199