[jira] [Commented] (CASSANDRA-12420) Duplicated Key in IN clause with a small fetch size will run forever

2016-10-07 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15554934#comment-15554934
 ] 

ZhaoYang commented on CASSANDRA-12420:
--

Hi Tyler, I have updated the dtest according to your new specification. Do you 
think it is needed? https://github.com/riptano/cassandra-dtest/pull/1199

> Duplicated Key in IN clause with a small fetch size will run forever
> 
>
> Key: CASSANDRA-12420
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12420
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: cassandra 2.1.14, driver 2.1.7.1
>Reporter: ZhaoYang
>Assignee: Tyler Hobbs
>  Labels: doc-impacting
> Fix For: 2.1.16
>
> Attachments: CASSANDRA-12420.patch
>
>
> This can be easily reproduced when the fetch size is smaller than the actual 
> number of rows.
> A table has 2 partition key columns, 1 clustering column, and 1 regular column.
> >Select select = QueryBuilder.select().from("ks", "cf");
> >select.where().and(QueryBuilder.eq("a", 1));
> >select.where().and(QueryBuilder.in("b", Arrays.asList(1, 1, 1)));
> >select.setFetchSize(5);
> For now, we deduplicate the IN values on the client side, but it would be 
> better to fix this inside Cassandra.
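
The client-side workaround mentioned in the description amounts to removing duplicate IN values before building the query. A minimal sketch, assuming the values fit in memory and that first-occurrence order should be kept; `InValues` and `distinct` are hypothetical names, not part of the DataStax Java driver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class InValues {
    // Returns the IN values with duplicates removed, keeping first-occurrence order.
    public static <T> List<T> distinct(List<T> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    public static void main(String[] args) {
        List<Integer> raw = Arrays.asList(1, 1, 1);
        List<Integer> deduped = distinct(raw);
        // The deduplicated list would then be passed to QueryBuilder.in("b", deduped).
        System.out.println(deduped); // prints [1]
    }
}
```

A `LinkedHashSet` is used rather than a `HashSet` so the query still lists the keys in the order the application supplied them.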



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12420) Duplicated Key in IN clause with a small fetch size will run forever

2016-09-29 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15533182#comment-15533182
 ] 

ZhaoYang commented on CASSANDRA-12420:
--

Thanks for the fix.



[jira] [Commented] (CASSANDRA-12420) Duplicated Key in IN clause with a small fetch size will run forever

2016-09-26 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522751#comment-15522751
 ] 

Benjamin Lerer commented on CASSANDRA-12420:


Two minor nits:
* The {{HAS_LOGGED_WARNING_FOR_IN_RESTRICTION_WITH_DUPLICATES}} static variable 
in {{SelectStatement}} is not used anymore and can be removed.
* The {{assertRowsIgnoringOrder}}, {{assertRowsIgnoringOrderAndExtra}} and 
{{assertRowsIgnoringOrderInternal}} methods in {{CQLTester}} are not used and 
should be removed from the patch.



[jira] [Commented] (CASSANDRA-12420) Duplicated Key in IN clause with a small fetch size will run forever

2016-08-25 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437534#comment-15437534
 ] 

Tyler Hobbs commented on CASSANDRA-12420:
-

Patch and pending CI runs:

||branch||testall||dtest||
|[CASSANDRA-12420-2.1|https://github.com/thobbs/cassandra/tree/CASSANDRA-12420-2.1]|[testall|http://cassci.datastax.com/view/Dev/view/thobbs/job/thobbs-CASSANDRA-12420-2.1-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/thobbs/job/thobbs-CASSANDRA-12420-2.1-dtest]|

I've tried to match the 2.2+ behavior by returning partitions in the order of 
the sorted partition keys instead of the order of the IN values.
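
The ordering rule described here can be sketched as building the list of partition keys to query from a sorted set. A minimal sketch, assuming the values' natural ordering stands in for Cassandra's partition-key comparator (the real order depends on the key's type); `SortedInKeys` and `partitionKeysToQuery` are hypothetical names. A sorted set also drops duplicates, which matches the 2.2+ behavior of not returning duplicate results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class SortedInKeys {
    // Hypothetical helper: turns raw IN values into the partition keys to read,
    // deduplicated and sorted instead of kept in the order they were written.
    public static <T extends Comparable<T>> List<T> partitionKeysToQuery(List<T> inValues) {
        return new ArrayList<>(new TreeSet<>(inValues));
    }

    public static void main(String[] args) {
        System.out.println(partitionKeysToQuery(Arrays.asList(3, 1, 3, 2))); // prints [1, 2, 3]
        System.out.println(partitionKeysToQuery(Arrays.asList(1, 1, 1)));    // prints [1]
    }
}
```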



[jira] [Commented] (CASSANDRA-12420) Duplicated Key in IN clause with a small fetch size will run forever

2016-08-22 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430317#comment-15430317
 ] 

Benjamin Lerer commented on CASSANDRA-12420:


[~thobbs] I agree with your analysis and I am +1 on changing the behavior.



[jira] [Commented] (CASSANDRA-12420) Duplicated Key in IN clause with a small fetch size will run forever

2016-08-19 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428621#comment-15428621
 ] 

Tyler Hobbs commented on CASSANDRA-12420:
-

You are correct that this wouldn't break compatibility with other 2.1.x nodes, 
but it could cause problems during upgrades to 3.x.  In 3.x, another optional 
field (remainingInPartition) has already been added to the end of the 
PagingState serialization format.  Deserialization of the PagingState is only 
conditional on the native protocol version, not on the Cassandra version of 
other nodes in the cluster, so we can't safely introduce this change.
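
The compatibility concern can be made concrete with a toy byte layout. A minimal sketch, assuming a simplified, hypothetical serialization (not the actual PagingState format in 2.1 or 3.x) in which, for a given protocol version, the reader decides whether an optional trailing field is present only by checking for leftover bytes. Under that assumption, a pager index appended by a patched 2.1 node would be read back by a 3.x-style reader as `remainingInPartition`. The `Integer.MAX_VALUE` "absent" sentinel is an arbitrary choice for this sketch.

```java
import java.nio.ByteBuffer;

public class TrailingFieldDemo {
    // Toy layout, NOT Cassandra's real PagingState format:
    //   int keyLength | key bytes | int remaining | [optional trailing int]
    public static ByteBuffer write(byte[] key, int remaining, Integer trailing) {
        int size = 4 + key.length + 4 + (trailing == null ? 0 : 4);
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(key.length).put(key).putInt(remaining);
        if (trailing != null) {
            buf.putInt(trailing); // a patched 2.1 node would put its pager index here
        }
        buf.flip();
        return buf;
    }

    // 3.x-style reader: any trailing int is taken to be remainingInPartition.
    // Nothing in the bytes says which Cassandra version produced them.
    public static int readRemainingInPartition(ByteBuffer state) {
        ByteBuffer in = state.duplicate();
        int keyLength = in.getInt();
        in.position(in.position() + keyLength); // skip the partition key
        in.getInt();                            // skip "remaining"
        return in.hasRemaining() ? in.getInt() : Integer.MAX_VALUE; // sentinel: field absent
    }

    public static void main(String[] args) {
        byte[] key = {0, 0, 0, 1};
        ByteBuffer unpatched = write(key, 5, null);
        ByteBuffer patched21 = write(key, 5, 2); // pager index 2 from a patched 2.1 node

        System.out.println(readRemainingInPartition(unpatched)); // prints 2147483647 (absent)
        System.out.println(readRemainingInPartition(patched21)); // prints 2, misread as remainingInPartition
    }
}
```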

In CASSANDRA-6706, the decision was made to continue returning duplicate 
results in 2.1 when there are duplicate {{IN}} values in order to not make a 
(potentially) breaking change in a bugfix release.  However, this ticket 
represents a pretty big motivation to change that even in 2.1.  So, I'm 
thinking that we should go ahead and make 2.1 behave like 2.2 and 3.x and not 
return duplicate results in order to avoid this.

[~blerer] do you agree with the above?



[jira] [Commented] (CASSANDRA-12420) Duplicated Key in IN clause with a small fetch size will run forever

2016-08-18 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427703#comment-15427703
 ] 

ZhaoYang commented on CASSANDRA-12420:
--

The patch adds the current pager index to the PagingState. I think the fix won't 
break compatibility within 2.1.x.
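
The idea of storing the pager index can be illustrated with a toy pager whose state records a position in the IN list instead of only a key. A minimal sketch, assuming every partition has the same number of rows and that duplicates are still returned, as 2.1 did at the time; `IndexedPager`, `State`, `fetchPage` and `pageThrough` are hypothetical names, not code from CASSANDRA-12420.patch. Because the resume point is an index, a second copy of the same key cannot send the pager back to the first copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexedPager {
    // Hypothetical paging state: position in the list of IN values plus the last
    // clustering value returned from that partition (-1 = partition not started).
    static final class State {
        final int pagerIndex;
        final int lastClustering;

        State(int pagerIndex, int lastClustering) {
            this.pagerIndex = pagerIndex;
            this.lastClustering = lastClustering;
        }
    }

    // Fetches at most pageSize rows, assuming every partition holds clustering values
    // 0..rowsPerPartition-1. Rows are recorded as "key:clustering". Returns the state
    // to resume from, or null once every IN value has been fully read.
    static State fetchPage(List<Integer> inValues, int rowsPerPartition, int pageSize,
                           State state, List<String> out) {
        int index = state.pagerIndex;        // resume by position, not by key lookup
        int next = state.lastClustering + 1;
        int fetched = 0;
        while (index < inValues.size()) {
            while (next < rowsPerPartition && fetched < pageSize) {
                out.add(inValues.get(index) + ":" + next);
                next++;
                fetched++;
            }
            if (next < rowsPerPartition) {
                return new State(index, next - 1); // page full in mid-partition
            }
            index++;                             // partition done: next IN entry
            next = 0;
            if (fetched == pageSize) {
                break;
            }
        }
        return index < inValues.size() ? new State(index, -1) : null;
    }

    // Pages through the whole query and returns {pages fetched, rows returned}.
    public static int[] pageThrough(List<Integer> inValues, int rowsPerPartition, int pageSize,
                                    List<String> out) {
        State state = new State(0, -1);
        int pages = 0;
        while (state != null) {
            state = fetchPage(inValues, rowsPerPartition, pageSize, state, out);
            pages++;
        }
        return new int[] {pages, out.size()};
    }

    public static void main(String[] args) {
        int[] result = pageThrough(Arrays.asList(1, 1, 1), 10, 5, new ArrayList<>());
        // Terminates: three copies of the 10-row partition, 5 rows per page.
        System.out.println(result[0] + " pages, " + result[1] + " rows"); // prints "6 pages, 30 rows"
    }
}
```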



[jira] [Commented] (CASSANDRA-12420) Duplicated Key in IN clause with a small fetch size will run forever

2016-08-18 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427108#comment-15427108
 ] 

Tyler Hobbs commented on CASSANDRA-12420:
-

I've confirmed this is reproducible in 2.1 with the following:

{noformat}
cqlsh> create keyspace ks1 WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': '1' };
cqlsh> use ks1;
cqlsh:ks1> create table foo (a int, b int, c int, d int, PRIMARY KEY ((a, b), 
c));
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 0, 0);
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 1, 1);
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 2, 2);
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 3, 3);
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 4, 4);
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 5, 5);
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 6, 6);
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 7, 7);
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 8, 8);
cqlsh:ks1> insert into foo (a, b, c, d) VALUES (1, 1, 9, 9);
cqlsh:ks1> PAGING 5;
cqlsh:ks1> SELECT * FROM foo WHERE a = 1 AND b IN (1, 1, 1);

 a | b | c | d
---+---+---+---
 1 | 1 | 0 | 0
 1 | 1 | 1 | 1
 1 | 1 | 2 | 2
 1 | 1 | 3 | 3
 1 | 1 | 4 | 4
 
---MORE---
 a | b | c | d
---+---+---+---
 1 | 1 | 5 | 5
 1 | 1 | 6 | 6
 1 | 1 | 7 | 7
 1 | 1 | 8 | 8
 1 | 1 | 9 | 9
 
---MORE---
 a | b | c | d
---+---+---+---
 1 | 1 | 0 | 0
 1 | 1 | 1 | 1
 1 | 1 | 2 | 2
 1 | 1 | 3 | 3
 1 | 1 | 4 | 4
 
---MORE---
 a | b | c | d
---+---+---+---
 1 | 1 | 5 | 5
 1 | 1 | 6 | 6
 1 | 1 | 7 | 7
 1 | 1 | 8 | 8
 1 | 1 | 9 | 9

... (repeats endlessly)
{noformat}

This does not reproduce in 2.2.

This is somewhat different from CASSANDRA-8276, which was just complaining 
about duplicate result rows when duplicate {{IN}} values are used.  The real 
problem here is that the paged results never end.
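
The never-ending pages can be reproduced in a small model of a multi-partition pager. A minimal sketch, assuming (as a simplification, not a reading of the 2.1 source) that the paging state carries only the last partition key and clustering value, and that a resumed query restarts at the first IN value equal to that key. With IN (1, 1, 1), every resume jumps back to the first copy of partition 1, so the pages cycle through rows 0 to 9 as in the cqlsh output; `KeyResumeLoop` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeyResumeLoop {
    // Hypothetical paging state holding only the last partition key and clustering
    // value returned, with no record of which IN entry they came from.
    static final class State {
        final Integer key;
        final int lastClustering;

        State(Integer key, int lastClustering) {
            this.key = key;
            this.lastClustering = lastClustering;
        }
    }

    // Fetches at most pageSize rows, assuming each partition holds clustering values
    // 0..rowsPerPartition-1; rows are recorded as "key:clustering". A null state means
    // "start from the beginning"; a null return means the query is finished.
    static State fetchPage(List<Integer> inValues, int rowsPerPartition, int pageSize,
                           State state, List<String> out) {
        int index = 0;
        int next = 0;
        if (state != null) {
            // Resume at the FIRST IN value equal to the saved key. With IN (1, 1, 1)
            // this is always index 0, whichever copy of the key we were really in.
            index = inValues.indexOf(state.key);
            next = state.lastClustering + 1;
        }
        int fetched = 0;
        State last = null;
        for (; index < inValues.size(); index++, next = 0) {
            for (; next < rowsPerPartition; next++) {
                if (fetched == pageSize) {
                    return last; // page full: resume after the last row returned
                }
                out.add(inValues.get(index) + ":" + next);
                last = new State(inValues.get(index), next);
                fetched++;
            }
        }
        return fetched == pageSize ? last : null;
    }

    // Pages through the query, giving up after maxPages. Returns the number of
    // fetchPage calls needed to finish, or -1 if paging had not ended by then.
    public static int pageThrough(List<Integer> inValues, int rowsPerPartition, int pageSize,
                                  int maxPages, List<String> out) {
        State state = null;
        for (int page = 1; page <= maxPages; page++) {
            state = fetchPage(inValues, rowsPerPartition, pageSize, state, out);
            if (state == null) {
                return page;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        int pages = pageThrough(Arrays.asList(1, 1, 1), 10, 5, 8, rows);
        System.out.println(pages);               // prints -1: paging never finished
        System.out.println(rows.subList(0, 20)); // 1:0..1:9 followed by 1:0..1:9 again
    }
}
```

With distinct IN values such as (1, 2) the same model finishes normally, which is consistent with the problem only appearing when IN values are duplicated.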



[jira] [Commented] (CASSANDRA-12420) Duplicated Key in IN clause with a small fetch size will run forever

2016-08-10 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415273#comment-15415273
 ] 

ZhaoYang commented on CASSANDRA-12420:
--

[~thobbs] Hi, I understand that fixing this bug requires changing QueryState, 
which may be a breaking change, but this bug can cause the application server 
to run out of memory (OOM). We would like this to be fixed in 2.1.x.

dtest: https://github.com/riptano/cassandra-dtest/pull/1199 
