[jira] [Comment Edited] (CASSANDRA-6220) Unable to select multiple entries using In clause on clustering part of compound key

2013-10-22 Thread Constance Eustace (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13802237#comment-13802237
 ] 

Constance Eustace edited comment on CASSANDRA-6220 at 10/22/13 8:41 PM:


If I do this sequence:

DROP SCHEMA
CREATE SCHEMA
CREATE INITIAL DATA (i.e. no updates to existing data)
NODETOOL COMPACT -- magic sauce
MASSIVE INSERT + SIMULTANEOUS UPDATES to INITIAL DATA

then the problem does not reproduce. The nodetool compact after the schema 
creation seems to reset/stabilize the database. I used to reproduce it very 
reliably after about 300,000 inserts / 2,000 updates. Now I do 1.75 million 
inserts with 20,000 updates and get no reproduction.

You could probably also run the nodetool compact right after the SCHEMA 
creation, and then do the initial data creation and the update+insert run.




was (Author: cowardlydragon):
If I do this sequence:

DROP SCHEMA
CREATE SCHEMA
CREATE INITIAL DATA (i.e. no updates to existing data)
NODETOOL COMPACT -- magic sauce
MASSIVE INSERT + SIMULTANEOUS UPDATES to INITIAL DATA

then the problem does not reproduce. The nodetool compact after the schema 
creation seems to reset/stabilize the database. I used to reproduce it very 
reliably after about 300,000 inserts / 2,000 updates. Now I do 1.75 million 
inserts with 20,000 updates and get no reproduction.




 Unable to select multiple entries using In clause on clustering part of 
 compound key
 

 Key: CASSANDRA-6220
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6220
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Ashot Golovenko
 Attachments: inserts.zip


 I have the following table:
 CREATE TABLE rating (
 id bigint,
 mid int,
 hid int,
 r double,
 PRIMARY KEY ((id, mid), hid));
 And I get really really strange result sets on the following queries:
 cqlsh:bm> SELECT hid, r FROM rating WHERE id = 755349113 and mid = 201310 
 and hid = 201329320;
  hid       | r
 -----------+--------
  201329320 | 45.476
 (1 rows)
 cqlsh:bm> SELECT hid, r FROM rating WHERE id = 755349113 and mid = 201310 
 and hid = 201329220;
  hid       | r
 -----------+-------
  201329220 | 53.62
 (1 rows)
 cqlsh:bm> SELECT hid, r FROM rating WHERE id = 755349113 and mid = 201310 
 and hid in (201329320, 201329220);
  hid       | r
 -----------+--------
  201329320 | 45.476
 (1 rows)  -- WRONG - should be two records
 As you can see, although both records exist I'm not able to fetch all of them 
 using the IN clause. For now I have to loop over my requests, about 30 of 
 them, which I find highly inefficient given that I am querying physically the 
 same row. 
 What's more, it doesn't happen all the time! For different id values I 
 sometimes get the correct dataset.
 Ideally I'd like the following select to work:
 SELECT hid, r FROM rating WHERE id  = 755349113 and mid in ? and hid in ?;
 Which doesn't work either.
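
The interim workaround described above (looping over single-key queries instead of using IN) could look roughly like the following sketch. The `execute` callable is a hypothetical stand-in for whatever driver call runs a parameterized CQL statement and yields (hid, r) tuples; it is not part of the original report, and neither is the function name.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Row = Tuple[int, float]  # (hid, r)

# Equality-only form of the query from the report; it returned correct rows.
QUERY = "SELECT hid, r FROM rating WHERE id = %s AND mid = %s AND hid = %s"


def fetch_ratings(
    execute: Callable[[str, Sequence[int]], Iterable[Row]],  # hypothetical driver hook
    id_: int,
    mids: Iterable[int],
    hids: Iterable[int],
) -> List[Row]:
    """Issue one equality query per (mid, hid) pair instead of relying on IN."""
    hid_list = list(hids)  # materialize so it can be reused for every mid
    rows: List[Row] = []
    for mid in mids:
        for hid in hid_list:
            rows.extend(execute(QUERY, (id_, mid, hid)))
    return rows
```

This trades one round trip for many, which is exactly the inefficiency the report complains about, so it is only a stopgap until the IN query returns complete results.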



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Comment Edited] (CASSANDRA-6220) Unable to select multiple entries using In clause on clustering part of compound key

2013-10-22 Thread Constance Eustace (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13802269#comment-13802269
 ] 

Constance Eustace edited comment on CASSANDRA-6220 at 10/22/13 9:11 PM:



I was able to reproduce using the original reproduction steps (drop schema, 
create schema, INSERT / UPDATE with no nodetool compact in there). Repairing 
the corruption afterwards seemed to require nodetool compact, 
invalidatekeycache, and/or possibly flush.

Now that I've repaired it, I'm going to run a 3.5 million insert + simultaneous 
update run to see if the nodetool compact repair makes the data more durable.
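
For anyone who wants to script the repair attempt, here is a minimal sketch that only builds the nodetool command lines named above. The order (flush, then compact, then invalidatekeycache) is an assumption, since the comment does not say which order or which subset was actually needed; the function name and the keyspace/table arguments are placeholders.

```python
from typing import List


def repair_commands(keyspace: str, table: str) -> List[List[str]]:
    """Build argv lists for the nodetool steps mentioned in the comment.

    The ordering is an assumption, not something the thread confirms.
    """
    return [
        ["nodetool", "flush", keyspace, table],    # write memtables out to SSTables
        ["nodetool", "compact", keyspace, table],  # major compaction of the table
        ["nodetool", "invalidatekeycache"],        # drop cached key positions
    ]
```

Each list can be handed to `subprocess.run(argv, check=True)` on a node where nodetool is on the PATH.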




was (Author: cowardlydragon):
It may also require invalidatekeycache / caches, possibly with a flush in there 
as well...



[jira] [Comment Edited] (CASSANDRA-6220) Unable to select multiple entries using In clause on clustering part of compound key

2013-10-22 Thread Constance Eustace (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13802269#comment-13802269
 ] 

Constance Eustace edited comment on CASSANDRA-6220 at 10/22/13 9:12 PM:


I was able to reproduce using the original reproduction steps (drop schema, 
create schema, INSERT / UPDATE with no nodetool compact in there). Repairing 
the corruption afterwards seemed to require nodetool compact, 
invalidatekeycache, and/or possibly flush.

Now that I've repaired it, I'm going to run a 3.5 million insert + simultaneous 
update run to see if the nodetool compact repair makes the data more durable, 
as was seen earlier today.




was (Author: cowardlydragon):

I was able to reproduce using the original reproduction steps (drop schema, 
create schema, INSERT / UPDATE with no nodetool compact in there). Repairing 
the corruption afterwards seemed to require nodetool compact, 
invalidatekeycache, and/or possibly flush.

Now that I've repaired it, I'm going to run a 3.5 million insert + simultaneous 
update run to see if the nodetool compact repair makes the data more durable.





[jira] [Comment Edited] (CASSANDRA-6220) Unable to select multiple entries using In clause on clustering part of compound key

2013-10-21 Thread Constance Eustace (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800709#comment-13800709
 ] 

Constance Eustace edited comment on CASSANDRA-6220 at 10/21/13 2:58 PM:


One of the CASS-6137 comments links to a GitHub repository with a reproduction 
script, if you need to reproduce reliably. It takes about 400,000 inserts + 
6,000 updates for me on a single node.

https://github.com/cowarlydragon/CASS-6137


was (Author: cowardlydragon):
One of the CASS-6137 comments links to a GitHub repository with a reproduction 
script, if you need to reproduce reliably. It takes about 400,000 inserts + 
6,000 updates.



[jira] [Comment Edited] (CASSANDRA-6220) Unable to select multiple entries using In clause on clustering part of compound key

2013-10-21 Thread Constance Eustace (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800984#comment-13800984
 ] 

Constance Eustace edited comment on CASSANDRA-6220 at 10/21/13 8:15 PM:


Does nodetool compact keyspace tablename fix the corruption? It did for me, 
but I don't think it stops the ongoing corruption...


EDIT: my reproduction seems to indicate that nodetool compact MAY also protect 
updates made after the compact was executed... I was unable to generate bad 
queries after another 1.5 million row inserts and 30,000 updates to existing 
data.
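
One way to check for "bad queries" after such a run is to compare what an IN query returns against the single-key lookups for the same keys. The sketch below assumes two hypothetical callables, `query_in` and `query_eq`, that wrap the real driver calls and yield the clustering-key values found; none of these names come from the thread.

```python
from typing import Callable, Iterable, List, Set


def missing_from_in_query(
    query_in: Callable[[List[int]], Iterable[int]],  # hypothetical: runs "... hid IN (...)"
    query_eq: Callable[[int], Iterable[int]],        # hypothetical: runs "... hid = ?"
    hids: Iterable[int],
) -> Set[int]:
    """Return the keys that single-key lookups find but the IN query omits."""
    hid_list = list(hids)
    via_in = set(query_in(hid_list))
    via_eq: Set[int] = set()
    for hid in hid_list:
        via_eq.update(query_eq(hid))
    return via_eq - via_in
```

An empty result means the IN query agreed with the individual lookups for the keys checked; it does not prove the table is free of the problem for other keys.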


was (Author: cowardlydragon):
Does nodetool compact keyspace tablename fix the corruption? It did for me, 
but I don't think it stops the ongoing corruption...
