[jira] [Commented] (CASSANDRA-8940) Inconsistent select count and select distinct
[ https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537524#comment-14537524 ]

Frens Jan Rumph commented on CASSANDRA-8940:
--------------------------------------------

[~blerer], great that you found this out and resolved the matter!

> Inconsistent select count and select distinct
> ---------------------------------------------
>
>                 Key: CASSANDRA-8940
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8940
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 2.1.2
>            Reporter: Frens Jan Rumph
>            Assignee: Benjamin Lerer
>             Fix For: 2.0.15
>
>         Attachments: 7b74fb00-e935-11e4-b10c-317579db7eb4.csv, 8940.txt,
> 8d5899d0-e935-11e4-847b-2d06da75a6cd.csv, Vagrantfile, install_cassandra.sh,
> setup_hosts.sh
>
>
> When performing {{select count( * ) from ...}} I expect the results to be
> consistent over multiple query executions if the table at hand is not written
> to / deleted from in the meantime. However, in my set-up it is not. The
> counts returned vary considerably (several percent). The same holds for
> {{select distinct partition-key-columns from ...}}.
> I have a table in a keyspace with replication_factor = 1 which is something
> like:
> {code}
> CREATE TABLE tbl (
>     id frozen<id_type>,
>     bucket bigint,
>     offset int,
>     value double,
>     PRIMARY KEY ((id, bucket), offset)
> )
> {code}
> The frozen udt is:
> {code}
> CREATE TYPE id_type (
>     tags map
> );
> {code}
> The table contains around 35k rows (I'm not trying to be funny here ...). The
> consistency level for the queries was ONE.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8940) Inconsistent select count and select distinct
[ https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516949#comment-14516949 ]

Frens Jan Rumph commented on CASSANDRA-8940:
--------------------------------------------

Thanks for the update. I guess you are on to something. Again, if there's anything I can help with, I'm happy to pitch in.

(A bit off topic:) I wasn't aware that Cassandra performs the count on the coordinator. I wonder why one couldn't push the count operator to the replicas involved. I see that aggregate functions in Cassandra trunk are implemented in a similar fashion. A pity, if you ask me.

As I understand it, select count queries operate on top of normal select all queries. Does this mean that this 'skipping' of rows might also be a problem in other cases? Or is it only a problem because the result set is processed/paged on a Cassandra node and not in a driver?
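On the question of counting on a node versus in a driver: the following is a minimal sketch of what a client-side count could look like with the Python driver, where the driver pages through a plain select and the client counts the rows. It is not how Cassandra implements {{count(*)}} and it says nothing about where the rows get skipped. The function names {{count_rows}} and {{count_client_side}} and the page size of 1000 are assumptions made for illustration, and the session is assumed to already have the keyspace set.

```python
def count_rows(rows):
    """Count rows by iterating over them.

    Works for any iterable, including the result of session.execute(),
    which fetches further pages transparently while being iterated.
    """
    return sum(1 for _ in rows)


def count_client_side(session, fetch_size=1000):
    """Count the rows of tbl in the client instead of on the coordinator.

    Assumption: `session` is a connected cassandra.cluster.Session whose
    keyspace contains the table `tbl` from this issue.
    """
    # Imported lazily so count_rows() can be used without the driver installed.
    from cassandra.query import SimpleStatement

    # Only the primary key columns are selected to keep the pages small.
    statement = SimpleStatement(
        "SELECT id, bucket, offset FROM tbl", fetch_size=fetch_size
    )
    return count_rows(session.execute(statement))
```

Comparing the result of {{count_client_side(session)}} with the value returned by {{SELECT count(*) FROM tbl}} on the same, unmodified table would show whether the two ways of counting disagree.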
[jira] [Commented] (CASSANDRA-8940) Inconsistent select count and select distinct
[ https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509603#comment-14509603 ]

Frens Jan Rumph commented on CASSANDRA-8940:
--------------------------------------------

Great. If there's anything I can help with, let me know.
[jira] [Commented] (CASSANDRA-8940) Inconsistent select count and select distinct
[ https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509043#comment-14509043 ]

Frens Jan Rumph commented on CASSANDRA-8940:
--------------------------------------------

@[~blerer], peculiar to say the least! I haven't got any Cassandra nodes on bare metal, VM / container only. I wasn't able to reproduce using CCM on my laptop. But maybe it's dependent on the number of rows / overhead of a VM / container / network / ...? If you substantially reduced the row counts in my test script, things would probably be just fine. I haven't had issues with counting, say, 1000 rows.
[jira] [Updated] (CASSANDRA-8940) Inconsistent select count and select distinct
[ https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frens Jan Rumph updated CASSANDRA-8940:
---------------------------------------
    Attachment: 8d5899d0-e935-11e4-847b-2d06da75a6cd.csv
                7b74fb00-e935-11e4-b10c-317579db7eb4.csv
[jira] [Commented] (CASSANDRA-8940) Inconsistent select count and select distinct
[ https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507981#comment-14507981 ]

Frens Jan Rumph commented on CASSANDRA-8940:
--------------------------------------------

Hi [~blerer],

Wow, thanks a lot for all the trouble you've been through! Weirdest thing that you aren't able to reproduce the issue. It might give a clue though.

I assume you are running the scripts from your host machine? If so, it might be more of a client than a server related issue. Could you by any chance run the script from one of the nodes, if you haven't done so already? If you place the script in {{test.py}} next to the {{Vagrantfile}} you should be able to do something like (as root / with sudo):

{code}
curl https://bootstrap.pypa.io/get-pip.py | python
pip install cassandra-driver
cd /vagrant
python test.py cas-1 cas-2 cas-3
{code}

I have attached two CSV dumps from {{system_traces.events}}:

7b74fb00-e935-11e4-b10c-317579db7eb4.csv which counted to 494453
8d5899d0-e935-11e4-847b-2d06da75a6cd.csv which counted to 494833

I wasn't able to count to the 500000 rows which were in the table with tracing enabled ... perhaps looking at differences between the traces reveals something? The traces were generated from the script running from one of the Vagrant nodes, by the way.

Cheers,
Frens Jan
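One way to start looking at differences between two such trace dumps is to count the trace events per source node in each file and compare the totals. The sketch below assumes the CSV files have the columns of {{system_traces.events}} in the order session_id, event_id, activity, source, source_elapsed, thread, without a header row; that layout, the function names and the sample rows are assumptions, not something taken from the attached files.

```python
import csv
from collections import Counter


def events_per_source(lines, source_column=3):
    """Count trace events per source node.

    `lines` is any iterable of CSV lines (an open file works too).
    `source_column` is the assumed position of the `source` column.
    """
    counts = Counter()
    for row in csv.reader(lines):
        if len(row) > source_column:
            counts[row[source_column]] += 1
    return counts


def diff_counts(first, second):
    """Return {source: second - first} for sources whose counts differ."""
    sources = sorted(set(first) | set(second))
    return {s: second[s] - first[s] for s in sources if second[s] != first[s]}


# Made-up sample rows, only to show the shape of the comparison.
trace_a = [
    "s1,e1,Executing seq scan,192.168.33.11,100,ReadStage:1",
    "s1,e2,Executing seq scan,192.168.33.12,120,ReadStage:1",
]
trace_b = trace_a + [
    "s2,e3,Executing seq scan,192.168.33.12,130,ReadStage:2",
]

print(diff_counts(events_per_source(trace_a), events_per_source(trace_b)))
```

With the real dumps, passing the two open files to {{events_per_source}} should show whether one node contributed fewer events in the run that counted fewer rows.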
[jira] [Comment Edited] (CASSANDRA-8940) Inconsistent select count and select distinct
[ https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492700#comment-14492700 ]

Frens Jan Rumph edited comment on CASSANDRA-8940 at 4/15/15 6:42 AM:
---------------------------------------------------------------------

[~blerer], sorry for the delay ... been a bit busy past few weeks.

I've whipped up a script which should reproduce my problems:

{code}
import cassandra.cluster
import cassandra.concurrent

import string
import sys


def setup_schema(session):
    print("setting up schema")
    session.execute("CREATE KEYSPACE IF NOT EXISTS count_test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};")
    session.set_keyspace("count_test")
    session.execute("""
        CREATE TABLE IF NOT EXISTS tbl (
            id text,
            bucket bigint,
            offset int,
            value double,
            PRIMARY KEY ((id, bucket), offset)
        )
    """)


def insert_test_data(session):
    # setup parameters for the inserts
    ids = string.lowercase[:5]
    bucket_count = 10
    offset_count = 10000
    print('inserting data for %s ids, %s buckets and %s offsets' % (len(ids), bucket_count, offset_count))

    # clear the table
    session.execute("TRUNCATE tbl;")

    # prepare the insert
    insert = session.prepare("INSERT INTO tbl (id, bucket, offset, value) VALUES (?, ?, ?, ?)")

    # insert a CQL row for each tag, bucket and offset
    inserts = [
        (insert, (t, b, o, 0))
        for t in ids
        for b in xrange(bucket_count)
        for o in xrange(offset_count)
    ]
    _ = cassandra.concurrent.execute_concurrent(session, inserts)

    return len(inserts)


if __name__ == '__main__':
    contact_points = sys.argv[1:]
    print('connecting to %s' % ', '.join(contact_points))
    session = cassandra.cluster.Cluster(contact_points).connect()
    try:
        setup_schema(session)
        inserted = insert_test_data(session)
        print("inserted %s rows" % inserted)
        for count in (session.execute("SELECT count(*) FROM tbl", timeout=120) for _ in range(10)):
            print('queried count was %s%s' % (count[0].count, '' if count[0].count == inserted else ' (fail)'))
    finally:
        session.shutdown()
{code}

In my setup this yields (on a particular run):

{code}
setting up schema
inserting data for 5 ids, 10 buckets and 1000 offsets
inserted 50000 rows
queried count was 50000
queried count was 49396 (fail)
queried count was 49918 (fail)
queried count was 50000
queried count was 50000
queried count was 50000
queried count was 49993 (fail)
queried count was 48997 (fail)
queried count was 49772 (fail)
queried count was 49551 (fail)
{code}

As you can see, the counts vary. The number of failures seems to be correlated with the number of rows in the cluster. E.g. with only 1000 rows there are no wrong counts.

As for my set-up: I'm using a three node cluster (cas-1, cas-2 and cas-3) which runs on Vagrant + LXC. I planned on writing a script using CCM to be portable, but I wasn't able to reproduce the results with CCM! I've tried both Cassandra 2.1.2 and 2.1.4 with CCM. That was rather disappointing. Or looking at it differently ... it might be considered a clue to where things go wrong ...

Any of this ring a bell? Do you perhaps have pointers for me to dig deeper?
[jira] [Updated] (CASSANDRA-8940) Inconsistent select count and select distinct
[ https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frens Jan Rumph updated CASSANDRA-8940:
---------------------------------------
    Attachment: Vagrantfile
                setup_hosts.sh
                install_cassandra.sh

Great [~blerer]!

As I said before, I had issues reproducing my issue with CCM. The set-up in which I could reproduce it was built on Vagrant + LXC ... which I didn't want to bother you with ;) So I put some effort into a more general set-up based on Vagrant + VirtualBox, see the attached files. The Vagrantfile creates a 3 node cluster on CentOS 7 with Cassandra 2.1 (2.1.4 at the time of writing ... depends on the packaging by DataStax, so it might bump in the future on a new patch version).

At first I thought I had the same issue as with trying to use CCM, but apparently I needed to increase the number of rows written from 50k to 500k (with 5 ids, 10 buckets each (so 50 partitions) and 10k rows per partition). Example output from my setup:

{code}
connecting to 192.168.33.11, 192.168.33.12, 192.168.33.13
setting up schema
inserting data for 5 ids, 10 buckets and 10000 offsets
inserted 500000 rows
queried count was 494495 (fail)
queried count was 493530 (fail)
queried count was 494604 (fail)
queried count was 490000 (fail)
queried count was 500000
queried count was 494382 (fail)
queried count was 494204 (fail)
queried count was 494625 (fail)
queried count was 500000
queried count was 494758 (fail)
{code}

Note that I have slightly modified the script to accept contact points for {{cassandra.cluster.Cluster(...)}} and also increased the number of rows inserted as mentioned before. So it can be executed with e.g. {{python2 test.py 192.168.33.11 192.168.33.12 192.168.33.13}}

I haven't had the time to do something like a proper sweep of the variables, but I tried a configuration with 5 ids, 1 bucket per id (so 5 unique partition keys) and 100k rows per partition, which also seems to fail, but in a perhaps interestingly different way, for example:

{code}
setting up schema
inserting data for 5 ids, 1 buckets and 100000 offsets
inserted 500000 rows
queried count was 500000
queried count was 500000
queried count was 403172 (fail)
queried count was 500000
queried count was 500000
queried count was 302821 (fail)
queried count was 500000
queried count was 500000
queried count was 304049 (fail)
queried count was 500000
{code}

With 5 ids, 100 buckets per id and 1k rows per partition - in my set-up - things do seem to pan out better, only one failure out of ten (in a particular run):

{code}
connecting to 192.168.33.11, 192.168.33.12, 192.168.33.13
setting up schema
inserting data for 5 ids, 100 buckets and 1000 offsets
inserted 500000 rows
queried count was 500000
queried count was 500000
queried count was 500000
queried count was 500000
queried count was 500000
queried count was 498740 (fail)
queried count was 500000
queried count was 500000
queried count was 500000
queried count was 500000
{code}
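To put the wrong counts in perspective, here is a small sketch that summarises a run: how many of the queried counts were wrong and how far off the worst one was, relative to the 500k rows written. The function name {{summarize}} is made up for this illustration; the three observed counts are taken from the first example output above.

```python
def summarize(expected, observed_counts):
    """Return (number of wrong counts, largest shortfall in percent)."""
    wrong = [c for c in observed_counts if c != expected]
    shortfalls = [100.0 * (expected - c) / expected for c in wrong]
    worst = max(shortfalls) if shortfalls else 0.0
    return len(wrong), worst


# 500k rows written; three of the queried counts from the first run.
wrong, worst = summarize(500000, [494495, 493530, 494604])
print("%d wrong counts, worst shortfall %.3f%%" % (wrong, worst))
```

For these three counts the worst shortfall is roughly 1.3 percent of the rows written.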
[jira] [Commented] (CASSANDRA-8940) Inconsistent select count and select distinct
[ https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492700#comment-14492700 ]

Frens Jan Rumph commented on CASSANDRA-8940:
--------------------------------------------

[~blerer], sorry for the delay ... been a bit busy past few weeks.

I've whipped up a script which should reproduce my problems:

{code}
import cassandra.cluster
import cassandra.concurrent

import string
import sys


def setup_schema(session):
    print("setting up schema")
    session.execute("CREATE KEYSPACE IF NOT EXISTS count_test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};")
    session.set_keyspace("count_test")
    session.execute("""
        CREATE TABLE IF NOT EXISTS tbl (
            id text,
            bucket bigint,
            offset int,
            value double,
            PRIMARY KEY ((id, bucket), offset)
        )
    """)


def insert_test_data(session):
    # setup parameters for the inserts
    ids = string.lowercase[:5]
    bucket_count = 10
    offset_count = 1000
    print('inserting data for %s ids, %s buckets and %s offsets' % (len(ids), bucket_count, offset_count))

    # clear the table
    session.execute("TRUNCATE tbl;")

    # prepare the insert
    insert = session.prepare("INSERT INTO tbl (id, bucket, offset, value) VALUES (?, ?, ?, ?)")

    # insert a CQL row for each tag, bucket and offset
    inserts = [
        (insert, (t, b, o, 0))
        for t in ids
        for b in xrange(bucket_count)
        for o in xrange(offset_count)
    ]
    _ = cassandra.concurrent.execute_concurrent(session, inserts)

    return len(inserts)


if __name__ == '__main__':
    contact_points = ['cas-1', 'cas-2', 'cas-3']
    session = cassandra.cluster.Cluster(contact_points).connect()
    try:
        setup_schema(session)
        inserted = insert_test_data(session)
        print("inserted %s rows" % inserted)
        for count in (session.execute("SELECT count(*) FROM tbl") for _ in range(10)):
            print('queried count was %s%s' % (count[0].count, '' if count[0].count == inserted else ' (fail)'))
    finally:
        session.shutdown()
{code}

In my setup this yields (on a particular run):

{code}
setting up schema
inserting data for 5 ids, 10 buckets and 1000 offsets
inserted 50000 rows
queried count was 50000
queried count was 49396 (fail)
queried count was 49918 (fail)
queried count was 50000
queried count was 50000
queried count was 50000
queried count was 49993 (fail)
queried count was 48997 (fail)
queried count was 49772 (fail)
queried count was 49551 (fail)
{code}

As you can see, the counts vary. The number of failures seems to be correlated with the number of rows in the cluster. E.g. with only 1000 rows there are no wrong counts.

As for my set-up: I'm using a three node cluster (cas-1, cas-2 and cas-3) which runs on Vagrant + LXC. I planned on writing a script using CCM to be portable, but I wasn't able to reproduce the results with CCM! I've tried both Cassandra 2.1.2 and 2.1.4 with CCM. That was rather disappointing. Or looking at it differently ... it might be considered a clue to where things go wrong ...

Any of this ring a bell? Do you perhaps have pointers for me to dig deeper?
[jira] [Updated] (CASSANDRA-8940) Inconsistent select count and select distinct
[ https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frens Jan Rumph updated CASSANDRA-8940:
---------------------------------------
      Environment: 2.1.2
    Fix Version/s:     (was: 2.1.2)
[jira] [Commented] (CASSANDRA-8940) Inconsistent select count and select distinct
[ https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357150#comment-14357150 ]

Frens Jan Rumph commented on CASSANDRA-8940:
--------------------------------------------

For completeness' sake I also tested with CL=QUORUM and ALL which, as one would expect with RF=1, yields the same inconsistent results.
[jira] [Created] (CASSANDRA-8940) Inconsistent select count and select distinct
Frens Jan Rumph created CASSANDRA-8940:
------------------------------------------

             Summary: Inconsistent select count and select distinct
                 Key: CASSANDRA-8940
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8940
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Frens Jan Rumph
             Fix For: 2.1.2


When performing {{select count( * ) from ...}} I expect the results to be consistent over multiple query executions if the table at hand is not written to / deleted from in the meantime. However, in my set-up it is not. The counts returned vary considerably (several percent). The same holds for {{select distinct partition-key-columns from ...}}.

I have a table in a keyspace with replication_factor = 1 which is something like:

{code}
CREATE TABLE tbl (
    id frozen<id_type>,
    bucket bigint,
    offset int,
    value double,
    PRIMARY KEY ((id, bucket), offset)
)
{code}

The frozen udt is:

{code}
CREATE TYPE id_type (
    tags map
);
{code}

The table contains around 35k rows (I'm not trying to be funny here ...). The consistency level for the queries was ONE.
[jira] [Created] (CASSANDRA-7711) composite column not sliced when using IN clause on (other) composite columns
Frens Jan Rumph created CASSANDRA-7711:
------------------------------------------

             Summary: composite column not sliced when using IN clause on (other) composite columns
                 Key: CASSANDRA-7711
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7711
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: cassandra 2.0.9
            Reporter: Frens Jan Rumph


Hi,

I'm storing data points in Cassandra keyed by a number of values and a timestamp. I'd like to use IN clauses to select points and slice them by time. The IN clauses work, but I can't get them to work in combination with the slicing: all values are returned / the range in the where clause seems to be ignored.

A dumbed down abstract version of my layout and some sample data:

{code}
create table tbl (
    a text,
    b text,
    c int,
    d int,
    primary key ((a), b, c)
);

insert into tbl (a,b,c,d) values ('a1', 'b1', 1, 1);
insert into tbl (a,b,c,d) values ('a1', 'b1', 2, 2);
insert into tbl (a,b,c,d) values ('a1', 'b2', 1, 1);
insert into tbl (a,b,c,d) values ('a1', 'b2', 2, 2);
insert into tbl (a,b,c,d) values ('a2', 'b1', 1, 1);
insert into tbl (a,b,c,d) values ('a2', 'b1', 2, 2);
insert into tbl (a,b,c,d) values ('a3', 'b2', 1, 1);
insert into tbl (a,b,c,d) values ('a3', 'b2', 2, 2);
{code}

So the table contains:

{code}
 a  | b  | c | d
----+----+---+---
 a1 | b1 | 1 | 1
 a1 | b1 | 2 | 2
 a1 | b2 | 1 | 1
 a1 | b2 | 2 | 2
 a2 | b1 | 1 | 1
 a2 | b1 | 2 | 2
 a3 | b2 | 1 | 1
 a3 | b2 | 2 | 2
{code}

When performing {{select * from tbl where a in ('a1', 'a2') and (b) in (('b1'), ('b2')) and c > 1;}} I get:

{code}
 a  | b  | c | d
----+----+---+---
 a1 | b1 | 1 | 1
 a1 | b1 | 2 | 2
 a1 | b2 | 1 | 1
 a1 | b2 | 2 | 2
 a2 | b1 | 1 | 1
 a2 | b1 | 2 | 2
{code}

But I expected:

{code}
 a  | b  | c | d
----+----+---+---
 a1 | b1 | 2 | 2
 a1 | b2 | 2 | 2
 a2 | b1 | 2 | 2
{code}

Am I doing something wrong? Or is {{c > 1}} incorrectly ignored?

{{select * from tbl where a in ('a1', 'a2') and b='b1' and c > 1;}} does correctly produce:

{code}
 a  | b  | c | d
----+----+---+---
 a1 | b1 | 2 | 2
 a2 | b1 | 2 | 2
{code}

So I expect this behaviour to relate to the interworking of the IN clause on the clustering column b and the > predicate on column c.

Cheers,
Frens Jan

--
This message was sent by Atlassian JIRA
(v6.2#6252)
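The semantics I expected from the query can be written down as a plain filter over the sample rows. This is only a model in ordinary Python of what the where clause should select, not a description of how Cassandra evaluates it; the function name {{select}} and its parameters are made up for the illustration.

```python
# The eight sample rows from the table above, as (a, b, c, d) tuples.
rows = [
    ("a1", "b1", 1, 1),
    ("a1", "b1", 2, 2),
    ("a1", "b2", 1, 1),
    ("a1", "b2", 2, 2),
    ("a2", "b1", 1, 1),
    ("a2", "b1", 2, 2),
    ("a3", "b2", 1, 1),
    ("a3", "b2", 2, 2),
]


def select(rows, a_in, b_in, c_greater_than):
    """Keep rows where a is in a_in, b is in b_in and c > c_greater_than."""
    return [
        row for row in rows
        if row[0] in a_in and row[1] in b_in and row[2] > c_greater_than
    ]


# where a in ('a1', 'a2') and (b) in (('b1'), ('b2')) and c > 1
expected = select(rows, {"a1", "a2"}, {"b1", "b2"}, 1)
for row in expected:
    print(" %s | %s | %d | %d" % row)
```

All three predicates are applied to every row here, which gives the three rows listed under "But I expected" above.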