[jira] [Commented] (CASSANDRA-8940) Inconsistent select count and select distinct

2015-05-10 Thread Frens Jan Rumph (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537524#comment-14537524
 ] 

Frens Jan Rumph commented on CASSANDRA-8940:


[~blerer], great that you found this out and resolved the matter!

> Inconsistent select count and select distinct
> -
>
> Key: CASSANDRA-8940
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8940
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: 2.1.2
>Reporter: Frens Jan Rumph
>Assignee: Benjamin Lerer
> Fix For: 2.0.15
>
> Attachments: 7b74fb00-e935-11e4-b10c-317579db7eb4.csv, 8940.txt, 
> 8d5899d0-e935-11e4-847b-2d06da75a6cd.csv, Vagrantfile, install_cassandra.sh, 
> setup_hosts.sh
>
>
> When performing {{select count( * ) from ...}} I expect the results to be 
> consistent over multiple query executions if the table at hand is not written 
> to / deleted from in the meantime. However, in my set-up it is not. The 
> counts returned vary considerably (several percent). The same holds for 
> {{select distinct partition-key-columns from ...}}.
> I have a table in a keyspace with replication_factor = 1 which is something 
> like:
> {code}
> CREATE TABLE tbl (
> id frozen<id_type>,
> bucket bigint,
> offset int,
> value double,
> PRIMARY KEY ((id, bucket), offset)
> )
> {code}
> The frozen udt is:
> {code}
> CREATE TYPE id_type (
> tags map
> );
> {code}
> The table contains around 35k rows (I'm not trying to be funny here ...). The 
> consistency level for the queries was ONE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8940) Inconsistent select count and select distinct

2015-04-28 Thread Frens Jan Rumph (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516949#comment-14516949
 ] 

Frens Jan Rumph commented on CASSANDRA-8940:


Thanks for the update. I guess you are on to something. Again, if there's 
anything I can help with, I'm happy to pitch in.

(A bit off topic:) I wasn't aware that Cassandra performs the count on the 
coordinator. I wonder why one couldn't push the count operator down to the 
replicas involved. I see that aggregate functions in Cassandra trunk are 
implemented in a similar fashion. A pity, if you ask me.

As I understand it, select count queries operate on top of normal select all 
queries. Does this mean that this 'skipping' of rows might also be a problem in 
other cases? Or is it only a problem because the result set is processed/paged 
on a Cassandra node and not in a driver?
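A toy model may help frame this question. The sketch below is not Cassandra's code: it computes count(*) as the sum of page sizes over a paged scan of sorted keys, the way a coordinator could count on top of an ordinary paged SELECT. `paged_scan`, `count_via_paging` and the `skip_on_resume` parameter are hypothetical; `skip_on_resume` stands in for any defect that drops rows when a page is resumed. In this model a row missed by the scan is necessarily missed by the count, so a plain paged SELECT over the same scan would lose the same rows.

```python
from typing import Iterator, List


def paged_scan(keys: List[int], page_size: int, skip_on_resume: int = 0) -> Iterator[List[int]]:
    """Yield successive pages of `keys`, each resuming after the previous page.

    `skip_on_resume` is a hypothetical defect: the number of extra keys
    silently skipped every time a new page is started.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    if skip_on_resume < 0:
        raise ValueError("skip_on_resume must not be negative")
    start = 0
    while start < len(keys):
        page = keys[start:start + page_size]
        yield page
        # Correct paging resumes right after the last key returned;
        # the hypothetical defect skips a few more.
        start += len(page) + skip_on_resume


def count_via_paging(keys: List[int], page_size: int, skip_on_resume: int = 0) -> int:
    # count(*) built on top of the scan: just add up the page sizes.
    return sum(len(page) for page in paged_scan(keys, page_size, skip_on_resume))


rows = list(range(50000))
print(count_via_paging(rows, page_size=5000))                    # 50000
print(count_via_paging(rows, page_size=5000, skip_on_resume=1))  # 49991: one row lost at each of 9 resumes
```

Pushing the count down to the replicas would only change where the page sizes are summed; it would not help if the scan itself skipped rows.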



[jira] [Commented] (CASSANDRA-8940) Inconsistent select count and select distinct

2015-04-23 Thread Frens Jan Rumph (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509603#comment-14509603
 ] 

Frens Jan Rumph commented on CASSANDRA-8940:


Great. If there's anything I can help with, let me know.



[jira] [Commented] (CASSANDRA-8940) Inconsistent select count and select distinct

2015-04-23 Thread Frens Jan Rumph (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509043#comment-14509043
 ] 

Frens Jan Rumph commented on CASSANDRA-8940:


@[~blerer], peculiar to say the least! I haven't got any Cassandra nodes on 
bare metal, VM / container only. I wasn't able to reproduce using CCM on my 
laptop. But maybe it depends on the number of rows / overhead of a VM / 
container / network / ...? If you were to substantially reduce the row counts 
in my test script, things would probably be just fine. I haven't had issues 
with counting, say, 1000 rows.



[jira] [Updated] (CASSANDRA-8940) Inconsistent select count and select distinct

2015-04-22 Thread Frens Jan Rumph (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frens Jan Rumph updated CASSANDRA-8940:
---
Attachment: 8d5899d0-e935-11e4-847b-2d06da75a6cd.csv
7b74fb00-e935-11e4-b10c-317579db7eb4.csv



[jira] [Commented] (CASSANDRA-8940) Inconsistent select count and select distinct

2015-04-22 Thread Frens Jan Rumph (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507981#comment-14507981
 ] 

Frens Jan Rumph commented on CASSANDRA-8940:


Hi [~blerer],

Wow, thanks a lot for all the trouble you've been through!

It's weird that you aren't able to reproduce the issue. It might give a clue, 
though. I assume you are running the scripts from your host machine? If so, it 
might be a client-related rather than a server-related issue. Could you by any 
chance run the script from one of the nodes, if you haven't done so already?

If you place the script in {{test.py}} next to the {{Vagrantfile}} you should 
be able to do something like (as root / with sudo):
{code}
curl https://bootstrap.pypa.io/get-pip.py | python
pip install cassandra-driver
cd /vagrant
python test.py cas-1 cas-2 cas-3
{code}

I have attached two CSV dumps from {{system_traces.events}}:
7b74fb00-e935-11e4-b10c-317579db7eb4.csv, which counted to 494453
8d5899d0-e935-11e4-847b-2d06da75a6cd.csv, which counted to 494833

With tracing enabled, neither count reached the 500000 rows in the table ... 
perhaps looking at the differences between the traces reveals something?
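One way to compare the two dumps is to count how often each trace activity occurs per source node and diff the tallies. This is a sketch under assumptions: it assumes each CSV has a header row containing the `source` and `activity` columns of {{system_traces.events}} (how the attached dumps were exported is not stated), and it replaces digits in the activity text with `#` so that messages differing only in numbers, such as cell counts or addresses, are grouped together. `activity_counts`, `diff_traces` and `compare_files` are hypothetical helpers.

```python
import csv
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (source, normalised activity)


def activity_counts(lines: Iterable[str]) -> Counter:
    """Count (source, activity) pairs in a CSV dump of system_traces.events."""
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        # Group messages that differ only in embedded numbers.
        activity = re.sub(r"\d+", "#", row["activity"])
        counts[(row["source"], activity)] += 1
    return counts


def diff_traces(a: Counter, b: Counter) -> Dict[Key, int]:
    # Positive: the pair occurs more often in trace `a` than in trace `b`.
    return {k: a[k] - b[k] for k in set(a) | set(b) if a[k] != b[k]}


def compare_files(path_a: str, path_b: str) -> None:
    with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
        diff = diff_traces(activity_counts(fa), activity_counts(fb))
    for (source, activity), delta in sorted(diff.items()):
        print(f"{delta:+d}  {source}  {activity}")
```

Usage would be something like {{compare_files("7b74fb00-e935-11e4-b10c-317579db7eb4.csv", "8d5899d0-e935-11e4-847b-2d06da75a6cd.csv")}}; the column names may need adjusting to match the actual export.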

The traces were generated from the script running from one of the Vagrant nodes 
by the way.

Cheers,
Frens Jan



[jira] [Comment Edited] (CASSANDRA-8940) Inconsistent select count and select distinct

2015-04-14 Thread Frens Jan Rumph (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492700#comment-14492700
 ] 

Frens Jan Rumph edited comment on CASSANDRA-8940 at 4/15/15 6:42 AM:
-

[~blerer], sorry for the delay ... I've been a bit busy the past few weeks.

I've whipped up a script which should reproduce my problems: 

{code}
import cassandra.cluster
import cassandra.concurrent

import string
import sys


def setup_schema(session):
    print("setting up schema")

    session.execute("CREATE KEYSPACE IF NOT EXISTS count_test WITH "
                    "replication = {'class': 'SimpleStrategy', 'replication_factor': 1};")
    session.set_keyspace("count_test")

    session.execute("""
        CREATE TABLE IF NOT EXISTS tbl (
            id text,
            bucket bigint,
            offset int,
            value double,
            PRIMARY KEY ((id, bucket), offset)
        )
    """)


def insert_test_data(session):
    # setup parameters for the inserts
    ids = string.ascii_lowercase[:5]
    bucket_count = 10
    offset_count = 10000

    print('inserting data for %s ids, %s buckets and %s offsets' %
          (len(ids), bucket_count, offset_count))

    # clear the table
    session.execute("TRUNCATE tbl;")

    # prepare the insert
    insert = session.prepare("INSERT INTO tbl (id, bucket, offset, value) "
                             "VALUES (?, ?, ?, ?)")

    # insert a CQL row for each tag, bucket and offset
    inserts = [
        (insert, (t, b, o, 0))
        for t in ids
        for b in range(bucket_count)
        for o in range(offset_count)
    ]
    _ = cassandra.concurrent.execute_concurrent(session, inserts)

    return len(inserts)


if __name__ == '__main__':
    # contact points are passed on the command line, e.g. cas-1 cas-2 cas-3
    contact_points = sys.argv[1:]
    print('connecting to %s' % ', '.join(contact_points))
    session = cassandra.cluster.Cluster(contact_points).connect()

    try:
        setup_schema(session)
        inserted = insert_test_data(session)
        print("inserted %s rows" % inserted)

        for count in (session.execute("SELECT count(*) FROM tbl", timeout=120)
                      for _ in range(10)):
            print('queried count was %s%s' %
                  (count[0].count, '' if count[0].count == inserted else ' (fail)'))
    finally:
        session.shutdown()
{code}

In my setup this yields (on a particular run):
{code}
setting up schema
inserting data for 5 ids, 10 buckets and 1000 offsets
inserted 50000 rows
queried count was 50000
queried count was 49396 (fail)
queried count was 49918 (fail)
queried count was 50000
queried count was 50000
queried count was 50000
queried count was 49993 (fail)
queried count was 48997 (fail)
queried count was 49772 (fail)
queried count was 49551 (fail)
{code}

As you can see, the counts vary. The number of failures seems to be correlated 
with the number of rows in the cluster. E.g. with only 1000 rows there are no 
wrong counts.
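To put "several percent" in numbers, the failing counts printed above can be compared with the 5 × 10 × 1000 = 50000 rows the script reports inserting. A small sketch; `shortfall` is a hypothetical helper and the counts are copied from the output above.

```python
from typing import List, Tuple


def shortfall(inserted: int, counts: List[int]) -> List[Tuple[int, int, float]]:
    """For each queried count, return (count, rows missing, percent missing)."""
    return [(c, inserted - c, 100.0 * (inserted - c) / inserted) for c in counts]


inserted = 5 * 10 * 1000  # 5 ids x 10 buckets x 1000 offsets, as printed above
failed = [49396, 49918, 49993, 48997, 49772, 49551]  # the "(fail)" lines above

for count, missing, pct in shortfall(inserted, failed):
    print(f"{count}: {missing} rows missing ({pct:.2f}%)")
```

In this run the largest shortfall is 1003 rows, about 2% of the table.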

As for my set-up: I'm using a three node cluster (cas-1, cas-2 and cas-3) which 
run on Vagrant + LXC. I planned on writing a script using CCM to be portable, 
but I wasn't able to reproduce the results with CCM! I've tried both Cassandra 
2.1.2 and 2.1.4 with CCM. That was rather disappointing. Or looking at it 
differently ... it might be considered a clue to where things go wrong ...

Any of this ring a bell? Do you perhaps have pointers for me to dig deeper?



[jira] [Updated] (CASSANDRA-8940) Inconsistent select count and select distinct

2015-04-14 Thread Frens Jan Rumph (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frens Jan Rumph updated CASSANDRA-8940:
---
Attachment: Vagrantfile
setup_hosts.sh
install_cassandra.sh

Great [~blerer]!

As I said before, I had trouble reproducing the issue with CCM. The set-up in 
which I could reproduce it was built on Vagrant + LXC ... which I didn't want 
to bother you with ;) So I put some effort into a more general set-up based on 
Vagrant + VirtualBox, see the attached files.

The Vagrantfile creates a 3-node cluster on CentOS 7 with Cassandra 2.1 (2.1.4 
at the time of writing; the exact version depends on the DataStax packaging, so 
it might be bumped to a newer patch version in the future).

At first I thought I had the same problem as when trying to use CCM, but 
apparently I needed to increase the number of rows written from 50k to 500k 
(5 ids with 10 buckets each, so 50 partitions, and 10k rows per partition).

Example output from my setup:
{code}
connecting to 192.168.33.11, 192.168.33.12, 192.168.33.13
setting up schema
inserting data for 5 ids, 10 buckets and 10000 offsets
inserted 500000 rows
queried count was 494495 (fail)
queried count was 493530 (fail)
queried count was 494604 (fail)
queried count was 49 (fail)
queried count was 500000
queried count was 494382 (fail)
queried count was 494204 (fail)
queried count was 494625 (fail)
queried count was 500000
queried count was 494758 (fail)
{code}

Note that I have slightly modified the script to accept contact points for 
{{cassandra.cluster.Cluster(...)}} and also increased the number of rows 
inserted as mentioned before. So it can be executed with e.g. {{python2 test.py 
192.168.33.11 192.168.33.12 192.168.33.13}}

I haven't had the time to do something like a proper sweep of the variables, 
but I tried a configuration with 5 ids, 1 bucket per id (so 5 unique partition 
keys) and 100k rows per partition, which also seems to fail, but in a 
different, perhaps interesting, way, for example:

{code}
setting up schema
inserting data for 5 ids, 1 buckets and 100000 offsets
inserted 500000 rows
queried count was 500000
queried count was 500000
queried count was 403172 (fail)
queried count was 500000
queried count was 500000
queried count was 302821 (fail)
queried count was 500000
queried count was 500000
queried count was 304049 (fail)
queried count was 500000
{code}

With 5 ids, 100 buckets per id and 1k rows per partition, things do seem to 
pan out better in my set-up, with only one failure out of ten (in a particular run):
{code}
connecting to 192.168.33.11, 192.168.33.12, 192.168.33.13
setting up schema
inserting data for 5 ids, 100 buckets and 1000 offsets
inserted 500000 rows
queried count was 500000
queried count was 500000
queried count was 500000
queried count was 500000
queried count was 500000
queried count was 498740 (fail)
queried count was 500000
queried count was 500000
queried count was 500000
queried count was 500000
{code}
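The three runs above all insert the same 500k rows; they differ only in how those rows are spread over partitions, since the partition key is (id, bucket). A small sketch of that arithmetic, where `Layout` is a hypothetical helper and the first run is read as 10k rows per partition so that it adds up to the stated 500k:

```python
from typing import NamedTuple


class Layout(NamedTuple):
    ids: int
    buckets_per_id: int
    offsets_per_partition: int  # offset_count in the test script

    @property
    def partitions(self) -> int:
        # The partition key is (id, bucket): one partition per id/bucket pair.
        return self.ids * self.buckets_per_id

    @property
    def rows(self) -> int:
        return self.partitions * self.offsets_per_partition


LAYOUTS = [
    Layout(5, 10, 10_000),   # 50 partitions of 10k rows
    Layout(5, 1, 100_000),   # 5 partitions of 100k rows
    Layout(5, 100, 1_000),   # 500 partitions of 1k rows
]

for layout in LAYOUTS:
    print(layout, "partitions:", layout.partitions, "rows:", layout.rows)
```

In the single runs shown, 8 of 10 counts failed with 50 partitions, 3 of 10 with 5 partitions and 1 of 10 with 500 partitions. That hints that the partition layout matters and not only the total row count, although one run per layout is too little to say more.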



[jira] [Commented] (CASSANDRA-8940) Inconsistent select count and select distinct

2015-04-13 Thread Frens Jan Rumph (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492700#comment-14492700
 ] 

Frens Jan Rumph commented on CASSANDRA-8940:


[~blerer], sorry for the delay ... I've been a bit busy the past few weeks.

I've whipped up a script which should reproduce my problems: 

{code}
import cassandra.cluster
import cassandra.concurrent

import string


def setup_schema(session):
    print("setting up schema")

    session.execute("CREATE KEYSPACE IF NOT EXISTS count_test WITH "
                    "replication = {'class': 'SimpleStrategy', 'replication_factor': 1};")
    session.set_keyspace("count_test")

    session.execute("""
        CREATE TABLE IF NOT EXISTS tbl (
            id text,
            bucket bigint,
            offset int,
            value double,
            PRIMARY KEY ((id, bucket), offset)
        )
    """)


def insert_test_data(session):
    # setup parameters for the inserts
    ids = string.ascii_lowercase[:5]
    bucket_count = 10
    offset_count = 1000

    print('inserting data for %s ids, %s buckets and %s offsets' %
          (len(ids), bucket_count, offset_count))

    # clear the table
    session.execute("TRUNCATE tbl;")

    # prepare the insert
    insert = session.prepare("INSERT INTO tbl (id, bucket, offset, value) "
                             "VALUES (?, ?, ?, ?)")

    # insert a CQL row for each tag, bucket and offset
    inserts = [
        (insert, (t, b, o, 0))
        for t in ids
        for b in range(bucket_count)
        for o in range(offset_count)
    ]
    _ = cassandra.concurrent.execute_concurrent(session, inserts)

    return len(inserts)


if __name__ == '__main__':
    contact_points = ['cas-1', 'cas-2', 'cas-3']
    session = cassandra.cluster.Cluster(contact_points).connect()

    try:
        setup_schema(session)
        inserted = insert_test_data(session)
        print("inserted %s rows" % inserted)

        for count in (session.execute("SELECT count(*) FROM tbl")
                      for _ in range(10)):
            print('queried count was %s%s' %
                  (count[0].count, '' if count[0].count == inserted else ' (fail)'))
    finally:
        session.shutdown()
{code}

In my setup this yields (on a particular run):
{code}
setting up schema
inserting data for 5 ids, 10 buckets and 1000 offsets
inserted 50000 rows
queried count was 50000
queried count was 49396 (fail)
queried count was 49918 (fail)
queried count was 50000
queried count was 50000
queried count was 50000
queried count was 49993 (fail)
queried count was 48997 (fail)
queried count was 49772 (fail)
queried count was 49551 (fail)
{code}

As you can see, the counts vary. The number of failures seems to be correlated 
with the number of rows in the cluster. E.g. with only 1000 rows there are no 
wrong counts.

As for my set-up: I'm using a three node cluster (cas-1, cas-2 and cas-3) which 
run on Vagrant + LXC. I planned on writing a script using CCM to be portable, 
but I wasn't able to reproduce the results with CCM! I've tried both Cassandra 
2.1.2 and 2.1.4 with CCM. That was rather disappointing. Or looking at it 
differently ... it might be considered a clue to where things go wrong ...

Any of this ring a bell? Do you perhaps have pointers for me to dig deeper?



[jira] [Updated] (CASSANDRA-8940) Inconsistent select count and select distinct

2015-03-17 Thread Frens Jan Rumph (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frens Jan Rumph updated CASSANDRA-8940:
---
  Environment: 2.1.2
Fix Version/s: (was: 2.1.2)



[jira] [Commented] (CASSANDRA-8940) Inconsistent select count and select distinct

2015-03-11 Thread Frens Jan Rumph (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357150#comment-14357150
 ] 

Frens Jan Rumph commented on CASSANDRA-8940:


For completeness' sake I also tested with CL=QUORUM and ALL, which, as one would 
expect with RF=1, yield the same inconsistent results.
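With RF=1 each row lives on exactly one replica, so ONE, QUORUM and ALL all resolve to reading that single replica and cannot change the outcome. A small sketch of the arithmetic, assuming a single data center where QUORUM is RF // 2 + 1 (`replicas_required` is a hypothetical helper, not a driver API):

```python
def replicas_required(consistency: str, rf: int) -> int:
    """Replicas that must respond to a read at `consistency` (single data center)."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    needed = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": rf // 2 + 1,
        "ALL": rf,
    }[consistency.upper()]
    if needed > rf:
        raise ValueError(f"{consistency} needs {needed} replicas but RF={rf}")
    return needed


for cl in ("ONE", "QUORUM", "ALL"):
    # With rf=1 every level needs exactly one replica; with rf=3 they differ.
    print(cl, replicas_required(cl, rf=1), replicas_required(cl, rf=3))
```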



[jira] [Created] (CASSANDRA-8940) Inconsistent select count and select distinct

2015-03-10 Thread Frens Jan Rumph (JIRA)
Frens Jan Rumph created CASSANDRA-8940:
--

 Summary: Inconsistent select count and select distinct
 Key: CASSANDRA-8940
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8940
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Frens Jan Rumph
 Fix For: 2.1.2


When performing {{select count( * ) from ...}} I expect the results to be 
consistent over multiple query executions if the table at hand is not written 
to / deleted from in the meantime. However, in my set-up it is not. The counts 
returned vary considerably (several percent). The same holds for {{select 
distinct partition-key-columns from ...}}.

I have a table in a keyspace with replication_factor = 1 which is something 
like:

{code}
CREATE TABLE tbl (
id frozen<id_type>,
bucket bigint,
offset int,
value double,
PRIMARY KEY ((id, bucket), offset)
)
{code}

The frozen udt is:

{code}
CREATE TYPE id_type (
tags map
);
{code}

The table contains around 35k rows (I'm not trying to be funny here ...). The 
consistency level for the queries was ONE.





[jira] [Created] (CASSANDRA-7711) composite column not sliced when using IN clause on (other) composite columns

2014-08-07 Thread Frens Jan Rumph (JIRA)
Frens Jan Rumph created CASSANDRA-7711:
--

 Summary: composite column not sliced when using IN clause on 
(other) composite columns
 Key: CASSANDRA-7711
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7711
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: cassandra 2.0.9
Reporter: Frens Jan Rumph


Hi,

I'm storing data points in Cassandra keyed by a number of values and a 
timestamp. I'd like to use IN clauses to select points and slice them by time. 
The IN clauses work, but I can't get them to work in combination with the 
slicing: all values are returned, i.e. the range in the WHERE clause seems to 
be ignored.

A dumbed down abstract version of my layout and some sample data:

{code}
create table tbl (
  a text,
  b text,
  c int,
  d int,
  primary key ((a), b, c)
);

insert into tbl (a,b,c,d) values ('a1', 'b1', 1, 1);
insert into tbl (a,b,c,d) values ('a1', 'b1', 2, 2);
insert into tbl (a,b,c,d) values ('a1', 'b2', 1, 1);
insert into tbl (a,b,c,d) values ('a1', 'b2', 2, 2);
insert into tbl (a,b,c,d) values ('a2', 'b1', 1, 1);
insert into tbl (a,b,c,d) values ('a2', 'b1', 2, 2);
insert into tbl (a,b,c,d) values ('a3', 'b2', 1, 1);
insert into tbl (a,b,c,d) values ('a3', 'b2', 2, 2);
{code}

So the table contains:
{code}
 a  | b  | c | d
----+----+---+---
 a1 | b1 | 1 | 1
 a1 | b1 | 2 | 2
 a1 | b2 | 1 | 1
 a1 | b2 | 2 | 2
 a2 | b1 | 1 | 1
 a2 | b1 | 2 | 2
 a3 | b2 | 1 | 1
 a3 | b2 | 2 | 2
{code}


When performing {{select * from tbl where a in ('a1', 'a2') and (b) in (('b1'), 
('b2')) and c > 1;}} I get:
{code}
 a  | b  | c | d
----+----+---+---
 a1 | b1 | 1 | 1
 a1 | b1 | 2 | 2
 a1 | b2 | 1 | 1
 a1 | b2 | 2 | 2
 a2 | b1 | 1 | 1
 a2 | b1 | 2 | 2
{code}

But I expected:
{code}
 a  | b  | c | d
----+----+---+---
 a1 | b1 | 2 | 2
 a1 | b2 | 2 | 2
 a2 | b1 | 2 | 2
{code}


Am I doing something wrong? Or is {{c > 1}} incorrectly ignored?


{{select * from tbl where a in ('a1', 'a2') and b='b1' and c > 1;}} does 
correctly produce:
{code}
 a  | b  | c | d
----+----+---+---
 a1 | b1 | 2 | 2
 a2 | b1 | 2 | 2
{code}

So I suspect this behaviour is related to the interplay between the IN clause 
on clustering column b and the > predicate on column c.
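The expected semantics can be stated as a plain filter over the sample rows. This minimal sketch (`expected_rows` is a hypothetical helper) models only what the WHERE clause means, not how Cassandra executes the query, and reproduces both expected result sets above.

```python
from typing import Iterable, List, Set, Tuple

Row = Tuple[str, str, int, int]  # (a, b, c, d)

# Sample data from the report above.
ROWS: List[Row] = [
    ("a1", "b1", 1, 1), ("a1", "b1", 2, 2),
    ("a1", "b2", 1, 1), ("a1", "b2", 2, 2),
    ("a2", "b1", 1, 1), ("a2", "b1", 2, 2),
    ("a3", "b2", 1, 1), ("a3", "b2", 2, 2),
]


def expected_rows(rows: Iterable[Row], a_in: Set[str], b_in: Set[str], c_gt: int) -> List[Row]:
    # A row qualifies only if every restriction holds: the IN on b must not
    # switch off the range restriction on c.
    return [r for r in rows if r[0] in a_in and r[1] in b_in and r[2] > c_gt]


# where a in ('a1', 'a2') and (b) in (('b1'), ('b2')) and c > 1
print(expected_rows(ROWS, {"a1", "a2"}, {"b1", "b2"}, 1))
# where a in ('a1', 'a2') and b = 'b1' and c > 1
print(expected_rows(ROWS, {"a1", "a2"}, {"b1"}, 1))
```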

Cheers,
Frens Jan



--
This message was sent by Atlassian JIRA
(v6.2#6252)