[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956721#comment-13956721 ]

Tyler Hobbs commented on CASSANDRA-6825:

[~slebresne] the logic is primarily broken because it continues checking later components after it knows that the first component intersects.

For example, suppose you have a slice of {{((1, 1), "")}}, min column names of {{(0, 2)}}, and max column names of {{(2, 3)}}. The first component of the slice start falls within the min/max range; the second component does not. Although the slice is _starting_ outside of the min/max range for the second component, it should be considered intersecting because we'll accept other values for the second component (for higher values of the first component). The current logic sees that the second component doesn't fall within min/max and considers it non-intersecting.

> COUNT(*) with WHERE not finding all the matching rows
> ------------------------------------------------------
>
> Key: CASSANDRA-6825
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: quad core Windows7 x64, single node cluster
> Cassandra 2.0.5
> Reporter: Bill Mitchell
> Assignee: Tyler Hobbs
> Fix For: 2.0.7, 2.1 beta2
>
> Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt, testdb_1395372407904.zip, testdb_1395372407904.zip
>
>
> Investigating another problem, I needed to do COUNT(*) on the several partitions of a table immediately after a test case ran, and I discovered that count(*) on the full table and on each of the partitions returned different counts.
> In particular case, SELECT COUNT(*) FROM sr LIMIT 100; returned the expected count from the test 9 rows. The composite primary key splits the logical row into six distinct partitions, and when I issue a query asking for the total across all six partitions, the returned result is only 83999.
> Drilling down, I find that SELECT * from sr WHERE s = 5 AND l = 11 AND partition = 0; returns 30,000 rows, but a SELECT COUNT(*) with the identical WHERE predicate reports only 14,000.
> This is failing immediately after running a single small test, such that there are only two SSTables, sr-jb-1 and sr-jb-2. Compaction never needed to run.
> In selectrowcounts.txt is a copy of the cqlsh output showing the incorrect count(*) results.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
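The intersection problem in Tyler Hobbs's comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Cassandra source or the actual patch: composite names are modelled as plain tuples, an empty tuple stands for an unbounded slice bound (the `""` in the example), and the function names `compare_prefix`, `intersects_broken` and `intersects_fixed` are hypothetical.

```python
def compare_prefix(name, bound):
    """Compare two composite names component by component.

    Stops at the first component that differs, which is the behaviour
    the comment says the check should have. Returns -1, 0 or 1.
    """
    for a, b in zip(name, bound):
        if a < b:
            return -1
        if a > b:
            return 1
    return 0


def intersects_broken(start, mins, maxs):
    """Simplified model of the faulty check: every component of the slice
    start must fall inside that component's [min, max] range."""
    for s, lo, hi in zip(start, mins, maxs):
        if not (lo <= s <= hi):
            return False
    return True


def intersects_fixed(start, end, mins, maxs):
    """Conservative check: the slice can only miss the sstable if it
    starts after the max names or ends before the min names."""
    start_ok = not start or compare_prefix(start, maxs) <= 0
    end_ok = not end or compare_prefix(end, mins) >= 0
    return start_ok and end_ok


# The example from the comment: slice ((1, 1), ""), min (0, 2), max (2, 3).
print(intersects_broken((1, 1), (0, 2), (2, 3)))     # False: sstable wrongly skipped
print(intersects_fixed((1, 1), (), (0, 2), (2, 3)))  # True: sstable is read
```

The fixed variant stops as soon as the first component decides the ordering (1 < 2), so the out-of-range second component of the slice start never gets a chance to exclude the sstable.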
[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956669#comment-13956669 ]

Sylvain Lebresne commented on CASSANDRA-6825:

[~thobbs] Any insights on what is off in the logic exactly?
[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951437#comment-13951437 ]

Tyler Hobbs commented on CASSANDRA-6825:

It looks like CASSANDRA-6327 is the cause for this. The logic for testing sstables for inclusion when there's a composite comparator and multiple components in the slice filter is off. This showed up for {{count(\*)}} because counting queries are always paged internally; the second page was erroneously skipping an sstable. If the {{select *}} query has the same page size (10k), it will also omit results.
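To see why paging exposes the problem Tyler Hobbs describes, here is a toy Python simulation of a paged count over two sstables. Everything in it is an assumption for illustration: the row data, the page size of 3 (the thread mentions 10k), and the helper names `bounds`, `intersects_broken`, `intersects_fixed` and `paged_count`. It models only the idea that each page after the first starts from the last row already returned, and that sstables are filtered by per-component min/max names.

```python
def bounds(sstable):
    """Per-component min and max of the composite names in one sstable."""
    width = len(sstable[0])
    mins = tuple(min(row[i] for row in sstable) for i in range(width))
    maxs = tuple(max(row[i] for row in sstable) for i in range(width))
    return mins, maxs


def intersects_broken(start, mins, maxs):
    """Faulty model: every component of the page start must be in range."""
    return all(lo <= s <= hi for s, lo, hi in zip(start, mins, maxs))


def intersects_fixed(start, mins, maxs):
    """Stop at the first component that differs from the max names."""
    for s, hi in zip(start, maxs):
        if s != hi:
            return s < hi
    return True


def paged_count(sstables, page_size, intersects):
    """Count rows page by page; each new page starts after the last row seen."""
    total = 0
    start = ()  # empty tuple: first page starts at the beginning
    while True:
        candidates = [t for t in sstables if intersects(start, *bounds(t))]
        rows = sorted(r for t in candidates for r in t if not start or r > start)
        page = rows[:page_size]
        total += len(page)
        if len(page) < page_size:
            return total
        start = page[-1]


sstable_a = [(0, 2), (1, 3), (2, 2), (2, 3)]  # min names (0, 2), max names (2, 3)
sstable_b = [(1, 0), (1, 1)]                  # min names (1, 0), max names (1, 1)

print(paged_count([sstable_a, sstable_b], 3, intersects_broken))  # 3
print(paged_count([sstable_a, sstable_b], 3, intersects_fixed))   # 6
```

In this toy data the second page starts after `(1, 1)`, whose second component lies below the minimum of `sstable_a`, so the faulty check drops that sstable and the count stops at 3. With a page size larger than the row count there is only one page, and both checks return 6, which mirrors the observation that an unpaged select was not affected.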
[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951200#comment-13951200 ]

Tyler Hobbs commented on CASSANDRA-6825:

[~wtmitchell3] thanks! I can reproduce the issue now, so I should be able to track down what's going on.
[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943586#comment-13943586 ]

Bill Mitchell commented on CASSANDRA-6825:

As it happens, I have that info handy as my JUnit testcase includes it in the log4j output:

CREATE TABLE testdb_1395374703023.sr (
  siteid text,
  listid bigint,
  partition int,
  createdate timestamp,
  emailcrypt text,
  emailaddr text,
  properties text,
  removedate timestamp,
  PRIMARY KEY ((siteid, listid, partition), createdate, emailcrypt)
) WITH CLUSTERING ORDER BY (createdate DESC, emailcrypt ASC) AND
  read_repair_chance = 0.1 AND
  dclocal_read_repair_chance = 0.0 AND
  replicate_on_write = true AND
  gc_grace_seconds = 864000 AND
  bloom_filter_fp_chance = 0.01 AND
  caching = 'KEYS_ONLY' AND
  comment = '' AND
  compaction = { 'class' : 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy' } AND
  compression = { 'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor' };

(siteID was a BIGINT until recently when the schema was changed to TEXT to match the use of siteID elsewhere in the product. I had not thought to represent our Java String as a Cassandra UUID.)
[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943492#comment-13943492 ]

Tyler Hobbs commented on CASSANDRA-6825:

[~wtmitchell3] what type is the siteid column supposed to be? So far I've tried varint, uuid, and text and had problems with each. Just pasting "DESCRIBE KEYSPACE testdb_" would also work.
[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942881#comment-13942881 ]

Bill Mitchell commented on CASSANDRA-6825:

Tyler, you use an interesting word, "flush". After running a test with a different database name, I went back and looked at the first keyspace, as I did not drain the node before zipping the file the first time. A third SSTable had now been written. See the larger .zip file I have attached.

When I try the same statements through cqlsh, a SELECT * FROM sr WHERE ... AND partition = 2 now shows 2 rows, but SELECT COUNT(*) FROM sr WHERE ... AND partition=2 still returns a count of 1. So the count is still incorrect.
[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942757#comment-13942757 ]

Bill Mitchell commented on CASSANDRA-6825:

I've attached a testdb_1395372407904.zip of the data/testdb_1395372407904 directory after the test ran.

After the test completed, I did select * from sr and it returned 10 rows:

cqlsh:testdb_1395372407904> select count(*) from sr limit 10;

 count
-------
    10

(1 rows)

When I did a select count(*) for each of the six partitions, they total only 9:

cqlsh:testdb_1395372407904> select count(*) from sr where siteID = '4CA4F79E-3AB2-41C5-AE42-C7009736F1D5' and listID = 24 and partition = 0 LIMIT 10;

 count
-------
     2

(1 rows)

cqlsh:testdb_1395372407904> select count(*) from sr where siteID = '4CA4F79E-3AB2-41C5-AE42-C7009736F1D5' and listID = 24 and partition = 1 LIMIT 10;

 count
-------
     2

(1 rows)

cqlsh:testdb_1395372407904> select count(*) from sr where siteID = '4CA4F79E-3AB2-41C5-AE42-C7009736F1D5' and listID = 24 and partition = 2 LIMIT 10;

 count
-------
     1

(1 rows)

cqlsh:testdb_1395372407904> select count(*) from sr where siteID = '4CA4F79E-3AB2-41C5-AE42-C7009736F1D5' and listID = 24 and partition = 3 LIMIT 10;

 count
-------
     1

(1 rows)

cqlsh:testdb_1395372407904> select count(*) from sr where siteID = '4CA4F79E-3AB2-41C5-AE42-C7009736F1D5' and listID = 24 and partition = 4 LIMIT 10;

 count
-------
     1

(1 rows)

cqlsh:testdb_1395372407904> select count(*) from sr where siteID = '4CA4F79E-3AB2-41C5-AE42-C7009736F1D5' and listID = 24 and partition = 5 LIMIT 10;

 count
-------
     2

(1 rows)

As it turns out, the 1 rows not counted were all from partition=2, and have a createDate identical except in the milliseconds to 1 rows that do appear. The common key values of the presumably uncounted rows (as they are the rows that did not return on the SELECT query, CASSANDRA-6826) are siteID=4CA4F79E-3AB2-41C5-AE42-C7009736F1D5,listID=24,partition=2,createDate=2014-03-20T22:27:26.457-0500.

> Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt, testdb_1395372407904.zip
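The discrepancy in Bill Mitchell's cqlsh output above can be checked with a few lines of arithmetic. This is just a sketch using the counts exactly as they appear in this archive; the variable names are made up for illustration.

```python
# Per-partition counts as printed in the cqlsh output above (partition -> count).
per_partition = {0: 2, 1: 2, 2: 1, 3: 1, 4: 1, 5: 2}

# Count reported by the unrestricted "select count(*) from sr".
full_table = 10

total = sum(per_partition.values())
missing = full_table - total

print(f"sum of per-partition counts: {total}")
print(f"rows missing from the per-partition counts: {missing}")
```

With these figures the partitions add up to 9 against a full-table count of 10, which matches the comment's statement that the six partitions "total only 9".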
[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942500#comment-13942500 ]

Tyler Hobbs commented on CASSANDRA-6825:

Scratch that, I made a mistake in my test case (facepalm). After fixing that, I'm not able to reproduce.

[~billmichell] A zip of your sstables could be useful if you can still provide that.

> Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt
[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942437#comment-13942437 ]

Tyler Hobbs commented on CASSANDRA-6825:

The overcounting problem seems to be limited to overwrites that end up in different SSTables (or a memtable). If you write once, flush, and then overwrite, your count will be exactly 2x.

> Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt
[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942232#comment-13942232 ]

Bill Mitchell commented on CASSANDRA-6825:

I can confirm the problem is still there in 2.0.6. As I was verifying that I could still reproduce CASSANDRA-6826, I checked for the COUNT(*) issue too. In one of the table's six partitions, a COUNT(*) reported 1 rows, but if I did a SELECT * in either ascending or descending order, cqlsh printed 2 rows.

Would it help if I zipped up the data directory containing the table after the problem appeared? Or would you need other information from the system directory, too, to see how the data is recorded? That might help in isolating how the problem arises.

> Assignee: Russ Hatch
> Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt
[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942193#comment-13942193 ]

Russ Hatch commented on CASSANDRA-6825:

Unfortunately I was not able to reproduce this issue. I tried cassandra 2.0.5 on linux, and also on Win7 (tried with the python driver, and cqlsh for checking the counts). I was using java 1.7.0_51.

> Assignee: Russ Hatch
> Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt
[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926446#comment-13926446 ]

Bill Mitchell commented on CASSANDRA-6825:

After shortening the column names, the schema is:

CREATE TABLE sr (s bigint, l bigint, partition int, cd timestamp, ec text, ea text, properties text, rd timestamp, PRIMARY KEY ((s, l, partition), cd, ec)) WITH CLUSTERING ORDER BY (cd DESC, ec ASC).

> Assignee: Russ Hatch
> Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt
[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926090#comment-13926090 ]

Russ Hatch commented on CASSANDRA-6825:

[~wtmitchell3] -- I'm working to reproduce this issue. To get as close to the mark as possible, can you provide me with a full schema for the table? (Mainly I'm interested in which columns are part of the primary key -- aside from the siteid, listid, and partition).

> Assignee: Russ Hatch
> Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt
[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924860#comment-13924860 ]

Bill Mitchell commented on CASSANDRA-6825:

If it helps in reproducing it, unlike my earlier report in CASSANDRA-6736, this failure and that of CASSANDRA-6826 appear in a small volume test, less than 100,000 rows total. This lower number was being run in a JUnit test as part of a maven build of a complete product, such that the test keyspace and tables were created, but the row insertion did not begin until 9 minutes later. So Cassandra is not noting these as high-volume activity, and the row width is not large enough to provoke incremental compaction, or in fact any compaction whatsoever.

> Assignee: Russ Hatch
> Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt
[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924480#comment-13924480 ]

Bill Mitchell commented on CASSANDRA-6825:
------------------------------------------

Yes. I've added that to the environment description.

> COUNT(*) with WHERE not finding all the matching rows
> -----------------------------------------------------
>
>                 Key: CASSANDRA-6825
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: quad core Windows7 x64, single node cluster
> Cassandra 2.0.5
>            Reporter: Bill Mitchell
>         Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt
[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924454#comment-13924454 ]

Jonathan Ellis commented on CASSANDRA-6825:
-------------------------------------------

Is this a single-node cluster?

> COUNT(*) with WHERE not finding all the matching rows
> -----------------------------------------------------
>
>                 Key: CASSANDRA-6825
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: quad core Windows7 x64
> Cassandra 2.0.5
>            Reporter: Bill Mitchell
>         Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt