Bill Mitchell created CASSANDRA-7099:
----------------------------------------

             Summary: Concurrent instances of same Prepared Statement seeing 
intermingled result sets
                 Key: CASSANDRA-7099
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7099
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: Cassandra 2.0.7 with single node cluster
Windows dual-core laptop
DataStax Java driver 2.0.1
            Reporter: Bill Mitchell


I have a schema in which a wide row is partitioned into smaller rows.  (See 
CASSANDRA-6826, CASSANDRA-6825 for more detail on this schema.)  In this case, 
I randomly assigned the rows across the partitions based on the first four hex 
digits of a hash value modulo the number of partitions.  
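
The partition assignment described above can be sketched as follows. This is a minimal illustration, not the reporter's actual code: the class name PartitionAssigner, the method partitionFor, and the assumption that the hash is available as a hex string are all hypothetical.

```java
final class PartitionAssigner {
    // Hypothetical helper: derive a partition index from the first four
    // hex digits of a hash, taken modulo the number of partitions.
    static int partitionFor(String hexHash, int partitionCount) {
        if (hexHash == null || hexHash.length() < 4) {
            throw new IllegalArgumentException("need at least four hex digits");
        }
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        // Four hex digits give a value in the range 0..65535.
        int prefix = Integer.parseInt(hexHash.substring(0, 4), 16);
        return prefix % partitionCount;
    }
}
```

With seven partitions, a hash beginning "0005" would land in partition 5 under this scheme.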

Occasionally I need to retrieve the rows in order of insertion irrespective of 
the partitioning.  Cassandra, of course, does not support this when paging by 
fetch size is enabled, so I am issuing a query against each of the partitions 
to obtain their rows in order, and merging the results:

SELECT l, partition, cd, rd, ec, ea FROM sr
WHERE s = ? AND l = ? AND partition = ?
ORDER BY cd ASC, ec ASC ALLOW FILTERING;

These parallel queries are all instances of a single PreparedStatement.  

What I saw was identical values from multiple queries, which by construction 
should never happen.  After further investigation, I discovered that rows from 
partition 5 were being returned in the result set for the query against another 
partition, e.g., 1.  This was so unbelievable that I added diagnostic code in 
my test case to detect this:

After reading 167 rows, returned partition 5 does not match query partition 4
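
For readers who want to reproduce this kind of check, here is a rough sketch of a merge with the diagnostic built in. It is not the original test code. It stands in plain Java iterators for the driver's ResultSet objects, and the names PartitionMerge, Row, Cursor, and merge are hypothetical. It also assumes that cd fits in a long, that ec fits in an int, and that the iterator at list index p holds the results of the query against partition p.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class PartitionMerge {
    // Hypothetical row holder: only the columns needed for ordering and checking.
    static final class Row {
        final int partition;
        final long cd;
        final int ec;
        Row(int partition, long cd, int ec) {
            this.partition = partition;
            this.cd = cd;
            this.ec = ec;
        }
    }

    // One cursor per partition query, remembering which partition was asked for.
    private static final class Cursor {
        final int queryPartition;
        final Iterator<Row> rows;
        Row head;
        int rowsRead;
        Cursor(int queryPartition, Iterator<Row> rows) {
            this.queryPartition = queryPartition;
            this.rows = rows;
        }
        // Moves to the next row; fails if the row belongs to another partition.
        boolean advance() {
            if (!rows.hasNext()) {
                head = null;
                return false;
            }
            head = rows.next();
            if (head.partition != queryPartition) {
                throw new IllegalStateException("After reading " + rowsRead
                        + " rows, returned partition " + head.partition
                        + " does not match query partition " + queryPartition);
            }
            rowsRead++;
            return true;
        }
    }

    // K-way merge of per-partition streams, each already ordered by (cd, ec).
    static List<Row> merge(List<Iterator<Row>> perPartition) {
        Comparator<Cursor> byHead = Comparator
                .<Cursor>comparingLong(c -> c.head.cd)
                .thenComparingInt(c -> c.head.ec);
        PriorityQueue<Cursor> heap = new PriorityQueue<>(byHead);
        for (int p = 0; p < perPartition.size(); p++) {
            Cursor c = new Cursor(p, perPartition.get(p));
            if (c.advance()) {
                heap.add(c);
            }
        }
        List<Row> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            out.add(c.head);
            if (c.advance()) {
                heap.add(c);
            }
        }
        return out;
    }
}
```

Because each cursor knows which partition it queried, a row from the wrong partition is caught at the moment it is read rather than after the merged output has been assembled.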

The merge logic works fine and delivers correct results when I use LIMIT to 
avoid fetch size paging.  Even if there were a bug there, it is hard to see how 
any client error explains ResultSet.one() returning a row whose values don't 
match the constraints in that ResultSet's query.

I'm not sure of the exact significance of 167, as I have configured the 
queryFetchSize for the cluster to 1000, and in this merge logic I divide that 
by the number of partitions, 7, so the fetchSize for each of these parallel 
queries was set to 142.  I suspect this is being treated as a minimum 
fetchSize, and the driver or server is rounding this up to fill a transmission 
block.  When I prime the pump, issuing the query against each of the 
partitions, the initial contents of the result sets are correct.  The failure 
appears after we advance two of these queries to the next page.
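
The fetch size arithmetic above is plain integer division. As a small sketch (the class and method names are hypothetical, and the floor of 1 is an added safeguard that the report does not mention):

```java
final class FetchSizeSplit {
    // Hypothetical helper: split a cluster-wide fetch size evenly across
    // the parallel per-partition queries, using integer division.
    static int perQueryFetchSize(int clusterFetchSize, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        // Assumed safeguard: never return a fetch size below 1.
        return Math.max(1, clusterFetchSize / partitionCount);
    }
}
```

With the values from this report, perQueryFetchSize(1000, 7) gives 142, so the seven first pages together request at most 994 rows.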

Although I had been experimenting with fetchMoreResults() for prefetching, I 
disabled that to isolate this problem, so that is not a factor.   

I have not yet tried preparing separate instances of the query, as I already 
have common logic to cache and reuse already prepared statements.

I have not proven that it is a server bug and not a Java driver bug, but at 
first glance it was not obvious how the Java driver might associate the 
responses with the wrong requests.  Were that happening, one would expect to 
see the right overall collection of rows, just delivered to the wrong queries, 
and not the duplicates that I saw.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
