[jira] [Commented] (CASSANDRA-6875) CQL3: select multiple CQL rows in a single partition using IN

Sylvain Lebresne (JIRA) Wed, 30 Apr 2014 07:54:24 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985587#comment-13985587
 ]


Sylvain Lebresne commented on CASSANDRA-6875:
---------------------------------------------

The overall principle looks good, but I feel this could use some more comments 
and/or made a little more clear. Mainly, the {{isMultiColumn}} path in 
{{SelectStatement.buildBound}} looks weird at face value: we're inside a loop 
that populates a {{ColumnNameBuilder}}, but the {{isMultiColumn}} path 
completely ignores both the iterated object and the builder. This work because 
it relies on the fact that when there is a multi-column restriction then it's 
the only one restriction (which is duplicated in the 
{{SelectStatement.columnRestrictions}} array), and that the value from such 
restriction is not a single column value but rather a fully built composite 
serialized value (which is not self evident from the method naming in 
particular).  But it's hard to piece it all that together when you look at 
{{buildBound}} currently.  Some comments would help, but I'd prefer going even 
further and move the {{isMultiColumn}} path outside of the loop (by looking up 
first if the first restriction in {{columnRestrictions}} is a multi-column one 
or not) since it has no reason to be in the loop. In fact, I'd go a tad further 
by making SelectStatement abstract and have 2 subClass, one for single-column 
restrictions with a {{SingleColumnRelation[] columnRestrictions}} array field 
as we have now, and one for multi-column that has just one non-array 
{{MultiColumnRestriction columnsRestriction}} field. After all, both cases 
exclude one another in the current implementation.

Somewhat related, I'm slightly afraid that the parts about multi-column 
restrictions returning fully serialized composites (through Tuples.Value.get()) 
will not play nice with the 2.1 code, where we don't manipulate composites as 
opaque ByteBuffers anymore (concretely, Tuples will serialize the composite, 
but SelectStatement will have to deserialize it back right away to get a 
Composite, which will be both ugly and inefficient).  So to avoid having to 
change everything on merge, I think it would be cleaner to make Tuples.Value 
return a list of (individual column) values instead of just one, and let 
SelectStatement build back the full composite name using a ColumnNameBuilder.  
Especially if you make the per-column and multi-column paths a tad more 
separated as suggested above, I suspect it might clarify things a bit.  

Other than that, a bunch of more minor comments and nits:
* The {{SelectStatement.RawStatement.prepare()}} re-org patch breaks proper 
indentation at places (for instance, the indentation of parameters to 'new 
ColumnSpecification' in the first branch of udpateSingleColumnRestriction, 
though there is a few other places). Would be nice to fix those.
* Can't we use {{QueryProcessor.instance.getPrepared}} instead of creating a 
only-for-test {{QueryProcessor.staticGetPrepared}}? Or at the very least leave 
such shortcuts in the tests where it belongs.
* In Tuples.Literal.prepare, I'd prefer good ol'fashion indexed loop to iterate 
over 2 lists (feels clearer, and saves the allocator creation as a bonus).
* In Tuples.Raw.makeInReceiver should probably be called makeReceiver (it's not 
related to "IN"). I'd also drop the spaces in the string generated (if only for 
consistency with the string generated in INRaw). As a side node, 
Raw.makeReceiver uses indexed iteration while INRaw.makeInReceiver don't, can't 
we make both consistent style wise for OCD sakes?
* Why make methods of CQLStatement abstract (it's an interface)?  Also, I'd 
rather add the QueryOptions parameter to the existing executeInternal and 
default to QueryOptions.DEFAULT when calling it, rather than having 2 methods. 
Though tbh, my preference would be to move the tests to dtest and leave those 
somewhat unrelated changes to another ticket, see below.
* SingleColumnRelation.previousInTuple is now unused but not removed.
* We could save one list allocation (instead of both toCreate and toUpdate) in 
SelectStatement.updateRestrictionsForRelation (for EQ and IN, we know it can 
only be a create, and for slices we can lookup with getExisting).
* In Restriction, the {{values}} method is already at top-level, no reason to 
re-declare it for EQ.
* It bothers me to start adding unit tests for CQL queries when all of our CQL 
tests are currently in dtest. I'd *much* rather keep it all in the dtests to 
avoid the confusion on where is what tested.


> CQL3: select multiple CQL rows in a single partition using IN
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-6875
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6875
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API
>            Reporter: Nicolas Favre-Felix
>            Assignee: Tyler Hobbs
>            Priority: Minor
>             Fix For: 2.0.8
>
>
> In the spirit of CASSANDRA-4851 and to bring CQL to parity with Thrift, it is 
> important to support reading several distinct CQL rows from a given partition 
> using a distinct set of "coordinates" for these rows within the partition.
> CASSANDRA-4851 introduced a range scan over the multi-dimensional space of 
> clustering keys. We also need to support a "multi-get" of CQL rows, 
> potentially using the "IN" keyword to define a set of clustering keys to 
> fetch at once.
> (reusing the same example\:)
> Consider the following table:
> {code}
> CREATE TABLE test (
>   k int,
>   c1 int,
>   c2 int,
>   PRIMARY KEY (k, c1, c2)
> );
> {code}
> with the following data:
> {code}
>  k | c1 | c2
> ---+----+----
>  0 |  0 |  0
>  0 |  0 |  1
>  0 |  1 |  0
>  0 |  1 |  1
> {code}
> We can fetch a single row or a range of rows, but not a set of them:
> {code}
> > SELECT * FROM test WHERE k = 0 AND (c1, c2) IN ((0, 0), (1,1)) ;
> Bad Request: line 1:54 missing EOF at ','
> {code}
> Supporting this syntax would return:
> {code}
>  k | c1 | c2
> ---+----+----
>  0 |  0 |  0
>  0 |  1 |  1
> {code}
> Being able to fetch these two CQL rows in a single read is important to 
> maintain partition-level isolation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (CASSANDRA-6875) CQL3: select multiple CQL rows in a single partition using IN

Reply via email to