[jira] [Updated] (CASSANDRA-3761) CQL 3.0

Sylvain Lebresne (Updated) (JIRA) Fri, 20 Jan 2012 09:51:04 -0800

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sylvain Lebresne updated CASSANDRA-3761:
----------------------------------------

    Attachment: 0004-Thrift-gen-files.patch
                0003-Makes-batches-atomic.patch
                0002-Add-support-for-switching-the-CQL-version.patch
                0001-CQL-3.0.patch

Attaching patches. I'll try to provide the rationale for the changes introduced 
(on top of basic support for the syntaxes from create_cf_syntaxes.txt). All 
feedback/comments on those will be greatly appreciated. I'm not trying to 
impose anything, but I do think, after much reflection, that those choices are 
the right way to move forward.

First, on the backward compatibility issue: the changes proposed are breaking 
changes (as indicated by the major version bump). The patch creates a new java 
package cql3 that is completely separate from the cql package. So for a time 
both cql 2.0 and 3.0 will be supported. To make that work, the patch adds a new 
thrift method set_cql_version(String) to allow setting the version during the 
client session (in the second patch). The default version in the patch is 3.0, 
*but* only because it makes testing easier until drivers add support for 
this new method. Based on the discussion we had on CASSANDRA-2474, what I 
propose is that for C* 1.1 we add 3.0 only as a beta/demonstration version, with 
2.0 still being the default. If everything goes well, in C* 1.2, cql 3.0 will 
become the default and 2.0 will be supported but deprecated.
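
To illustrate the per-session version switch, here is a minimal Python sketch. 
It is not the Thrift API from the patch: the ClientSession class, the 
processor_package helper and the exact version strings are assumptions made 
for the example. It uses the default proposed for C* 1.1 (2.0), not the 3.0 
default currently in the patch.

```python
# Hypothetical model of per-session CQL version selection (not the real Thrift API).
SUPPORTED_VERSIONS = {"2.0", "3.0"}  # assumed version strings
DEFAULT_VERSION = "2.0"              # default proposed for C* 1.1


class ClientSession:
    def __init__(self):
        # Every new session starts on the default version.
        self.cql_version = DEFAULT_VERSION

    def set_cql_version(self, version):
        # Reject versions the server does not know about.
        if version not in SUPPORTED_VERSIONS:
            raise ValueError("unsupported CQL version: %s" % version)
        self.cql_version = version

    def processor_package(self):
        # The two implementations live in separate packages: cql and cql3.
        return "cql3" if self.cql_version == "3.0" else "cql"
```

A session that never calls set_cql_version keeps using the cql package, so 
existing drivers would be unaffected.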

One change the patches make is to make static CFs (i.e., the ones that don't use 
COMPACT STORAGE) really static, i.e. adding non-defined columns is not 
supported. The reasons are numerous (I'll probably even forget some):
* If we were to allow non-defined columns in static CFs, we would pretty much 
allow using a static CF as a wide row. So we would have to support doing a 
slice on those static CFs. But that is kind of contradictory with the 
introduction of the transposed idea. And if we don't allow slices on static 
CFs, what is the point of allowing random columns?
* It makes the code simpler. In particular it avoids complications with sparse 
composites.
* It's helpful to users. If you use a static CF, we tell you when you make a 
typo while inserting a column (I do think it's a very useful thing).
* It's not a limitation. If you need a new column, you can do an ALTER ADD, 
it's cheap. And if your column names are really "random", then what you really 
want is a transposed/compact CF anyway.
* It means it makes sense to limit prepared markers to the right side of a 
relation. That in turn allows doing a bit more work during preparation and 
makes stuff like CASSANDRA-3753 possible/easy.
* It makes the language much closer to SQL. Don't get me wrong, I don't like 
SQL all that much, but CQL does reuse SQL syntax and core concepts. Making it 
easier on all the people that know SQL is a good thing provided there is no 
downside to doing it. And in that case I don't think there is one, outside of 
breaking compatibility with CQL 2.0, which will be broken anyway.
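
The "static is really static" rule above can be sketched as follows. This is 
an illustrative Python model only, not code from the patch: StaticTable, 
add_column and validate_insert are hypothetical names.

```python
# Hypothetical model of a static CF that rejects columns it does not define.
class StaticTable:
    def __init__(self, name, columns):
        self.name = name
        self.columns = set(columns)

    def add_column(self, column):
        # Models ALTER ... ADD: extending the definition is a cheap operation.
        self.columns.add(column)

    def validate_insert(self, column_names):
        # Any column missing from the definition (e.g. a typo) is an error.
        unknown = sorted(set(column_names) - self.columns)
        if unknown:
            raise ValueError(
                "undefined column(s) in %s: %s" % (self.name, ", ".join(unknown))
            )
```

With this model, inserting into a column named "colum1" fails immediately 
instead of silently creating a new column.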

Another thing the patch does is to add consistency to our handling of 
case-sensitivity for column names. What I mean here is that currently, when you 
declare:
{noformat}
CREATE TABLE (
    MyKey text PRIMARY KEY,
    Column1 int,
    Column2 int,
)
{noformat}
then MyKey is case-insensitive and Column1 and Column2 are case-sensitive. We 
should fix that inconsistency. The patch makes the choice to reuse the way SQL 
(at least PostgreSQL) deals with this: all definition names (MyKey, Column1 and 
Column2 in my example) are case-insensitive by default (they are lowercased, 
basically) but you can specify a case-sensitive one using double quotes. The 
rationale is that with the static-is-static idea above, we are in a case very 
similar to SQL and so I didn't see a very good reason to do things differently 
(again, except for the issue of backward compatibility). Note that in the wide 
row (transposed) case, the C* column name is *not* a 'definition name', so that 
rule won't apply. For consistency's sake, keyspace and column family names also 
follow the same rule.

We could alternatively make everything case-sensitive, but at least the rule 
should be the same for all definitions, whether it is a PRIMARY KEY or not.
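
As a rough illustration of the lowercase-unless-quoted rule, here is a small 
Python sketch. The normalize_identifier function is a hypothetical helper, 
not part of the patch, and it deliberately ignores escaped double quotes 
inside a quoted name.

```python
def normalize_identifier(raw):
    """Return the name as stored for a definition name written as `raw`."""
    # A double-quoted name keeps its case; only the quotes are dropped.
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return raw[1:-1]
    # An unquoted name is case-insensitive, which is modeled by lowercasing.
    return raw.lower()
```

Under this rule MyKey, mykey and MYKEY all refer to the same definition, while 
"MyKey" (with the double quotes) refers to a distinct, case-sensitive one.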

On the code itself: the changes described above are extensive in that they 
involve an almost complete rewrite of the select, createCF, update, delete and 
alterTable statements, as well as a non-trivial amount of changes to the 
grammar. While doing that, I saw a number of things that could be 
improved/generalized to make the code more readable (typically, sometimes the 
code of a statement was entirely in QueryProcessor, sometimes it was in the 
statement class, sometimes it was split between both, etc.) and to fix a 
number of (minor) issues. And because we have decided that cql 3 is a 'fork' of 
cql 2, I decided it would be a good occasion to improve the code. One thing 
leading to another, the patch refactors a good chunk of the cql code. I know 
that it's not the way we usually do things but I think the circumstances are a 
bit different this time in that the patch is (almost) only new code, and we've 
agreed cql3 will be beta to start with. I'm also willing to devote a fair chunk 
of my time to the testing of that new cql version. I hope you guys won't be 
pissed off by this. In any case, the patches fix the following issues with cql 
(in the new cql3 that is; they do not backport the fixes to cql 2.0):
* Correctly handle the case where an update inside a batch overrides the 
current keyspace.
* Support overriding the current keyspace in all statements that apply to a CF. 
I.e., add the optional override to CreateCF, CreateIndex, alterTable and dropCF.
* Make counters work with prepared statements (the grammar didn't allow markers 
like 'X = X - ?', and even if it had, the code had to be updated to handle them 
correctly).
* Validate Delete's consistency level (it previously wasn't validated).
* Correctly batch batches. I.e., create only one RowMutation for a given 
(keyspace, key) for the whole batch statement. That fix is in a separate patch.
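
The last fix in the list (one RowMutation per (keyspace, key) for the whole 
batch) can be sketched like this. It is a simplified Python model, not the 
Java code from the patch: group_batch is a hypothetical helper, and a plain 
dict of column values stands in for a RowMutation.

```python
from collections import OrderedDict


def group_batch(updates):
    """Group (keyspace, key, column, value) updates by (keyspace, key)."""
    mutations = OrderedDict()
    for keyspace, key, column, value in updates:
        # Reuse the mutation already created for this (keyspace, key), if any.
        mutation = mutations.setdefault((keyspace, key), {})
        mutation[column] = value
    return mutations
```

Two updates in the same batch that touch the same row of the same keyspace 
then end up in a single mutation rather than two.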

Last note: the patch does not make any attempt to transparently support super 
columns with the new notations. I think trying to do so would be messy, so I 
think it will be a better use of our time to internally transform super columns 
into composites (aka CASSANDRA-3237).

                
> CQL 3.0
> -------
>
>                 Key: CASSANDRA-3761
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3761
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API
>            Reporter: Sylvain Lebresne
>              Labels: cql
>             Fix For: 1.1
>
>         Attachments: 0001-CQL-3.0.patch, 
> 0002-Add-support-for-switching-the-CQL-version.patch, 
> 0003-Makes-batches-atomic.patch, 0004-Thrift-gen-files.patch, 
> create_cf_syntaxes.txt
>
>
> This ticket is a reformulation/generalization of CASSANDRA-2474. The core 
> change of CQL 3.0 is to introduce the new syntaxes that were discussed in 
> CASSANDRA-2474 that allow to:
> # Provide a better/more native support for wide rows, using the idea of 
> transposed view.
> # The generalization to composite columns.
> The attached text file create_cf_syntaxes.txt recalls the new syntaxes 
> introduced.
> The changes proposed above allow (and strongly suggest in some cases) a 
> number of other changes to the language that this ticket proposes to 
> explore/implement (more details coming in the comments).

