[ https://issues.apache.org/jira/browse/CASSANDRA-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne updated CASSANDRA-3761:
----------------------------------------

    Attachment: 0004-Thrift-gen-files.patch
                0003-Makes-batches-atomic.patch
                0002-Add-support-for-switching-the-CQL-version.patch
                0001-CQL-3.0.patch

Attaching patches. I'll try to provide the rationale for the changes introduced (on top of basic support for the syntaxes from create_cf_syntaxes.txt). All feedback/comments on those will be greatly appreciated. I'm not trying to impose anything, but after much reflection I do think those choices are the right way to move forward.

First, on the backward compatibility issue: the proposed changes are breaking changes (as indicated by the major version bump). The patch creates a new Java package, cql3, that is completely separate from the cql package, so for a time both CQL 2.0 and 3.0 will be supported. To make that work, the second patch adds a new Thrift method, set_cql_version(String), that sets the CQL version for the client session. The default version in the patch is 3.0, *but* only because it makes testing easier until drivers add support for this new method. Based on the discussion we had on CASSANDRA-2474, what I propose is that for C* 1.1 we add 3.0 only as a beta/demonstration version, with 2.0 still being the default. If everything goes well, in C* 1.2 CQL 3.0 will become the default and 2.0 will be supported but deprecated.

One change in the patches is that static CFs (i.e., the ones that don't use COMPACT STORAGE) become really static: adding undefined columns is not supported. The reasons are numerous (I'll probably even forget some):
* If we were to allow undefined columns in static CFs, we would pretty much allow using a static CF as a wide row, so we would have to support doing a slice on those static CFs. But that kind of contradicts the introduction of the transposed idea. And if we don't allow slices on static CFs, what is the point of allowing random columns?
* It makes the code simpler. In particular, it avoids complications with sparse composites.
* It's helpful to users. If you use a static CF, we tell you when you make a typo while inserting a column (I do think that's very useful).
* It's not a limitation. If you need a new column, you can do an ALTER ADD; it's cheap. And if your column names are really "random", then what you really want is a transposed/compact CF anyway.
* It makes it sensible to limit prepared markers to the right side of a relation. That in turn allows more work to be done during preparation and makes things like CASSANDRA-3753 possible/easy.
* It makes the language much closer to SQL. Don't get me wrong, I don't like SQL all that much, but CQL does reuse SQL syntax and core concepts. Making things easier for all the people who know SQL is a good thing, provided there is no downside. And in this case I don't think there is one, outside of breaking compatibility with CQL 2.0, which will be broken anyway.

Another thing the patch does is make our handling of case-sensitivity for column names consistent. Currently, when you declare:
{noformat}
CREATE TABLE MyTable (
    MyKey text PRIMARY KEY,
    Column1 int,
    Column2 int
)
{noformat}
MyKey is case-insensitive, while Column1 and Column2 are case-sensitive. We should fix that inconsistency. The patch chooses to reuse the way SQL (at least PostgreSQL) deals with this: all definition names (MyKey, Column1 and Column2 in my example) are case-insensitive by default (basically, they are lowercased), but you can specify a case-sensitive one using double quotes. The rationale is that with the "static is static" idea above, we are in a situation very similar to SQL, so I didn't see a good reason to do things differently (again, except for backward compatibility). Note that in the wide row (transposed) case, the C* column name is *not* a 'definition name', so that rule won't apply.
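To make the case-sensitivity rule concrete, here is a minimal Java sketch of how a definition name could be canonicalized under it. The class and method names (IdentifierRule, canonicalize, sameDefinition) are hypothetical and are not taken from the patch. The sketch also assumes the simplest quoting form (a name wrapped in one pair of double quotes) and does not handle escaped quotes inside a quoted name.

```java
import java.util.Locale;

/**
 * Hypothetical sketch (not code from the patch) of the PostgreSQL-style rule:
 * unquoted definition names are case-insensitive and folded to lower case,
 * while double-quoted names keep their exact case.
 */
final class IdentifierRule {
    private IdentifierRule() {}

    /** Returns the name under which a column, keyspace or column family would be stored. */
    static String canonicalize(String rawIdentifier) {
        boolean quoted = rawIdentifier.length() >= 2
                && rawIdentifier.charAt(0) == '"'
                && rawIdentifier.charAt(rawIdentifier.length() - 1) == '"';
        if (quoted) {
            // Case-sensitive: strip the surrounding quotes, keep the case as written.
            return rawIdentifier.substring(1, rawIdentifier.length() - 1);
        }
        // Case-insensitive: fold to lower case, independent of the JVM's default locale.
        return rawIdentifier.toLowerCase(Locale.ROOT);
    }

    /** Two raw identifiers name the same definition iff their canonical forms are equal. */
    static boolean sameDefinition(String a, String b) {
        return canonicalize(a).equals(canonicalize(b));
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("MyKey"));                // mykey
        System.out.println(canonicalize("Column1"));              // column1
        System.out.println(canonicalize("\"Column1\""));          // Column1
        System.out.println(sameDefinition("MyKey", "MYKEY"));     // true
        System.out.println(sameDefinition("MyKey", "\"MyKey\"")); // false
    }
}
```

Under this sketch, MyKey, mykey and MYKEY all resolve to mykey. By contrast, "MyKey" resolves to MyKey and therefore names a different column.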
For consistency's sake, keyspace and column family names also follow the same rule. We could alternatively make everything case-sensitive, but at the very least the rule should be the same for all definitions, whether or not they are part of the PRIMARY KEY.

On the code itself: the changes described above are extensive, in that they involve an almost complete rewrite of the select, createCF, update, delete and alterTable statements, as well as a non-trivial amount of changes to the grammar. While doing that, I saw a number of things that could be improved/generalized to make the code more readable (typically, the code of a statement was sometimes entirely in QueryProcessor, sometimes in the statement class, sometimes split between both, etc.) and a number of (minor) issues to fix. And because we have decided that CQL 3 is a 'fork' of CQL 2, I thought it would be a good occasion to improve the code. One thing leading to another, the patch refactors a good chunk of the CQL code. I know that's not the way we usually do things, but I think the circumstances are a bit different this time, in that the patch is (almost) only new code and we've agreed CQL 3 will be beta to start with. I'm also willing to devote a fair chunk of my time to testing the new CQL version. I hope you guys won't be pissed off by this.

In any case, the patches fix the following issues with CQL (in the new cql3 package, that is; the fixes are not backported to CQL 2.0):
* Correctly handle the case where an update inside a batch overrides the current keyspace.
* Support overriding the current keyspace in all statements that apply to a CF, i.e., add the optional override to CreateCF, CreateIndex, AlterTable and DropCF.
* Make counters work with prepared statements (the grammar doesn't allow markers like 'X = X - ?', and even if it did, the code would have needed updating to handle them correctly).
* Validate DELETE's consistency level (it currently isn't validated).
* Correctly batch batches, i.e., create only one RowMutation for a given (keyspace, key) for the whole batch statement. That fix is in a separate patch.

Last note: the patch makes no attempt to transparently support super columns with the new notations. I think trying to do so would be messy, so it would be a better use of our time to internally transform super columns into composites (aka CASSANDRA-3237).

> CQL 3.0
> -------
>
>                 Key: CASSANDRA-3761
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3761
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API
>            Reporter: Sylvain Lebresne
>              Labels: cql
>             Fix For: 1.1
>
>         Attachments: 0001-CQL-3.0.patch, 0002-Add-support-for-switching-the-CQL-version.patch, 0003-Makes-batches-atomic.patch, 0004-Thrift-gen-files.patch, create_cf_syntaxes.txt
>
>
> This ticket is a reformulation/generalization of CASSANDRA-2474. The core change of CQL 3.0 is to introduce the new syntaxes that were discussed in CASSANDRA-2474, which allow us to:
> # Provide better/more native support for wide rows, using the idea of a transposed view.
> # Generalize this to composite columns.
> The attached text file create_cf_syntaxes.txt recalls the new syntaxes introduced.
> The changes proposed above allow (and in some cases strongly suggest) a number of other changes to the language that this ticket proposes to explore/implement (more details coming in the comments).