[jira] [Commented] (CASSANDRA-2474) CQL support for compound columns and wide rows

Sylvain Lebresne (Commented) (JIRA) Fri, 13 Jan 2012 01:46:57 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185514#comment-13185514
 ]


Sylvain Lebresne commented on CASSANDRA-2474:
---------------------------------------------

bq. Personally I'd rather support both for one release to make the transition 
easier, but with neither super nor composite support I doubt many people are 
using the current .., so if doing both adds a lot of complexity I'm okay with 
this.

The patch changes the implementation of select enough that keeping support for 
'..' would amount to adding back some special code to handle it. But I guess 
removing it right away may mean a rather painful upgrade for anyone using CQL 
in production right now, so maybe it's worth it. Once the patch is ready, I'll 
see exactly what adding back '..' to ease the transition entails.

bq. I've made static definitions (i.e, those definitions that don't use COMPACT 
STORAGE basically) really static.

To try to justify this a little bit more (so it doesn't seem too random a 
choice), I see mainly two big advantages to doing that:
# added validation/security for the programmer: If you define:
{noformat}
CREATE TABLE Users (
    ID int PRIMARY KEY,
    NAME text,
    EMAIL text)
{noformat}
I think it's great that the DB warns you that
{noformat}
INSERT INTO users (ID, NAME, EMA1L) VALUES (2, "Jacques", "j...@cques.com")
{noformat}
or
{noformat}
SELECT * FROM users WHERE EMA1L = "j...@cques.com"
{noformat}
are likely mistakes on your side. It's also what someone coming from SQL would 
expect :P
# It adds some (imo reassuring) regularity to the language, in that in
{noformat}
SELECT xxx, yyy FROM cf WHERE zzz > 3;
{noformat}
we know that xxx, yyy and zzz are *always* names defined in the "schema" 
(schema meaning here the CREATE TABLE definition). If we allow arbitrary 
names, that will only be meaningful for static (and sparse) CFs, and we will 
have to deal with conflicts with other column definitions (typically the parts 
of the PRIMARY KEY). In my example above, it means we would allow arbitrary 
column names to be inserted except for the column name ID.
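
To make the first point concrete, here is a minimal Python sketch of the kind of check a static definition enables. It is an illustration only, not the patch's implementation: USERS_SCHEMA, unknown_columns and validate_insert are hypothetical names, and treating identifiers as case-insensitive is an assumption.

```python
# Hypothetical declared columns for the Users table above.
# Assumption: identifiers are compared case-insensitively.
USERS_SCHEMA = {"ID", "NAME", "EMAIL"}


def unknown_columns(schema, names):
    """Return the names that are not declared in the schema, in input order."""
    return [name for name in names if name.upper() not in schema]


def validate_insert(schema, names):
    """Reject a statement that mentions a column the schema does not define."""
    bad = unknown_columns(schema, names)
    if bad:
        raise ValueError("undefined column(s): " + ", ".join(bad))


validate_insert(USERS_SCHEMA, ["ID", "NAME", "EMAIL"])  # accepted

try:
    validate_insert(USERS_SCHEMA, ["ID", "NAME", "EMA1L"])
except ValueError as exc:
    print(exc)  # undefined column(s): EMA1L
```

With a check like this, the EMA1L typo is rejected up front instead of silently creating a new column.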

And I don't see any downside, since you can cheaply update the schema or use 
wide rows if appropriate. Yes, internally our engine would allow inserting 
non-predefined columns into a 'static' CF, but whether that is useful is the 
right question. Or, as a great man once said: "schemaless" is a non-feature; 
"painless schema" is what people care about.

bq. Granted, it doesn't make a great deal of sense to use IN + LIMIT, but if 
someone does, the LIMIT should take precedence

What I meant is I'm not sure how to implement it. Suppose you have the 
following wide row definition (good ol' time series):
{noformat}
CREATE TABLE Events (
    event_type text,
    time date,
    event_details binary,
    PRIMARY KEY (event_type, time)
) USING COMPACT STORAGE
{noformat}
and say that for two event_type values, e1 and e2, you have 1000 events each. 
Now if you do (with limit as a way to do paging):
{noformat}
SELECT * FROM Events WHERE event_type IN (e1, e2) LIMIT 500;
{noformat}
How does that translate internally? If we do a multiGetSlice with a slice 
having a limit of 500, we'll read 500 columns from e2 uselessly. And we have a 
similar problem if we do more simply:
{noformat}
SELECT * FROM Events LIMIT 1000
{noformat}
because we currently have no way to do a range query that stops when we have n 
columns *across* all rows. In a way it's a simpler problem than in the 'IN' 
case because we could add internal support for this, but it's additional work 
and not really in the scope of this ticket.

In other words, I'm not sure how to implement LIMIT currently with the new 
definitions introduced by this patch while keeping its SQL semantics.
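
The wasted read in the IN case can be shown with a toy Python model. This is a sketch under stated assumptions, not Cassandra code: rows are plain lists, and multiget_slice and select_in_with_limit are hypothetical stand-ins in which the slice limit applies to each row independently, as described above.

```python
def multiget_slice(rows, keys, per_row_limit):
    """Toy stand-in: fetch up to per_row_limit columns from each requested row."""
    return {key: rows[key][:per_row_limit] for key in keys}


def select_in_with_limit(rows, keys, limit):
    """Apply a SQL-style LIMIT on top of a per-row slice limit.

    Returns the truncated result and the number of columns actually read.
    """
    fetched = multiget_slice(rows, keys, limit)
    columns_read = sum(len(cols) for cols in fetched.values())
    result = []
    for key in keys:
        for col in fetched[key]:
            if len(result) == limit:
                break
            result.append((key, col))
    return result, columns_read


# Two event types with 1000 events each, as in the example above.
rows = {"e1": list(range(1000)), "e2": list(range(1000))}
result, columns_read = select_in_with_limit(rows, ["e1", "e2"], 500)
print(len(result), columns_read)  # 500 1000
```

All 500 returned columns come from e1, so the 500 columns read from e2 are discarded.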

bq. What if we allowed "ORDER BY DESC" instead?

I'd be fine with that (though wouldn't "ORDER DESC" sound less weird?).

bq. BTW, why test this with dtest instead of just single node mode?

No reason outside of it being simpler for me (the tests only use a single node) 
and my ignorance of an "official" CQL test suite (but I kind of think the dtest 
framework would be a good official test framework for anything not a unit test).

                
> CQL support for compound columns and wide rows
> ----------------------------------------------
>
>                 Key: CASSANDRA-2474
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Eric Evans
>            Assignee: Sylvain Lebresne
>            Priority: Critical
>              Labels: cql
>             Fix For: 1.1
>
>         Attachments: 2474-transposed-1.PNG, 2474-transposed-raw.PNG, 
> 2474-transposed-select-no-sparse.PNG, 2474-transposed-select.PNG, 
> cql_tests.py, raw_composite.txt, screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimited terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
