Re: CQL support for compound columns

Sylvain Lebresne Tue, 03 Jan 2012 10:09:05 -0800

Ok, I think I'm warming up to what we're getting at. I would change
thesyntax of the VALUE() thing however. Instead of:CREATE TABLE
timeline (   userid int,   posted_at uuid,   body string,   PRIMARY
KEY(user_id, posted_at),   VALUE(body))I would prefer:CREATE COMPACT
TABLE timeline (   userid int,   posted_at uuid,   body string,
PRIMARY KEY(user_id, posted_at),)
The reasons being that it really influences the implementation layout
of theCF in C*. Namely, the non-compact CF defined by CREATE TABLE
timeline (   userid int,   posted_at uuid,   body string,   PRIMARY
KEY(user_id, posted_at),)would look in C* like:<userid> : {
<posted_at>:'body' -> <value>}while the COMPACT variant would
be:<userid> : {    <posted_at> -> <value>}which is using the fact that
there is only 1 field not part of the key to"optimize" the layout. And
I believe making the COMPACT part of the CREATEemphasis better that
it's a property of the definition itself (that cannot bechanged)
rather than of that specific 'body' field. It also make the rule
forCOMPACT table rather simple: "a compact table should have only one
field notpart of the primary key"; you don't have to deal with errors
like someonedefining two VALUE() for instance.

That being said, I'd like to try to resume where we're at (including
theCOMPACT change above) and add a few random ideas along the way.
Please correctme if I've got something wrong.
I think we have 4 different cases, 2 for 'standard' CF without
composites:- static CFs (the only case CQL handle really well today)-
dynamic CFs (wide rows, time series if you prefer)and 2 for CF with
composite column names:- 'dense' composite (typically time series but
where the key is naturally  multi-parts)- 'sparse' composite (aka
super columns)
Let me try to take an example for which, with how it would
translateinternally and example queries.

Standard "static" CF--------------------
"For each user, holds his infos"
CREATE TABLE users (    userid uuid PRIMARY KEY,    firstname text,
lastname text,    age int)
In C*:<userid> : {    'firstname' -> <value>    'lastname' -> <value>
  'age' -> <value>}
Query:SELECT firstname, lastname FROM users WHERE userid = '...';
Standard "dynamic" CF---------------------
"For each user, keep each url he clicked on with the date of last click"
CREATE COMPACT TABLE clicks (    userid uuid,    url text,
timestamp date    PRIMARY KEY (userid, url))
In C*:<userid> : {    <url> -> <timestamp>}
Query:SELECT url, timestamp FROM clicks WHERE userid = '...';SELECT
timestamp FROM clicks WHERE userid = '...' and url = 'http://...';
'dense' composite-----------------
"For each user, keep ip and port from where he connected with the date
of lastconnection"
CREATE COMPACT TABLE connections (    userid uuid,    ip binary,
port int,    timestamp date    PRIMARY KEY (userid, ip, port))
In C*:<userid> : {    <ip>:<port> -> <timestamp>}
Query:SELECT ip, port, timestamp FROM connections WHERE userid = '...';
'sparse' composite------------------
"User timeline"
CREATE TABLE timeline (    userid uuid,    posted_at date,    body
text,    posted_by text,    PRIMARY KEY (user_id, posted_at),);
In C*:<userid> : {    <posted_at>:'body' -> <value>
<posted_at>:'posted_by' -> <value>}
Query:SELECT body, posted_by FROM timeline WHERE userid = '...' and
posted_at = '2 janvier 2010'
Note: I think we really should also be able to do queries like:SELECT
posted_ad, body, posted_by FROM timeline WHERE userid = '...' and
posted_at > '2 janvier 2010'but that's more akin to the modification
of the syntax for slices.

Random other ideas------------------
1) We could allow something like:    CONSTRAINT key PRIMARY KEY
(userid, ip, port)  which would then allow to write    SELECT
timestamp FROM users WHERE key = ('...', 192.168.0.1, 80);  (I believe
this is the 'standard' notation to name a 'composite' key in SQL)
- Above we're ony handling the use of composites for column names, but
they  can be useful for value (and row keys) and it could be nice to
have an easy  notation for that (clearly a following ticket however).
What about:
CREATE COMPACT TABLE timeline (    userid_part1 text,    userid_part2
int,    posted_at date,    posted_by uuid,    body text    header text
   GROUP (userid_part1, userid_part2) AS userid,    PRIMARY KEY
(userid, posted_at, posted_by)    GROUP (header, body))
In C*:<userid_part1>:<userid_part2> : {    <posted_at>:<posted_by> ->
<header>:<body>}
Query:SELECT posted_at, posted_by, body, header FROM timeline WHERE
userid = ('john', 32)

--
Sylvain
On Mon, Jan 2, 2012 at 8:29 PM, Eric Evans <eev...@acunu.com> wrote:
> On Mon, Jan 2, 2012 at 12:55 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> On Mon, Jan 2, 2012 at 10:53 AM, Eric Evans <eev...@acunu.com> wrote:
>>> In SQL, PRIMARY KEY is a modifier to a column spec, and here PRIMARY
>>> KEY(user_id, posted_at, posted_by) reads like a PRIMARY modifier
>>> applied to a KEY() function.  It's also a little strange the way it
>>> appears in the grouping of column specs, when it's actually defining a
>>> grouping or relationship of them (maybe this is what you meant about
>>> using TRANSPOSED WITH <options> to emphasize the non-standard).
>>
>> Fear not, I can set your mind at ease. :)
>>
>> Personally I think the syntax works reasonably well in its own right,
>> but my main reason for the proposed syntax is that it is actually
>> standard SQL for composite primary keys at least as far back as SQL
>> 92, as a subcategory of table constraints.  The SQL standard is not
>> freely linkable, but see
>> http://www.postgresql.org/docs/9.1/static/sql-createtable.html for a
>> real-world example.
>
> OK, I stand corrected (and my mind is at ease :) ).
>
>
> --
> Eric Evans
> Acunu | http://www.acunu.com | @acunu

Re: CQL support for compound columns

Reply via email to