Ok, I think I'm warming up to what we're getting at. I would change thesyntax of the VALUE() thing however. Instead of:CREATE TABLE timeline ( userid int, posted_at uuid, body string, PRIMARY KEY(user_id, posted_at), VALUE(body))I would prefer:CREATE COMPACT TABLE timeline ( userid int, posted_at uuid, body string, PRIMARY KEY(user_id, posted_at),) The reasons being that it really influences the implementation layout of theCF in C*. Namely, the non-compact CF defined by CREATE TABLE timeline ( userid int, posted_at uuid, body string, PRIMARY KEY(user_id, posted_at),)would look in C* like:<userid> : { <posted_at>:'body' -> <value>}while the COMPACT variant would be:<userid> : { <posted_at> -> <value>}which is using the fact that there is only 1 field not part of the key to"optimize" the layout. And I believe making the COMPACT part of the CREATEemphasis better that it's a property of the definition itself (that cannot bechanged) rather than of that specific 'body' field. It also make the rule forCOMPACT table rather simple: "a compact table should have only one field notpart of the primary key"; you don't have to deal with errors like someonedefining two VALUE() for instance.
That being said, I'd like to try to resume where we're at (including theCOMPACT change above) and add a few random ideas along the way. Please correctme if I've got something wrong. I think we have 4 different cases, 2 for 'standard' CF without composites:- static CFs (the only case CQL handle really well today)- dynamic CFs (wide rows, time series if you prefer)and 2 for CF with composite column names:- 'dense' composite (typically time series but where the key is naturally multi-parts)- 'sparse' composite (aka super columns) Let me try to take an example for which, with how it would translateinternally and example queries. Standard "static" CF-------------------- "For each user, holds his infos" CREATE TABLE users ( userid uuid PRIMARY KEY, firstname text, lastname text, age int) In C*:<userid> : { 'firstname' -> <value> 'lastname' -> <value> 'age' -> <value>} Query:SELECT firstname, lastname FROM users WHERE userid = '...'; Standard "dynamic" CF--------------------- "For each user, keep each url he clicked on with the date of last click" CREATE COMPACT TABLE clicks ( userid uuid, url text, timestamp date PRIMARY KEY (userid, url)) In C*:<userid> : { <url> -> <timestamp>} Query:SELECT url, timestamp FROM clicks WHERE userid = '...';SELECT timestamp FROM clicks WHERE userid = '...' and url = 'http://...'; 'dense' composite----------------- "For each user, keep ip and port from where he connected with the date of lastconnection" CREATE COMPACT TABLE connections ( userid uuid, ip binary, port int, timestamp date PRIMARY KEY (userid, ip, port)) In C*:<userid> : { <ip>:<port> -> <timestamp>} Query:SELECT ip, port, timestamp FROM connections WHERE userid = '...'; 'sparse' composite------------------ "User timeline" CREATE TABLE timeline ( userid uuid, posted_at date, body text, posted_by text, PRIMARY KEY (user_id, posted_at),); In C*:<userid> : { <posted_at>:'body' -> <value> <posted_at>:'posted_by' -> <value>} Query:SELECT body, posted_by FROM timeline WHERE userid = '...' and posted_at = '2 janvier 2010' Note: I think we really should also be able to do queries like:SELECT posted_ad, body, posted_by FROM timeline WHERE userid = '...' and posted_at > '2 janvier 2010'but that's more akin to the modification of the syntax for slices. Random other ideas------------------ 1) We could allow something like: CONSTRAINT key PRIMARY KEY (userid, ip, port) which would then allow to write SELECT timestamp FROM users WHERE key = ('...', 192.168.0.1, 80); (I believe this is the 'standard' notation to name a 'composite' key in SQL) - Above we're ony handling the use of composites for column names, but they can be useful for value (and row keys) and it could be nice to have an easy notation for that (clearly a following ticket however). What about: CREATE COMPACT TABLE timeline ( userid_part1 text, userid_part2 int, posted_at date, posted_by uuid, body text header text GROUP (userid_part1, userid_part2) AS userid, PRIMARY KEY (userid, posted_at, posted_by) GROUP (header, body)) In C*:<userid_part1>:<userid_part2> : { <posted_at>:<posted_by> -> <header>:<body>} Query:SELECT posted_at, posted_by, body, header FROM timeline WHERE userid = ('john', 32) -- Sylvain On Mon, Jan 2, 2012 at 8:29 PM, Eric Evans <eev...@acunu.com> wrote: > On Mon, Jan 2, 2012 at 12:55 PM, Jonathan Ellis <jbel...@gmail.com> wrote: >> On Mon, Jan 2, 2012 at 10:53 AM, Eric Evans <eev...@acunu.com> wrote: >>> In SQL, PRIMARY KEY is a modifier to a column spec, and here PRIMARY >>> KEY(user_id, posted_at, posted_by) reads like a PRIMARY modifier >>> applied to a KEY() function. It's also a little strange the way it >>> appears in the grouping of column specs, when it's actually defining a >>> grouping or relationship of them (maybe this is what you meant about >>> using TRANSPOSED WITH <options> to emphasize the non-standard). >> >> Fear not, I can set your mind at ease. :) >> >> Personally I think the syntax works reasonably well in its own right, >> but my main reason for the proposed syntax is that it is actually >> standard SQL for composite primary keys at least as far back as SQL >> 92, as a subcategory of table constraints. The SQL standard is not >> freely linkable, but see >> http://www.postgresql.org/docs/9.1/static/sql-createtable.html for a >> real-world example. > > OK, I stand corrected (and my mind is at ease :) ). > > > -- > Eric Evans > Acunu | http://www.acunu.com | @acunu