Maybe the ship on this has sailed, but I am a bit miffed about "create table". CQL is going out of its way to make things easy for people. But if someone does not understand the concept of a column family, making it easy for them to design something that is an anti-pattern seems odd to me.
As an admin I have been called many times to troubleshoot database performance issues. It sometimes boils down to a bad schema choice. At later/production stages these become hard to dig out of. It usually takes more hardware, converting GB or TB of data, and application cutovers.

I do not call "column families" "tables". If someone newer to Cassandra did, I would correct them. Why not call Java references pointers? I hate being ambiguous on key terminology.

On Mon, Jan 2, 2012 at 11:53 AM, Eric Evans <eev...@acunu.com> wrote:
> On Sat, Dec 31, 2011 at 1:12 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> > On Fri, Dec 30, 2011 at 12:30 PM, Eric Evans <eev...@acunu.com> wrote:
> >>> CREATE TABLE timeline (
> >>>     user_id int,
> >>>     posted_at uuid,
> >>>     body string,
> >>>     posted_by string,
> >>>     PRIMARY KEY(user_id, posted_at, posted_by),
> >>>     VALUE(body)
> >>> );
> >>
> >> I think the value declaration also helps in that it's one more thing
> >> that provides cues as to the data model it creates (more expressive).
> >> But this got me thinking, why not introduce something special for the
> >> composite name as well? That way the PRIMARY KEY syntax (which comes
> >> preloaded with meaning and expectations) could be kept more SQLish,
> >> and the whole thing looks more like an extension to the language as
> >> opposed to a modification.
> >>
> >> Say:
> >>
> >> CREATE TABLE timeline (
> >>     user_id int PRIMARY KEY,
> >>     posted_at uuid,
> >>     body text,
> >>     posted_by text,
> >>     COMPOSITE_NAME(posted_at, posted_by),
> >>     COMPOSITE_VALUE(body)
> >> )
> >
> > I went back and forth on this mentally, but I come down as -0 on CN
> > instead of PK. For two reasons:
> >
> > First, the composite PRIMARY KEY is a better description of what you
> > can actually do with the data. In a relational model, a PK of user_id
> > means there is only one (user_id, posted_at, body, posted_by) row with
> > a given user_id. Which is not the case here. PK = (row key +
> > composite components) captures exactly what is "immutable and unique"
> > in a given object, so it's actually exactly what it's meant for and
> > not an abuse at all. (It even fits nicely with the "queries involving
> > the PK are always indexed" assumption that isn't required by the SQL
> > standard but every other database does anyway because it makes the
> > most sense.)
>
> Yeah, you're right, PK is a better fit for this.
>
> Now that I'm forced to think about it a bit more, I think my un-SQL
> reaction is probably rooted more in the abuse of the PRIMARY KEY
> syntax, than the meaning it conveys.
>
> In SQL, PRIMARY KEY is a modifier to a column spec, and here PRIMARY
> KEY(user_id, posted_at, posted_by) reads like a PRIMARY modifier
> applied to a KEY() function. It's also a little strange the way it
> appears in the grouping of column specs, when it's actually defining a
> grouping or relationship of them (maybe this is what you meant about
> using TRANSPOSED WITH <options> to emphasize the non-standard).
>
> I wonder if there isn't a way to keep the PRIMARY KEY connection while
> making it a little more SQL (and hence more intuitive). Maybe
> something like:
>
>
> CREATE TABLE timeline (
>     (user_id int, posted_at uuid, posted_by) PRIMARY KEY,
>     body text
> )
>
>
> --
> Eric Evans
> Acunu | http://www.acunu.com | @acunu
>
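
To make the "PK = (row key + composite components)" point from the quoted thread concrete, here is a minimal Python sketch of how the timeline example could sit in a column family. It is only an in-memory model, not Cassandra's API: the names column_family, insert and timeline are hypothetical, and small integers stand in for the uuid posted_at values.

```python
# Hypothetical in-memory model of a column family (not Cassandra's API):
#   row key (user_id) -> {composite column name (posted_at, posted_by) -> body}
column_family = {}


def insert(user_id, posted_at, posted_by, body):
    """Store one CQL-style row as a single column in the wide row for user_id."""
    row = column_family.setdefault(user_id, {})
    # Same (user_id, posted_at, posted_by) means same cell, so this overwrites.
    row[(posted_at, posted_by)] = body


def timeline(user_id):
    """Return one wide row's entries, ordered by composite column name."""
    return sorted(column_family.get(user_id, {}).items())


# Integers stand in for the uuid posted_at values (an assumption of this sketch).
insert(1, 100, "alice", "first post")
insert(1, 101, "bob", "second post")
insert(1, 100, "alice", "edited first post")  # full key repeats: overwrite
insert(2, 100, "alice", "another user's timeline")

for name, body in timeline(1):
    print(name, body)
```

In this model one user_id holds many entries, which is why a PK of user_id alone would mislead, while the full (user_id, posted_at, posted_by) triple is what identifies a single value. It also shows why the schema choice is hard to undo later: the key layout is the physical layout.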