[ https://issues.apache.org/jira/browse/CASSANDRA-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901087#comment-13901087 ]
Aleksey Yeschenko commented on CASSANDRA-6561: ---------------------------------------------- All right. Issues: - ALTER TABLE RENAME allows renaming static columns (should not) - "DELETE static_col FROM table WHERE partition_key = X and clustering_col = Y" should? probably throw an error, and not delete the static_col? (not sure about this one) - continuing with the DELETEs, "DELETE static_col, regular_col FROM table WHERE partition_key = X" will throw "Missing mandatory PRIMARY KEY part ck since static_col specified", which is minor, of course, but should pick first non-static column instead (DeleteStatement) Nit: CFMetaData.SchemaColumnsCf still has 'is_static' in it Typo: 'buildt' in 'Note that we know that only the first group buildt can be static' in ColumnGroupMap Wishes: - I'd prefer BEGIN CAS BATCH, b/c currently you can use either BEGIN UNLOGGED BATCH or BEGIN [LOGGED] BATCH with conditions, and it confuses things. I prefer explicit here. - I'd prefer maybeUpdatePrefix() to updatePrefix() > Static columns in CQL3 > ---------------------- > > Key: CASSANDRA-6561 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6561 > Project: Cassandra > Issue Type: New Feature > Reporter: Sylvain Lebresne > Assignee: Sylvain Lebresne > Fix For: 2.0.6 > > > I'd like to suggest the following idea for adding "static" columns to CQL3. > I'll note that the basic idea has been suggested by jhalliday on irc but the > rest of the details are mine and I should be blamed for anything stupid in > what follows. > Let me start with a rational: there is 2 main family of CF that have been > historically used in Thrift: static ones and dynamic ones. CQL3 handles both > family through the presence or not of clustering columns. There is however > some cases where mixing both behavior has its use. I like to think of those > use cases as 3 broad category: > # to denormalize small amounts of not-entirely-static data in otherwise > static entities. It's say "tags" for a product or "custom properties" in a > user profile. This is why we've added CQL3 collections. Importantly, this is > the *only* use case for which collections are meant (which doesn't diminishes > their usefulness imo, and I wouldn't disagree that we've maybe not > communicated this too well). > # to optimize fetching both a static entity and related dynamic ones. Say you > have blog posts, and each post has associated comments (chronologically > ordered). *And* say that a very common query is "fetch a post and its 50 last > comments". In that case, it *might* be beneficial to store a blog post > (static entity) in the same underlying CF than it's comments for performance > reason. So that "fetch a post and it's 50 last comments" is just one slice > internally. > # you want to CAS rows of a dynamic partition based on some partition > condition. This is the same use case than why CASSANDRA-5633 exists for. > As said above, 1) is already covered by collections, but 2) and 3) are not > (and > I strongly believe collections are not the right fit, API wise, for those). > Also, note that I don't want to underestimate the usefulness of 2). In most > cases, using a separate table for the blog posts and the comments is The > Right Solution, and trying to do 2) is premature optimisation. Yet, when used > properly, that kind of optimisation can make a difference, so I think having > a relatively native solution for it in CQL3 could make sense. > Regarding 3), though CASSANDRA-5633 would provide one solution for it, I have > the feeling that static columns actually are a more natural approach (in term > of API). That's arguably more of a personal opinion/feeling though. > So long story short, CQL3 lacks a way to mix both some "static" and "dynamic" > rows in the same partition of the same CQL3 table, and I think such a tool > could have it's use. > The proposal is thus to allow "static" columns. Static columns would only > make sense in table with clustering columns (the "dynamic" ones). A static > column value would be static to the partition (all rows of the partition > would share the value for such column). The syntax would just be: > {noformat} > CREATE TABLE t ( > k text, > s text static, > i int, > v text, > PRIMARY KEY (k, i) > ) > {noformat} > then you'd get: > {noformat} > INSERT INTO t(k, s, i, v) VALUES ("k0", "I'm shared", 0, "foo"); > INSERT INTO t(k, s, i, v) VALUES ("k0", "I'm still shared", 1, "bar"); > SELECT * FROM t; > k | s | i | v > ------------------------------------ > k0 | "I'm still shared" | 0 | "bar" > k0 | "I'm still shared" | 1 | "foo" > {noformat} > There would be a few semantic details to decide on regarding deletions, ttl, > etc. but let's see if we agree it's a good idea first before ironing those > out. > One last point is the implementation. Though I do think this idea has merits, > it's definitively not useful enough to justify rewriting the storage engine > for it. But I think we can support this relatively easily (emphasis on > "relatively" :)), which is probably the main reason why I like the approach. > Namely, internally, we can store static columns as cells whose clustering > column values are empty. So in terms of cells, the partition of my example > would look like: > {noformat} > "k0" : [ > (:"s" -> "I'm still shared"), // the static column > (0:"" -> "") // row marker > (0:"v" -> "bar") > (1:"" -> "") // row marker > (1:"v" -> "foo") > ] > {noformat} > Of course, using empty values for the clustering columns doesn't quite work > because it could conflict with the user using empty clustering columns. But > in the CompositeType encoding we have the end-of-component byte that we could > reuse by using a specific value (say 0xFF, currently we never set that byte > to anything else than -1, 0 and 1) to indicate it's a static column. > With that, we'd need to update the CQL3 statements to support the new syntax > and rules, but that's probably not horribly hard. > So anyway, this may or may not be a good idea, but I think it has enough meat > to warrant some consideration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)