[jira] [Commented] (CASSANDRA-6561) Static columns in CQL3

DOAN DuyHai (JIRA) Fri, 21 Feb 2014 01:52:37 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908124#comment-13908124
 ]


DOAN DuyHai commented on CASSANDRA-6561:
----------------------------------------

It's me again.

 By talking with my collegues about the static columns this morning, one 
interesting question arises.

{code:sql}
CREATE TABLE blogpost (
   id bigint,
   content text static, 
   creation_date timestamp static,
   author_id text static,
   comment_date timeuuid,
   comment_author_id text,
   comment text,
   PRIMARY KEY (id,comment_date)) WITH CLUSTERING ORDER (comment_date DESC);
{code}

If we do something like : 
{code:sql}
 SELECT * FROM blogpost WHERE id=xxx LIMIT 10;
{code}

We'll have the blog post content plus the last 10 comments, pretty clear.

 The answer is something like


||id||content||creation_date||author_id||comment_date||comment_author_id||comment||
|10|big blog text|2014/02/20 10:00:00|author1|2014/02/20 22:00:00|user1|last 
comment|
|10|big blog text|2014/02/20 10:00:00|author1|2014/02/20 21:50:00|user2|before 
last comment|
|10|big blog text|2014/02/20 10:00:00|author1|2014/02/20 21:40:00|user3|another 
comment|
|10|big blog text|2014/02/20 10:00:00|author1|2014/02/20 21:30:00|user4|and 
again|
|10|big blog text|2014/02/20 10:00:00|author1|2014/02/20 21:00:00|user5|and 
again|

As you can see the first 4 columns are duplicated, only the last 3 columns 
vary, as per the CQL3 semantics. 

The question is: how are data sent over the wire ? *Are the first 4 columns 
duplication sent over the network or does the Java driver "reformat" the data 
to show them duplicated ?*





> Static columns in CQL3
> ----------------------
>
>                 Key: CASSANDRA-6561
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6561
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 2.0.6
>
>
> I'd like to suggest the following idea for adding "static" columns to CQL3.  
> I'll note that the basic idea has been suggested by jhalliday on irc but the 
> rest of the details are mine and I should be blamed for anything stupid in 
> what follows.
> Let me start with a rational: there is 2 main family of CF that have been 
> historically used in Thrift: static ones and dynamic ones. CQL3 handles both 
> family through the presence or not of clustering columns. There is however 
> some cases where mixing both behavior has its use. I like to think of those 
> use cases as 3 broad category:
> # to denormalize small amounts of not-entirely-static data in otherwise 
> static entities. It's say "tags" for a product or "custom properties" in a 
> user profile. This is why we've added CQL3 collections. Importantly, this is 
> the *only* use case for which collections are meant (which doesn't diminishes 
> their usefulness imo, and I wouldn't disagree that we've maybe not 
> communicated this too well).
> # to optimize fetching both a static entity and related dynamic ones. Say you 
> have blog posts, and each post has associated comments (chronologically 
> ordered). *And* say that a very common query is "fetch a post and its 50 last 
> comments". In that case, it *might* be beneficial to store a blog post 
> (static entity) in the same underlying CF than it's comments for performance 
> reason.  So that "fetch a post and it's 50 last comments" is just one slice 
> internally.
> # you want to CAS rows of a dynamic partition based on some partition 
> condition. This is the same use case than why CASSANDRA-5633 exists for.
> As said above, 1) is already covered by collections, but 2) and 3) are not 
> (and
> I strongly believe collections are not the right fit, API wise, for those).
> Also, note that I don't want to underestimate the usefulness of 2). In most 
> cases, using a separate table for the blog posts and the comments is The 
> Right Solution, and trying to do 2) is premature optimisation. Yet, when used 
> properly, that kind of optimisation can make a difference, so I think having 
> a relatively native solution for it in CQL3 could make sense.
> Regarding 3), though CASSANDRA-5633 would provide one solution for it, I have 
> the feeling that static columns actually are a more natural approach (in term 
> of API). That's arguably more of a personal opinion/feeling though.
> So long story short, CQL3 lacks a way to mix both some "static" and "dynamic" 
> rows in the same partition of the same CQL3 table, and I think such a tool 
> could have it's use.
> The proposal is thus to allow "static" columns. Static columns would only 
> make sense in table with clustering columns (the "dynamic" ones). A static 
> column value would be static to the partition (all rows of the partition 
> would share the value for such column). The syntax would just be:
> {noformat}
> CREATE TABLE t (
>   k text,
>   s text static,
>   i int,
>   v text,
>   PRIMARY KEY (k, i)
> )
> {noformat}
> then you'd get:
> {noformat}
> INSERT INTO t(k, s, i, v) VALUES ("k0", "I'm shared",       0, "foo");
> INSERT INTO t(k, s, i, v) VALUES ("k0", "I'm still shared", 1, "bar");
> SELECT * FROM t;
>  k |                  s | i |    v
> ------------------------------------
> k0 | "I'm still shared" | 0 | "bar"
> k0 | "I'm still shared" | 1 | "foo"
> {noformat}
> There would be a few semantic details to decide on regarding deletions, ttl, 
> etc. but let's see if we agree it's a good idea first before ironing those 
> out.
> One last point is the implementation. Though I do think this idea has merits, 
> it's definitively not useful enough to justify rewriting the storage engine 
> for it. But I think we can support this relatively easily (emphasis on 
> "relatively" :)), which is probably the main reason why I like the approach.
> Namely, internally, we can store static columns as cells whose clustering 
> column values are empty. So in terms of cells, the partition of my example 
> would look like:
> {noformat}
> "k0" : [
>   (:"s" -> "I'm still shared"), // the static column
>   (0:"" -> "")                  // row marker
>   (0:"v" -> "bar")
>   (1:"" -> "")                  // row marker
>   (1:"v" -> "foo")
> ]
> {noformat}
> Of course, using empty values for the clustering columns doesn't quite work 
> because it could conflict with the user using empty clustering columns. But 
> in the CompositeType encoding we have the end-of-component byte that we could 
> reuse by using a specific value (say 0xFF, currently we never set that byte 
> to anything else than -1, 0 and 1) to indicate it's a static column.
> With that, we'd need to update the CQL3 statements to support the new syntax 
> and rules, but that's probably not horribly hard.
> So anyway, this may or may not be a good idea, but I think it has enough meat 
> to warrant some consideration.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

[jira] [Commented] (CASSANDRA-6561) Static columns in CQL3

Reply via email to