[jira] [Commented] (CASSANDRA-6561) Static columns in CQL3

Nicolas Favre-Felix (JIRA) Fri, 14 Feb 2014 05:13:00 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901408#comment-13901408
 ]


Nicolas Favre-Felix commented on CASSANDRA-6561:
------------------------------------------------

Hi,

I'd like to add that it is important to be able to fetch both a static column 
and clustered columns in a single select; this does not seem possible at the 
moment:

{code}
cqlsh:ks1> CREATE TABLE foo (
  x text,
  y bigint,
  t bigint static,
  z bigint,
  PRIMARY KEY (x, y)
);

cqlsh:ks1> insert into foo (x,y,z) values ('a', 1, 10);
cqlsh:ks1> insert into foo (x,y,z) values ('a', 2, 20);
cqlsh:ks1> update foo set t = 2 where x='a';
cqlsh:ks1> select * from foo;

 x | y | t | z
---+---+---+----
 a | 1 | 2 | 10
 a | 2 | 2 | 20

(2 rows)
{code}

Here we have a select over a whole partition and it pulls the static column 
just fine. Selecting a CQL row works, of course, and selecting a static column 
does too:

{code}
cqlsh:ks1> select x,y,z from foo where x='a' and y=1;

 x | y | z
---+---+----
 a | 1 | 10

(1 rows)

cqlsh:ks1> select t from foo where x='a';

 t
---
 2

(1 rows)
{code}

But selecting them together fails to return anything:
{code}
cqlsh:ks1> select x,y,z,t from foo where x='a' and y=1;

(0 rows)
{code}

Now this does partly make sense because there isn't really a value for "t" 
where y=1 since "t" isn't clustered. But it is important to be consistent with 
the output for the full table.

Note that querying the full partition returns only the static column now:

{code}
cqlsh:ks1> select x,y,z,t from foo where x='a';

 x | y    | z    | t
---+------+------+---
 a | null | null | 4

(1 rows)
{code}

Currently, the patches add support for:
* Selecting a CQL row by primary key (that's a standard feature).
* Selecting a static column by partition key (added by Sylvain).

So I'd say it's important to be able to support:
* Selecting clustered as well as static columns for a given CQL row.
* Selecting clustered as well as static columns for a given partition.

Not being able to fetch both the static column and a CQL row or set of CQL rows 
in a single read makes it impossible to rely on partition-level isolation for 
consistent reads.
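
To make the requested behaviour concrete, here is a minimal Python sketch of the semantics I have in mind. It is only a model of the expected results for the table {{foo}} above, not Cassandra's read path: the {{Partition}} class and the {{select}} helper are hypothetical names introduced for illustration, and the case of a partition holding only a static value is left out.

```python
# Hypothetical in-memory model of one partition of "foo"; not Cassandra code.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Partition:
    x: str                                    # partition key
    static_t: Optional[int] = None            # static column: one value per partition
    rows: dict = field(default_factory=dict)  # clustering key y -> regular column z


def select(partition, columns, y=None):
    """Return result rows as dicts, attaching the static value to every CQL row.

    y=None selects the whole partition; otherwise only the row whose
    clustering key equals y (an empty result if that row does not exist).
    """
    if y is None:
        keys = sorted(partition.rows)
    else:
        keys = [y] if y in partition.rows else []
    result = []
    for key in keys:
        full = {"x": partition.x, "y": key,
                "z": partition.rows[key], "t": partition.static_t}
        result.append({c: full[c] for c in columns})
    return result


foo = Partition(x="a")
foo.rows[1] = 10   # insert into foo (x,y,z) values ('a', 1, 10)
foo.rows[2] = 20   # insert into foo (x,y,z) values ('a', 2, 20)
foo.static_t = 2   # update foo set t = 2 where x='a'

# select x,y,z,t from foo where x='a' and y=1
single_row = select(foo, ["x", "y", "z", "t"], y=1)
# select x,y,z,t from foo where x='a'
whole_partition = select(foo, ["x", "y", "z", "t"])
```

In this model both queries return the clustered columns together with the static value, which matches what {{select *}} already shows for the full table.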

> Static columns in CQL3
> ----------------------
>
>                 Key: CASSANDRA-6561
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6561
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 2.0.6
>
>
> I'd like to suggest the following idea for adding "static" columns to CQL3.  
> I'll note that the basic idea has been suggested by jhalliday on irc but the 
> rest of the details are mine and I should be blamed for anything stupid in 
> what follows.
> Let me start with a rationale: there are 2 main families of CF that have been 
> historically used in Thrift: static ones and dynamic ones. CQL3 handles both 
> families through the presence or absence of clustering columns. There are, 
> however, some cases where mixing both behaviors has its use. I like to think 
> of those use cases as 3 broad categories:
> # to denormalize small amounts of not-entirely-static data in otherwise 
> static entities. Say, "tags" for a product or "custom properties" in a 
> user profile. This is why we've added CQL3 collections. Importantly, this is 
> the *only* use case for which collections are meant (which doesn't diminish 
> their usefulness imo, and I wouldn't disagree that we've maybe not 
> communicated this too well).
> # to optimize fetching both a static entity and related dynamic ones. Say you 
> have blog posts, and each post has associated comments (chronologically 
> ordered). *And* say that a very common query is "fetch a post and its 50 last 
> comments". In that case, it *might* be beneficial to store a blog post 
> (static entity) in the same underlying CF as its comments for performance 
> reasons, so that "fetch a post and its 50 last comments" is just one slice 
> internally.
> # you want to CAS rows of a dynamic partition based on some partition 
> condition. This is the same use case that CASSANDRA-5633 exists for.
> As said above, 1) is already covered by collections, but 2) and 3) are not 
> (and I strongly believe collections are not the right fit, API wise, for 
> those).
> Also, note that I don't want to overestimate the usefulness of 2). In most 
> cases, using a separate table for the blog posts and the comments is The 
> Right Solution, and trying to do 2) is premature optimisation. Yet, when used 
> properly, that kind of optimisation can make a difference, so I think having 
> a relatively native solution for it in CQL3 could make sense.
> Regarding 3), though CASSANDRA-5633 would provide one solution for it, I have 
> the feeling that static columns actually are a more natural approach (in 
> terms of API). That's arguably more of a personal opinion/feeling though.
> So long story short, CQL3 lacks a way to mix "static" and "dynamic" rows in 
> the same partition of the same CQL3 table, and I think such a tool could 
> have its use.
> The proposal is thus to allow "static" columns. Static columns would only 
> make sense in tables with clustering columns (the "dynamic" ones). A static 
> column value would be static to the partition (all rows of the partition 
> would share the value for such a column). The syntax would just be:
> {noformat}
> CREATE TABLE t (
>   k text,
>   s text static,
>   i int,
>   v text,
>   PRIMARY KEY (k, i)
> )
> {noformat}
> then you'd get:
> {noformat}
> INSERT INTO t(k, s, i, v) VALUES ('k0', 'I''m shared',       0, 'bar');
> INSERT INTO t(k, s, i, v) VALUES ('k0', 'I''m still shared', 1, 'foo');
> SELECT * FROM t;
>  k |                  s | i |    v
> ------------------------------------
> k0 | "I'm still shared" | 0 | "bar"
> k0 | "I'm still shared" | 1 | "foo"
> {noformat}
> There would be a few semantic details to decide on regarding deletions, ttl, 
> etc. but let's see if we agree it's a good idea first before ironing those 
> out.
> One last point is the implementation. Though I do think this idea has merits, 
> it's definitely not useful enough to justify rewriting the storage engine 
> for it. But I think we can support this relatively easily (emphasis on 
> "relatively" :)), which is probably the main reason why I like the approach.
> Namely, internally, we can store static columns as cells whose clustering 
> column values are empty. So in terms of cells, the partition of my example 
> would look like:
> {noformat}
> "k0" : [
>   (:"s" -> "I'm still shared"), // the static column
>   (0:"" -> "")                  // row marker
>   (0:"v" -> "bar")
>   (1:"" -> "")                  // row marker
>   (1:"v" -> "foo")
> ]
> {noformat}
> Of course, using empty values for the clustering columns doesn't quite work 
> because it could conflict with the user using empty clustering columns. But 
> in the CompositeType encoding we have the end-of-component byte that we could 
> reuse by using a specific value (say 0xFF; currently we never set that byte 
> to anything other than -1, 0 and 1) to indicate it's a static column.
> With that, we'd need to update the CQL3 statements to support the new syntax 
> and rules, but that's probably not horribly hard.
> So anyway, this may or may not be a good idea, but I think it has enough meat 
> to warrant some consideration.
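
The end-of-component idea in the quoted description can be sketched in a few lines of Python. This is only an illustration of the proposal as worded there, assuming the usual CompositeType component layout (2-byte big-endian length, value, end-of-component byte). The marker value 0xFF is the one suggested in the description, and the helper names are hypothetical, not Cassandra APIs.

```python
import struct

NORMAL_EOC = 0x00   # ordinary end-of-component byte
STATIC_EOC = 0xFF   # marker suggested in the description; an assumption, not a shipped format


def encode_component(value: bytes, eoc: int) -> bytes:
    """Encode one component: 2-byte big-endian length, value, end-of-component byte."""
    return struct.pack(">H", len(value)) + value + bytes([eoc])


def regular_cell_name(clustering: bytes, column: bytes) -> bytes:
    """Cell name for a regular column of the row identified by `clustering`."""
    return encode_component(clustering, NORMAL_EOC) + encode_component(column, NORMAL_EOC)


def static_cell_name(column: bytes) -> bytes:
    """Cell name for a static column: empty clustering value flagged with STATIC_EOC."""
    return encode_component(b"", STATIC_EOC) + encode_component(column, NORMAL_EOC)


static_name = static_cell_name(b"s")
# A user-supplied empty clustering value for a column with the same name.
empty_clustering_name = regular_cell_name(b"", b"s")
```

The two names differ only in the end-of-component byte of the first component, so a static cell cannot be confused with a row whose clustering value happens to be empty.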



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
