[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

Jonathan Ellis (JIRA) Fri, 19 Aug 2011 08:36:53 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087765#comment-13087765
 ]


Jonathan Ellis edited comment on CASSANDRA-2474 at 8/19/11 3:36 PM:
--------------------------------------------------------------------

I need to back up.  Neither transposed() nor transposed(column) gives us the 
power we need, unless we layer subqueries on top.  Let me use a more complete 
example to illustrate why.

Suppose we have a Twissandra data model where the timeline storage is done 
"properly" in supercolumns.  Then we might want to say, "give me the most 
recent 50 tweets in user X's timeline:"

{code}
SELECT transposed(*) FROM timeline WHERE key=X LIMIT 50
{code}

So far so good.  (Note again how this obviates the need for the existing, 
cumbersome "column limit" syntax for ".." slices.)

But what if we just want, say, the username and timestamp from each entry?

{code}
SELECT transposed(username, timestamp) FROM timeline WHERE key=X LIMIT 50
{code}

That's where I was going with the multiple parameters to transposed().

Now, for the WHERE clause components... I guess for metastore-less Hive that's 
ok.  Or even, as you originally suggested, just leave it out and say "you can 
only do full-CF scans without a C* metastore."  I'm okay with that.

But for CQL we need to be able to give meaningful names to things, e.g., a row 
full of supercolumns representing tweets could be "tweet_id" instead of 
"supercolumn."  So it would be nice to support this somehow in "full metastore 
Hive."

This is important because we want to generalize beyond the current "one level 
of nesting" you get with supercolumns, to arbitrary with composite columns.  So,

{code}
SELECT transposed(tweet_id, -- supercolumn
                  username, timestamp) -- subcolumns
FROM timeline 
WHERE user_id='jbellis' -- key
LIMIT 50
{code}

Another example.  Suppose I have a "retweets" CF, where I have a composite 
columns of (tweet_id, retweet_id).  (You can think of this as simply 
supercolumn/subcolumn, the difference being that composites won't deserialize 
the entire parent container to fetch a single subcolumn or subcolumn slice.)  
And I want the most recent retweets, for a given tweet:

{code}
SELECT transposed(retweet_id) -- second-level column name / subcolumn
FROM timeline 
WHERE tweet_id=1b09d795-f038-4cf9-a996-5678fb092c3a -- first-level column name 
/ supercolumn
  AND user_id='jbellis' -- key
LIMIT 50
{code}


      was (Author: jbellis):
    I need to back up.  Neither transposed() nor transposed(column) gives us 
the power we need, unless we layer subqueries on top.  Let me use a more 
complete example to illustrate why.

Suppose we have a Twissandra data model where the timeline storage is done 
"properly" in supercolumns.  Then we might want to say, "give me the most 
recent 50 tweets in user X's timeline:"

{code}
SELECT transposed(*) FROM foo WHERE key=X LIMIT 50
{code}

So far so good.  (Note again how this obviates the need for the existing, 
cumbersome "column limit" syntax for ".." slices.)

But what if we just want, say, the username and timestamp from each entry?

{code}
SELECT transposed(username, timestamp) FROM foo WHERE key=X LIMIT 50
{code}

That's where I was going with the multiple parameters to transposed().

Now, for the WHERE clause components... I guess for metastore-less Hive that's 
ok.  Or even, as you originally suggested, just leave it out and say "you can 
only do full-CF scans without a C* metastore."  I'm okay with that.

But for CQL we need to be able to give meaningful names to things, e.g., a row 
full of supercolumns representing tweets could be "tweet_id" instead of 
"supercolumn."  So it would be nice to support this somehow in "full metastore 
Hive."

This is important because we want to generalize beyond the current "one level 
of nesting" you get with supercolumns, to arbitrary with composite columns.  So,

{code}
SELECT transposed(tweet_id, -- supercolumn
                  username, timestamp) -- subcolumns
FROM timeline 
WHERE user_id='jbellis' -- key
LIMIT 50
{code}

Another example.  Suppose I have a "retweets" CF, where I have a composite 
columns of (tweet_id, retweet_id).  (You can think of this as simply 
supercolumn/subcolumn, the difference being that composites won't deserialize 
the entire parent container to fetch a single subcolumn or subcolumn slice.)  
And I want the most recent retweets, for a given tweet:

{code}
SELECT transposed(retweet_id) -- second-level column name / subcolumn
FROM timeline 
WHERE tweet_id=1b09d795-f038-4cf9-a996-5678fb092c3a -- first-level column name 
/ supercolumn
  AND user_id='jbellis' -- key
LIMIT 50
{code}

  
> CQL support for compound columns
> --------------------------------
>
>                 Key: CASSANDRA-2474
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: API, Core
>            Reporter: Eric Evans
>              Labels: cql
>             Fix For: 1.0
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

Reply via email to