[ 
https://issues.apache.org/jira/browse/CASSANDRA-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051535#comment-14051535
 ] 

Samphel Norden commented on CASSANDRA-7494:
-------------------------------------------

I am not seeing how this addresses the question. 
Let me pose a use case.

I am storing time series data in each row in reverse chronological order, 
which I can do by making the timestamp a clustering key and declaring 
CLUSTERING ORDER BY (time DESC), as a very simple example. 

I want to get the latest timestamp stored in each row. Something like
"select first 1 time from the table" is what I am looking for.

CQL in Cassandra 0.8 even supported something like this:
http://stackoverflow.com/questions/8083102/select-first-n-from-cassandra-column-using-cql

I am just wondering why this was taken out. Granted, the support described at 
that link does not fully fit this use case, since it requires the user to 
specify the column name/range, which is usually hard to do when columns are 
dynamic. Of course, a way around that would be to always store the latest 
timestamp in a special column, say 999999999, and only select first 
'99999999'...'999999999' from table...

> CQL support to return first column of each row
> ----------------------------------------------
>
>                 Key: CASSANDRA-7494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: fedora 64bit 
>            Reporter: Samphel Norden
>
> This jira is a request to support a query like
> select first 5 columns of each row where <whereclause>
> Currently in CQL, a LIMIT clause applies across all rows; it is not a 
> per-partition-key limit. 
> More details below.
> If we create a table as follows
> CREATE TABLE xy (
> a int,
> b int,
> c int,
> d int,
> value int,
> PRIMARY KEY ((a, b), c, d)
> ) WITH CLUSTERING ORDER BY (c DESC, d ASC)
> with data:
>  a | b | c    | d   | value
> ---+---+------+-----+-------
>  1 | 2 | 2007 | 307 |   950
>  1 | 2 | 2006 | 305 |   900
>  1 | 1 | 1006 | 205 |   800
>  1 | 1 | 1005 | 105 |   700
> The rows are sorted by c descending; assuming c is a timestamp, the idea 
> is to store the latest timestamp first. Hence, if we pull a single column from 
> each row of a given set of rows, we want it to be the latest 'c' for that row.
> In other words: 
> select first 1 value from xy where a=1 and b in (1,2)
> should return a single "value" for each rowkey:
>  a | b | c    | d   | value
> ---+---+------+-----+-------
>  1 | 1 | 1006 | 205 |   800
>  1 | 2 | 2007 | 307 |   950
> I realize that if we do individual queries such as
> select a,b,c,value from xy where a=1 and b=1 limit 1;
>  a | b | c    | value
> ---+---+------+-------
>  1 | 1 | 1006 |   800
> (1 rows)
> cqlsh:> select a,b,c,value from xy where a=1 and b=2 limit 1;
>  a | b | c    | value
> ---+---+------+-------
>  1 | 2 | 2007 |   950
> we get the desired result. However, this is highly inefficient, since we would 
> need to fire a separate query per row. A construct that allows getting a 
> single column for a given row would be very helpful.
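
To make the requested semantics concrete, here is a minimal Python sketch that emulates "first N per partition" client-side over the sample data from the description. It is only an illustration, not CQL and not how Cassandra would implement it. The function name first_n_per_partition is hypothetical, and partitions are sorted by key value purely to keep the output deterministic; a real cluster may return partitions in a different order.

```python
from itertools import groupby, islice

# Sample rows from the issue description: (a, b, c, d, value).
ROWS = [
    (1, 2, 2007, 307, 950),
    (1, 2, 2006, 305, 900),
    (1, 1, 1006, 205, 800),
    (1, 1, 1005, 105, 700),
]


def first_n_per_partition(rows, n):
    """Return the first n rows of each (a, b) partition.

    Hypothetical helper: within a partition, rows follow the table's
    declared clustering order (c DESC, d ASC).
    """
    # Sort by partition key (a, b), then by clustering order (c DESC, d ASC).
    ordered = sorted(rows, key=lambda r: (r[0], r[1], -r[2], r[3]))
    result = []
    # groupby needs its input sorted by the grouping key, which it is.
    for _, partition in groupby(ordered, key=lambda r: (r[0], r[1])):
        result.extend(islice(partition, n))
    return result


latest = first_n_per_partition(ROWS, 1)
print(latest)
# [(1, 1, 1006, 205, 800), (1, 2, 2007, 307, 950)]
```

With n=1 this yields the two rows shown as the desired result in the description, one per (a, b) pair, which is what a single server-side query would ideally return without one round trip per partition.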



--
This message was sent by Atlassian JIRA
(v6.2#6252)
