[ https://issues.apache.org/jira/browse/CASSANDRA-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051535#comment-14051535 ]
Samphel Norden commented on CASSANDRA-7494:
-------------------------------------------

Somehow I am not seeing how this addresses the question. Let me pose a use case. I am storing time series data in each row in reverse chronological order, which I can do by creating a clustering key on timestamp and storing with clustering order (time desc), as a very simple example. I want to get the latest timestamp stored in each row; "select first 1 time from the table" is what I am looking for.

CQL 0.8 even supported something like this:
http://stackoverflow.com/questions/8083102/select-first-n-from-cassandra-column-using-cql

I am just wondering why this was taken out. Granted, the support described there is not fully compliant, in that it requires the user to specify the column name/range, which is usually hard to do when columns are dynamic. Of course, a way around it would be to always store the latest timestamp in a special column, say 999999999, and only select first '99999999'...'999999999' from table...

> CQL support to return first column of each row
> ----------------------------------------------
>
>                 Key: CASSANDRA-7494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: fedora 64bit
>            Reporter: Samphel Norden
>
> This jira is a request to support a query like
> select first 5 columns of each row where <whereclause>
> Currently in CQL, if we put a limit clause, it applies over all rows. It is
> not a per-partition-key limit.
> More details below
> If we create a table as follows
> CREATE TABLE xy (
>   a int,
>   b int,
>   c int,
>   d int,
>   value int,
>   PRIMARY KEY ((a, b), c, d)
> ) WITH CLUSTERING ORDER BY (c DESC, d ASC)
> with data =
>  a | b | c    | d   | value
> ---+---+------+-----+-------
>  1 | 2 | 2007 | 307 |   950
>  1 | 2 | 2006 | 305 |   900
>  1 | 1 | 1006 | 205 |   800
>  1 | 1 | 1005 | 105 |   700
> The rows are sorted by c descending. Assuming c is a timestamp, the idea
> is to store the latest timestamp first. Hence, if we pull a single column from
> each row given a set of rows, we want that to be the latest 'c' for each row.
> In other words:
> select first 1 value from xy where a=1 and b in (1,2)
> should return a single "value" for each rowkey
>  a | b | c    | d   | value
> ---+---+------+-----+-------
>  1 | 1 | 1006 | 205 |   800
>  1 | 2 | 2007 | 307 |   950
> I realize that if we do individual queries such as
> select a,b,c,value from xy where a=1 and b =1 limit 1;
>  a | b | c    | value
> ---+---+------+-------
>  1 | 1 | 1006 |   800
> (1 rows)
> cqlsh:> select a,b,c,value from xy where a=1 and b =2 limit 1;
>  a | b | c    | value
> ---+---+------+-------
>  1 | 2 | 2007 |   950
> we get the desired result. However, this is highly inefficient since we would
> need to fire a separate query per row. If we can have a construct change to
> allow getting a single column for a given row, that would be very helpful.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
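
To make the requested semantics concrete, here is a minimal in-memory sketch in Python of a "first N per partition" selection over the sample data from the issue description. It is an illustration only, not how Cassandra would implement it: the function name `first_n_per_partition` is hypothetical, rows are plain tuples, and partitions are returned sorted by key for readability (a real cluster would order them by token).

```python
from itertools import groupby

# Sample rows from the issue description, as (a, b, c, d, value) tuples.
ROWS = [
    (1, 2, 2007, 307, 950),
    (1, 2, 2006, 305, 900),
    (1, 1, 1006, 205, 800),
    (1, 1, 1005, 105, 700),
]


def first_n_per_partition(rows, n):
    """Return the first n rows of each (a, b) partition.

    Within a partition, rows are ordered by c descending and then d
    ascending, mirroring CLUSTERING ORDER BY (c DESC, d ASC).
    """
    # Sort by partition key (a, b), then by the clustering order.
    # Assumption: partitions are emitted in key order, not token order.
    ordered = sorted(rows, key=lambda r: (r[0], r[1], -r[2], r[3]))
    result = []
    for _, group in groupby(ordered, key=lambda r: (r[0], r[1])):
        # Apply the limit per partition rather than over the whole result.
        result.extend(list(group)[:n])
    return result


if __name__ == "__main__":
    for row in first_n_per_partition(ROWS, 1):
        print(row)
```

With n = 1 this yields one row per (a, b) pair, the one with the largest c, which is the result the issue asks a single query to return. A plain LIMIT 1 over the same data would return only one row in total.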