[ 
https://issues.apache.org/jira/browse/CASSANDRA-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519847#comment-16519847
 ] 

Alexander Ivakov edited comment on CASSANDRA-14376 at 6/21/18 11:41 PM:
------------------------------------------------------------------------

Cassandra currently doesn't implement group by on arbitrary columns. When 
grouping by multiple columns, you can only group by primary key columns in the 
order they are declared in the primary key, starting from the first.

The only exception allowed is if you have restricted the first n columns with 
an equality restriction. Note, restricting a column with an "=" is selecting 
one group, so there is nothing to group in that column, that's why Cassandra 
allows this. You can then group by the remaining columns, in order and starting 
from the next column (you can’t skip columns in between).

So to group by a column, all preceding primary key columns have to be either 
restricted by "=" or be in the group by clause.

The range query like the one above fails because the sample column is being 
restricted by a range, thus still having multiple groups, but is not in the 
group by clause. LIKE and IN restrictions will also not work in this case.

This query will work:

select city, state, sum(count) from samples where name='bob' and partition=1 
and sample>=1 and sample<=3 group by sample, city, state; 

The partition key columns are restricted by "=" and all the clustering key 
columns are in the group by clause in the same order as the schema. “Group by 
sample” and “group by sample, city” will also work.


was (Author: alex.ivakov):
Cassandra currently doesn't implement group by on arbitrary columns. When 
grouping by multiple columns, you can only group by primary key columns in the 
order they are declared in the primary key, starting from the first. 

 

The only exception allowed is if you have restricted the first n columns with 
an equality restriction. Note, restricting a column with an "=" is selecting 
one group, so there is nothing to group in that column, that's why Cassandra 
allows this. You can then group by the remaining columns, in order and starting 
from the next column (you can’t skip columns in between).

 

So to group by a column, all preceding primary key columns have to be either 
restricted by "=" or be in the group by clause. 

 

The range query like the one above fails because the sample column is being 
restricted by a range, thus still having multiple groups, but is not in the 
group by clause. LIKE and IN restrictions will also not work in this case. 

 

This query will work:

 

select city, state, sum(count) from samples where name='bob' and partition=1 
and sample>=1 and sample<=3 group by sample, city, state; 

 

The partition key columns are restricted by "=" and all the clustering key 
columns are in the group by clause in the same order as the schema. “Group by 
sample” and “group by sample, city” will also work.

> Limiting a clustering column with a range not allowed when using "group by"
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14376
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14376
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>         Environment: Cassandra 3.11.1
>            Reporter: Chris mildebrandt
>            Priority: Major
>
> I’m trying to use a range to limit a clustering column while at the same time 
> using `group by` and running into issues. Here’s a sample table:
> {{create table if not exists samples (name text, partition int, sample int, 
> city text, state text, count counter, primary key ((name, partition), sample, 
> city, state)) with clustering order by (sample desc);}}
> When I filter `sample` by a range, I get an error:
> {{select city, state, sum(count) from samples where name='bob' and 
> partition=1 and sample>=1 and sample<=3 group by city, state;}}
>  {{{color:#ff0000}InvalidRequest: Error from server: code=2200 [Invalid 
> query] message="Group by currently only support groups of columns following 
> their declared order in the PRIMARY KEY"{color}}}
> However, it allows the query when I change from a range to an equals:
> {{select city, state, sum(count) from samples where name='bob' and 
> partition=1 and sample=1 group by city, state;}}
> {{city | state | system.sum(count)}}
> {{--------++--------------------------}}
> {{ Austin | TX | 2}}
> {{ Denver | CO | 1}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to