[ 
https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167513#comment-15167513
 ] 

Jon Haddad edited comment on CASSANDRA-10707 at 2/25/16 6:46 PM:
-----------------------------------------------------------------

I don't think changing the order of ORDER BY and GROUP BY is self-explanatory, 
so it doesn't really offer any benefit, imo.  If I was trying out the feature 
I'd mostly be annoyed by its difference from something I've got muscle memory 
for.  

If you wanted to be technically accurate about it, SQL is declarative.  The 
order in which you specify the predicates, for instance, doesn't matter; it 
just happens to line up with how we mentally process it.  If you change the 
order of predicates in your WHERE clause it doesn't matter, you'll still end up 
with the same query result.
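
To make the predicate-order point concrete, here is a minimal sketch. It uses SQLite through Python's standard library purely as a stand-in for a declarative SQL engine; it says nothing about how Cassandra evaluates CQL. The table and rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database; the top_scores schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE top_scores (game_id INTEGER, username TEXT, score INTEGER, state TEXT)"
)
conn.executemany(
    "INSERT INTO top_scores VALUES (?, ?, ?, ?)",
    [
        (5, "alice", 90, "CA"),
        (5, "bob", 70, "NY"),
        (5, "carol", 85, "CA"),
        (6, "dave", 95, "CA"),
    ],
)

# Same two predicates, written in opposite order.
a = conn.execute(
    "SELECT username FROM top_scores "
    "WHERE game_id = 5 AND state = 'CA' ORDER BY username"
).fetchall()
b = conn.execute(
    "SELECT username FROM top_scores "
    "WHERE state = 'CA' AND game_id = 5 ORDER BY username"
).fetchall()

# The result does not depend on the order the predicates were written in.
same_result = a == b
```

Both queries return the same rows, which is the sense of "declarative" I mean here.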

Assuming I'm understanding the implementation correctly, what you're saying is 
that the query behaves more like the following:

{code}
select * from 
 ( select * from table order by some_field limit 100)
group by x,y,z
{code}

Is this correct, or am I missing something?  If it's the case, I hope this 
doesn't box us in later down the line if we want to add support for other 
operations (like subqueries).  If we're going to introduce more 
inconsistencies with SQL (which may be totally fair, I'm just thinking out loud 
here), we would want to put the GROUP BY after the LIMIT, since it's being 
applied then.  I'm not sure what this does to CQL in general, as now we've 
implicitly made the decision to introduce clauses in an imperative fashion.  
I'd rather not see new clauses added piece by piece with different rules 
depending on the context; that definitely won't make things any easier.
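
To show why the position of the LIMIT matters, here is a rough sketch of the two readings. Again it uses SQLite via Python's standard library as a stand-in for SQL semantics, not Cassandra; the scores table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database; table name and data are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (username TEXT, score INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        ("a", 100, "CA"),
        ("b", 90, "CA"),
        ("c", 80, "NY"),
        ("d", 70, "NY"),
        ("e", 60, "NY"),
    ],
)

# Reading 1: LIMIT is applied to the rows first, then the survivors are grouped.
limit_first = conn.execute(
    """
    SELECT state, COUNT(*) FROM
      (SELECT * FROM scores ORDER BY score DESC LIMIT 3)
    GROUP BY state ORDER BY state
    """
).fetchall()

# Reading 2 (standard SQL): all rows are grouped, then LIMIT caps the groups.
group_first = conn.execute(
    "SELECT state, COUNT(*) FROM scores GROUP BY state ORDER BY state LIMIT 3"
).fetchall()

print(limit_first)  # [('CA', 2), ('NY', 1)]
print(group_first)  # [('CA', 2), ('NY', 3)]
```

The NY count differs between the two (1 vs 3), so anyone carrying SQL expectations over would get a silently different answer under the limit-first reading.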

So my question is, is CQL a declarative language or not?  Will this ever be 
something we intend to allow:

{code}
select username, score, state, count(state) as c from top_scores where game_id=5 
limit 1000 group by state order by c desc limit 5;
{code}

I don't think the above query works at all.  The aggregation is clearly a 
declarative clause.

Now, if the behavior of limit before aggregation is the right decision, that I 
might have to argue with.


was (Author: rustyrazorblade):
I don't think changing the order of ORDER BY and GROUP BY is self-explanatory, 
so it doesn't really offer any benefit, imo.  If I was trying out the feature 
I'd mostly be annoyed by its difference from something I've got muscle memory 
for.  

If you wanted to be technically accurate about it, SQL is declarative.  The 
order in which you specify the clauses doesn't matter; it just happens to line 
up with how we mentally process it.  If you change the order of predicates in 
your WHERE clause it doesn't matter, you'll still end up with the same query 
result.

Assuming I'm understanding the implementation correctly, what you're saying is 
that the query behaves more like the following:

{code}
select * from 
 ( select * from table order by some_field limit 100)
group by x,y,z
{code}

Is this correct, or am I missing something?  If it's the case, I hope this 
doesn't box us in later down the line if we want to add support for other 
operations (like subqueries).  If we're going to introduce more 
inconsistencies with SQL (which may be totally fair, I'm just thinking out loud 
here), we would want to put the GROUP BY after the LIMIT, since it's being 
applied then.  I'm not sure what this does to CQL in general, as now we've 
implicitly made the decision to introduce clauses in an imperative fashion.  
I'd rather not see new clauses added piece by piece with different rules 
depending on the context; that definitely won't make things any easier.

So my question is, is CQL a declarative language or not?  Will this ever be 
something we intend to allow:

{code}
select username, score, state, count(state) as c from top_scores where game_id=5 
limit 1000 group by state order by c desc limit 5;
{code}

I don't think the above query works at all.  The aggregation is clearly a 
declarative clause.

Now, if the behavior of limit before aggregation is the right decision, that I 
might have to argue with.

> Add support for Group By to Select statement
> --------------------------------------------
>
>                 Key: CASSANDRA-10707
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10707
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: Benjamin Lerer
>            Assignee: Benjamin Lerer
>
> Now that Cassandra supports aggregate functions, it makes sense to support 
> {{GROUP BY}} on the {{SELECT}} statements.
> It should be possible to group either at the partition level or at the 
> clustering column level.
> {code}
> SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;
> SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP 
> BY partitionKey, clustering0, clustering1; 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
