[jira] [Comment Edited] (CALCITE-1824) GROUP_ID returns wrong result

Vladimir Sitnikov (Jira) Tue, 07 Jan 2020 11:15:18 -0800


    [ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010007#comment-17010007
 ]


Vladimir Sitnikov edited comment on CALCITE-1824 at 1/7/20 7:14 PM:
--------------------------------------------------------------------

{quote}The problem with UNION ALL is that it duplicates the input relational 
expression. So we would tend to implement a plan that evaluates it twice.{quote}
The problem with CROSS JOIN inside GROUP BY is it would likely defeat any 
optimizer that knows how to optimize regular GROUP BY.

For instance if (...cross join...) group by will be sent via JDBC, the 
underlying DB won't be able to optimize it properly.


was (Author: vladimirsitnikov):
{quote}The problem with UNION ALL is that it duplicates the input relational 
expression. So we would tend to implement a plan that evaluates it twice.{quote}
The problem with CROSS JOIN inside GROUP BY is it would likely defeat any 
optimizer that knows how to optimize regular GROUP BY.

> GROUP_ID returns wrong result
> -----------------------------
>
>                 Key: CALCITE-1824
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1824
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
>     DEPTNO JOB                      G
> ---------- --------- ----------
>       10 CLERK              0
>       10 MANAGER            0
>       10 PRESIDENT          0
>       10                    0
>       20 CLERK              0
>       20 ANALYST            0
>       20 MANAGER            0
>       20                    0
>       30 CLERK              0
>       30 MANAGER            0
>       30 SALESMAN           0
>       30                    0
>                             0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
>     DEPTNO        G
> ---------- ----------
>       10          0
>       20          0
>       30          0
>                   0
>                   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (CALCITE-1824) GROUP_ID returns wrong result

Reply via email to