[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010316#comment-17010316 ]
Feng Zhu commented on CALCITE-1824: ----------------------------------- {quote}The problem with UNION ALL is that it duplicates the input relational expression. So we would tend to implement a plan that evaluates it twice. Times like this, I wish we had a {{ProjectMany}} operator (compare LINQ's [SelectMany|https://docs.microsoft.com/en-us/dotnet/api/system.linq.enumerable.selectmany]). {quote} I think it is a good idea. We need a specific operator (e.g., _Expand_ in some products) which applies a number of projections to every input row, hence we can get multiple output rows for an input row to avoid duplicate data scan. On the other hand, I also noticed that we do not support queries like (_lateral view ..._) now. The operator will play an important role if we support it in the future. > GROUP_ID returns wrong result > ----------------------------- > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug > Reporter: Julian Hyde > Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > ---------- --------- ---------- > 10 CLERK 0 > 10 MANAGER 0 > 10 PRESIDENT 0 > 10 0 > 20 CLERK 0 > 20 ANALYST 0 > 20 MANAGER 0 > 20 0 > 30 CLERK 0 > 30 MANAGER 0 > 30 SALESMAN 0 > 30 0 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNO G > ---------- ---------- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)