[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028198#comment-17028198 ] Michael Mior commented on CALCITE-1824: --- Fixed in [1eb22d0|https://github.com/apache/calcite/commit/1eb22d03c11b9362b50208261b99eecc3de28fe8]. > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Julian Hyde >Assignee: Feng Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 5h 20m > Remaining Estimate: 0h > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014579#comment-17014579 ] Julian Hyde commented on CALCITE-1824: -- My first rewrite duplicated rows before the Aggregate. My new suggestion was an improvement to my first rewrite because it duplicates rows after the Aggregate. I had not noticed that Vladimir's rewrite duplicated rows after the Aggregate. > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Julian Hyde >Assignee: Feng Zhu >Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014095#comment-17014095 ] Feng Zhu commented on CALCITE-1824: --- >From my understanding, you have the same meaning. But current now, we do not have a RelNode to duplicate the rows. > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Julian Hyde >Assignee: Feng Zhu >Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014079#comment-17014079 ] Vladimir Sitnikov commented on CALCITE-1824: Can you please double-check? I'm sure both my statements (in English and in SQL) describe that multiplication happens after aggregate: {quote}Vladimir: Then we can rewrite query with just unique grouping sets, and use something like lateral unnest with a VALUES relation to multiply only those rows that need to be multiplied.{quote} {quote}Vladimir: *{color:#DE350B}({color}* GROUPING_ID(all group columns).. *group by* non-duplicate-grouping-sets *{color:#DE350B}){color}* lateral *{color:#DE350B}({color}* select as many rows as required based on groupingid *{color:#DE350B}){color}*{quote} I'm sure > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Julian Hyde >Assignee: Feng Zhu >Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013932#comment-17013932 ] Julian Hyde commented on CALCITE-1824: -- Yes, it's different because we would duplicate rows after the Aggregate, not before. > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Julian Hyde >Assignee: Feng Zhu >Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013885#comment-17013885 ] Vladimir Sitnikov commented on CALCITE-1824: {quote}for every row in the result for which GROUPING(deptno) is 0, emit two rows, one with group_id = 0 and one with group_id = 1.{quote} Is it different from https://issues.apache.org/jira/browse/CALCITE-1824?focusedCommentId=17010020=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17010020 ? Just wondering what do you have in mind for selective duplicating of the rows. > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Julian Hyde >Assignee: Feng Zhu >Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013879#comment-17013879 ] Julian Hyde commented on CALCITE-1824: -- A more efficient rewrite just occurred to me. We could duplicate the rows AFTER they come out of the {{Aggregate}} operator. This is more efficient than doubling the number of rows that go into {{Aggregate}}. For the above example {code:java} SELECT deptno, GROUP_ID() AS g FROM Emp GROUP BY GROUPING SETS (deptno, (), ()); {code} you would compute {code:java} SELECT deptno, GROUPING(deptno) AS gr FROM Emp GROUP BY GROUPING SETS (deptno, ()); {code} (note that there is only one {{()}}), and for every row in the result for which {{GROUPING(deptno)}} is 0, emit two rows, one with group_id = 0 and one with group_id = 1. > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Julian Hyde >Assignee: Feng Zhu >Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012148#comment-17012148 ] Vladimir Sitnikov commented on CALCITE-1824: {quote} you'll end up with something involving Join and Filter. So it's moot.{quote} That is true. So I would prefer UNION-ALL -- style rewrite. > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Julian Hyde >Assignee: Feng Zhu >Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012103#comment-17012103 ] Julian Hyde commented on CALCITE-1824: -- Yes, we support LATERAL. But LATERAL followed by a correlated query will be decorrelated during query preparation, and you'll end up with something involving Join and Filter. So it's moot. > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Julian Hyde >Assignee: Feng Zhu >Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010425#comment-17010425 ] Feng Zhu commented on CALCITE-1824: --- Yes, your are right. In calcite, we can use lateral and unnest. I mistakenly made something confused with Hive's dialect (i.e., _LATERAL VIEW EXPLODE(UDTF)/POSEXPLODE(UDTF)_). > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010398#comment-17010398 ] Vladimir Sitnikov commented on CALCITE-1824: What do you mean by "we do not support lateral view"? Calcite supports quite involved LATERAL expressions: https://github.com/apache/calcite/blob/c416c31fc376868bdd672afd84ec06dc75d56575/piglet/src/test/java/org/apache/calcite/test/PigRelOpTest.java#L471-L478 , https://github.com/apache/calcite/blob/7dcceeedbc9bb28167f3b7497a8eee1429bc51f6/core/src/test/java/org/apache/calcite/test/RelMetadataTest.java#L960-L966 However, it would likely result in a correlate expression that might be tricky to implement for the underlying storage. {quote}If you are not working on this issue, I'd like to take it.{quote} Please do that. I'm not working on this. > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010324#comment-17010324 ] Feng Zhu commented on CALCITE-1824: --- Thanks [~vladimirsitnikov], rewriting to UNION-ALL is more intuitive to me. If you are not working on this issue, I'd like to take it. > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010316#comment-17010316 ] Feng Zhu commented on CALCITE-1824: --- {quote}The problem with UNION ALL is that it duplicates the input relational expression. So we would tend to implement a plan that evaluates it twice. Times like this, I wish we had a {{ProjectMany}} operator (compare LINQ's [SelectMany|https://docs.microsoft.com/en-us/dotnet/api/system.linq.enumerable.selectmany]). {quote} I think it is a good idea. We need a specific operator (e.g., _Expand_ in some products) which applies a number of projections to every input row, hence we can get multiple output rows for an input row to avoid duplicate data scan. On the other hand, I also noticed that we do not support queries like (_lateral view ..._) now. The operator will play an important role if we support it in the future. > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010014#comment-17010014 ] Julian Hyde commented on CALCITE-1824: -- I'm not arguing. Both our approaches work. > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010007#comment-17010007 ] Vladimir Sitnikov commented on CALCITE-1824: {quote}The problem with UNION ALL is that it duplicates the input relational expression. So we would tend to implement a plan that evaluates it twice.{quote} The problem with CROSS JOIN inside GROUP BY is it would likely defeat any optimizer that knows how to optimize regular GROUP BY. > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010002#comment-17010002 ] Julian Hyde commented on CALCITE-1824: -- By the way, GROUP BY ... GROUPING SETS (i.e. Aggregate with more than one key) is a powerful operator that can avoid self-join and self-union in many cases. I am using it quite a lot recently. (Perhaps it can be made to work in this case too.) > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009995#comment-17009995 ] Julian Hyde commented on CALCITE-1824: -- The problem with UNION ALL is that it duplicates the input relational expression. So we would tend to implement a plan that evaluates it twice. Times like this, I wish we had a {{ProjectMany}} operator (compare LINQ's [SelectMany|https://docs.microsoft.com/en-us/dotnet/api/system.linq.enumerable.selectmany]). > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009408#comment-17009408 ] Vladimir Sitnikov commented on CALCITE-1824: How about rewriting non-zero grouping cases with a help of UNION ALL? For instance: {code:sql}SELECT deptno, GROUP_ID() AS g FROM Emp GROUP BY GROUPING SETS (deptno, (), ());{code} => {code:sql}SELECT deptno, 0 AS g FROM Emp GROUP BY GROUPING SETS (deptno, ()) union all SELECT null, 1 AS g FROM Emp GROUP BY GROUPING SETS (()) ;{code} It looks to be simpler to reason about, and it would probably be easier for pattern-matching (e.g. materialization review) > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009377#comment-17009377 ] Feng Zhu commented on CALCITE-1824: --- Thanks [~julianhyde], it sounds like a feasible solution. I'd like to make some research on it. > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result
[ https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009285#comment-17009285 ] Julian Hyde commented on CALCITE-1824: -- Could this be done as a rewrite? The second query above could become {code:java} SELECT deptno, g FROM ( SELECT Emp.*, t.g FROM Emp CROSS JOIN (VALUES (0), (1)) as t(g)) GROUP BY GROUPING SETS ((deptno, g), g) HAVING grouping(g) = 1 OR g = 0{code} There are a couple of tricks: * The CROSS JOIN creates two copies of each row, one with g = 0 and another with g = 1. * The HAVING removes the g = 1 rows from the deptno sub-total. But this avoids us having to modify {{Aggregate}}. > GROUP_ID returns wrong result > - > > Key: CALCITE-1824 > URL: https://issues.apache.org/jira/browse/CALCITE-1824 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > > We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the > specification wrong, and it returns the wrong result. > {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle. > I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, > ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} > is useful only if you have duplicate grouping sets. If grouping sets are > distinct, {{GROUP_ID()}} will always return zero. > Example 1 > {code}SELECT deptno, job, GROUP_ID() AS g > FROM Emp > GROUP BY ROLLUP(deptno, job) > DEPTNO JOB G > -- - -- > 10 CLERK 0 > 10 MANAGER0 > 10 PRESIDENT 0 > 100 > 20 CLERK 0 > 20 ANALYST0 > 20 MANAGER0 > 200 > 30 CLERK 0 > 30 MANAGER0 > 30 SALESMAN 0 > 300 > 0 > {code} produces grouping sets (deptno, job), (deptno), (). These are > distinct, so GROUP_ID() is 0 for all rows. > Example 2 > {code}SELECT deptno, GROUP_ID() AS g > FROM Emp > GROUP BY GROUPING SETS (deptno, (), ()); > DEPTNOG > -- -- > 10 0 > 20 0 > 30 0 > 0 > 1 > {code} > As you can see, the grouping set () occurs twice. So there is one row in the > result for each occurrence: the first occurrence has g = 0; the second has g > = 1. > In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This > is wrong, but nevertheless closer to the required behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)