[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-02-01 Thread Michael Mior (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028198#comment-17028198
 ] 

Michael Mior commented on CALCITE-1824:
---

Fixed in 
[1eb22d0|https://github.com/apache/calcite/commit/1eb22d03c11b9362b50208261b99eecc3de28fe8].

 

> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Julian Hyde
>Assignee: Feng Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-01-13 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014579#comment-17014579
 ] 

Julian Hyde commented on CALCITE-1824:
--

My first rewrite duplicated rows before the Aggregate. My new suggestion was an 
improvement to my first rewrite because it duplicates rows after the Aggregate. 
I had not noticed that Vladimir's rewrite duplicated rows after the Aggregate.

> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Julian Hyde
>Assignee: Feng Zhu
>Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-01-12 Thread Feng Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014095#comment-17014095
 ] 

Feng Zhu commented on CALCITE-1824:
---

>From my understanding, you have the same meaning.

But current now, we do not have a RelNode to duplicate the rows.

> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Julian Hyde
>Assignee: Feng Zhu
>Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-01-12 Thread Vladimir Sitnikov (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014079#comment-17014079
 ] 

Vladimir Sitnikov commented on CALCITE-1824:


Can you please double-check?
I'm sure both my statements (in English and in SQL) describe that 
multiplication happens after aggregate:

{quote}Vladimir: Then we can rewrite query with just unique grouping sets, and 
use something like lateral unnest with a VALUES relation to multiply only those 
rows that need to be multiplied.{quote}

{quote}Vladimir: *{color:#DE350B}({color}* GROUPING_ID(all group columns).. 
*group by* non-duplicate-grouping-sets *{color:#DE350B}){color}*
lateral
*{color:#DE350B}({color}* select as many rows as required based on groupingid 
*{color:#DE350B}){color}*{quote}

I'm sure 

> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Julian Hyde
>Assignee: Feng Zhu
>Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-01-12 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013932#comment-17013932
 ] 

Julian Hyde commented on CALCITE-1824:
--

Yes, it's different because we would duplicate rows after the Aggregate, not 
before.

> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Julian Hyde
>Assignee: Feng Zhu
>Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-01-12 Thread Vladimir Sitnikov (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013885#comment-17013885
 ] 

Vladimir Sitnikov commented on CALCITE-1824:


{quote}for every row in the result for which GROUPING(deptno) is 0, emit two 
rows, one with group_id = 0 and one with group_id = 1.{quote}

Is it different from 
https://issues.apache.org/jira/browse/CALCITE-1824?focusedCommentId=17010020=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17010020
 ?

Just wondering what do you have in mind for selective duplicating of the rows.

> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Julian Hyde
>Assignee: Feng Zhu
>Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-01-12 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013879#comment-17013879
 ] 

Julian Hyde commented on CALCITE-1824:
--

A more efficient rewrite just occurred to me. We could duplicate the rows AFTER 
they come out of the {{Aggregate}} operator. This is more efficient than 
doubling the number of rows that go into {{Aggregate}}.

For the above example
{code:java}
SELECT deptno, GROUP_ID() AS g
FROM Emp
GROUP BY GROUPING SETS (deptno, (), ()); {code}

you would compute

{code:java}
SELECT deptno, GROUPING(deptno) AS gr
FROM Emp
GROUP BY GROUPING SETS (deptno, ()); {code}

(note that there is only one {{()}}), and for every row in the result for which 
{{GROUPING(deptno)}} is 0, emit two rows, one with group_id = 0 and one with 
group_id = 1.

> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Julian Hyde
>Assignee: Feng Zhu
>Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-01-09 Thread Vladimir Sitnikov (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012148#comment-17012148
 ] 

Vladimir Sitnikov commented on CALCITE-1824:


{quote} you'll end up with something involving Join and Filter. So it's 
moot.{quote}
That is true.

So I would prefer UNION-ALL -- style rewrite.


> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Julian Hyde
>Assignee: Feng Zhu
>Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-01-09 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012103#comment-17012103
 ] 

Julian Hyde commented on CALCITE-1824:
--

Yes, we support LATERAL. But LATERAL followed by a correlated query will be 
decorrelated during query preparation, and you'll end up with something 
involving Join and Filter. So it's moot.

> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Julian Hyde
>Assignee: Feng Zhu
>Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-01-07 Thread Feng Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010425#comment-17010425
 ] 

Feng Zhu commented on CALCITE-1824:
---

Yes, your are right. In calcite, we can use lateral and unnest.

I mistakenly made something confused with Hive's dialect (i.e., _LATERAL VIEW 
EXPLODE(UDTF)/POSEXPLODE(UDTF)_).

> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-01-07 Thread Vladimir Sitnikov (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010398#comment-17010398
 ] 

Vladimir Sitnikov commented on CALCITE-1824:


What do you mean by "we do not support lateral view"?

Calcite supports quite involved LATERAL expressions: 
https://github.com/apache/calcite/blob/c416c31fc376868bdd672afd84ec06dc75d56575/piglet/src/test/java/org/apache/calcite/test/PigRelOpTest.java#L471-L478
 , 
https://github.com/apache/calcite/blob/7dcceeedbc9bb28167f3b7497a8eee1429bc51f6/core/src/test/java/org/apache/calcite/test/RelMetadataTest.java#L960-L966

However, it would likely result in a correlate expression that might be tricky 
to implement for the underlying storage.

{quote}If you are not working on this issue, I'd like to take it.{quote}
Please do that. I'm not working on this.

> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-01-07 Thread Feng Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010324#comment-17010324
 ] 

Feng Zhu commented on CALCITE-1824:
---

Thanks [~vladimirsitnikov], rewriting to UNION-ALL is more intuitive to me.

If you are not working on this issue, I'd like to take it.

> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-01-07 Thread Feng Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010316#comment-17010316
 ] 

Feng Zhu commented on CALCITE-1824:
---

 
{quote}The problem with UNION ALL is that it duplicates the input relational 
expression. So we would tend to implement a plan that evaluates it twice.

Times like this, I wish we had a {{ProjectMany}} operator (compare LINQ's 
[SelectMany|https://docs.microsoft.com/en-us/dotnet/api/system.linq.enumerable.selectmany]).
{quote}
 

I think it is a good idea. We need a specific operator (e.g., _Expand_ in some 
products) which applies a number of projections to every input row, hence we 
can get multiple output rows for an input row to avoid duplicate data scan.

On the other hand, I also noticed that we do not support queries like (_lateral 
view ..._) now. The operator will play an important role if we support it in 
the future.

> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-01-07 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010014#comment-17010014
 ] 

Julian Hyde commented on CALCITE-1824:
--

I'm not arguing. Both our approaches work.

> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-01-07 Thread Vladimir Sitnikov (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010007#comment-17010007
 ] 

Vladimir Sitnikov commented on CALCITE-1824:


{quote}The problem with UNION ALL is that it duplicates the input relational 
expression. So we would tend to implement a plan that evaluates it twice.{quote}
The problem with CROSS JOIN inside GROUP BY is it would likely defeat any 
optimizer that knows how to optimize regular GROUP BY.

> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-01-07 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010002#comment-17010002
 ] 

Julian Hyde commented on CALCITE-1824:
--

By the way, GROUP BY ... GROUPING SETS (i.e. Aggregate with more than one key) 
is a powerful operator that can avoid self-join and self-union in many cases. I 
am using it quite a lot recently. (Perhaps it can be made to work in this case 
too.)

> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-01-07 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009995#comment-17009995
 ] 

Julian Hyde commented on CALCITE-1824:
--

The problem with UNION ALL is that it duplicates the input relational 
expression. So we would tend to implement a plan that evaluates it twice.

Times like this, I wish we had a {{ProjectMany}} operator (compare LINQ's 
[SelectMany|https://docs.microsoft.com/en-us/dotnet/api/system.linq.enumerable.selectmany]).

> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-01-06 Thread Vladimir Sitnikov (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009408#comment-17009408
 ] 

Vladimir Sitnikov commented on CALCITE-1824:


How about rewriting non-zero grouping cases with a help of UNION ALL?

For instance:

{code:sql}SELECT deptno, GROUP_ID() AS g
FROM Emp
GROUP BY GROUPING SETS (deptno, (), ());{code}
=>
{code:sql}SELECT deptno, 0 AS g
FROM Emp
GROUP BY GROUPING SETS (deptno, ())
union all
SELECT null, 1 AS g
FROM Emp
GROUP BY GROUPING SETS (())
;{code}

It looks to be simpler to reason about, and it would probably be easier for 
pattern-matching (e.g. materialization review)


> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-01-06 Thread Feng Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009377#comment-17009377
 ] 

Feng Zhu commented on CALCITE-1824:
---

Thanks [~julianhyde], it sounds like a feasible solution.

I'd like to make some research on it.

> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-1824) GROUP_ID returns wrong result

2020-01-06 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009285#comment-17009285
 ] 

Julian Hyde commented on CALCITE-1824:
--

Could this be done as a rewrite? The second query above could become
{code:java}
SELECT deptno, g
FROM (
  SELECT Emp.*, t.g
  FROM Emp CROSS JOIN (VALUES (0), (1)) as t(g))
GROUP BY GROUPING SETS ((deptno, g), g)
HAVING grouping(g) = 1 OR g = 0{code}

There are a couple of tricks:
* The CROSS JOIN creates two copies of each row, one with g = 0 and another 
with g = 1.
* The HAVING removes the g = 1 rows from the deptno sub-total.

But this avoids us having to modify {{Aggregate}}.

> GROUP_ID returns wrong result
> -
>
> Key: CALCITE-1824
> URL: https://issues.apache.org/jira/browse/CALCITE-1824
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>
> We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
> specification wrong, and it returns the wrong result.
> {{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.
> I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
> ..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} 
> is useful only if you have duplicate grouping sets. If grouping sets are 
> distinct, {{GROUP_ID()}} will always return zero.
> Example 1
> {code}SELECT deptno, job, GROUP_ID() AS g
> FROM Emp
> GROUP BY ROLLUP(deptno, job)
> DEPTNO JOB  G
> -- - --
>   10 CLERK  0
>   10 MANAGER0
>   10 PRESIDENT  0
>   100
>   20 CLERK  0
>   20 ANALYST0
>   20 MANAGER0
>   200
>   30 CLERK  0
>   30 MANAGER0
>   30 SALESMAN   0
>   300
> 0
> {code} produces grouping sets (deptno, job), (deptno), (). These are 
> distinct, so GROUP_ID() is 0 for all rows.
> Example 2
> {code}SELECT deptno, GROUP_ID() AS g
> FROM Emp
> GROUP BY GROUPING SETS (deptno, (), ());
> DEPTNOG
> -- --
>   10  0
>   20  0
>   30  0
>   0
>   1
> {code}
> As you can see, the grouping set () occurs twice. So there is one row in the 
> result for each occurrence: the first occurrence has g = 0; the second has g 
> = 1.
> In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
> is wrong, but nevertheless closer to the required behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)