Re: Understanding annotations of SqlGroupingFunction

2020-04-23 Thread Julian Hyde
Doesn’t help much. It’s not obvious to me what “being grouped” means. 

Julian

> On Apr 21, 2020, at 23:52, XING JIN  wrote:
> 
> Hi Vineet ~
> +1 on your analysis.
> Checking below case in agg.iq. We can see that the behavior of GROUPING
> function in Calcite is the same as Hive.
> 
> # GROUPING in SELECT clause of CUBE query
> select deptno, job, count(*) as c, grouping(deptno) as d,
>  grouping(job) j, grouping(deptno, job) as x
> from "scott".emp
> group by cube(deptno, job);
> ++---++---+---+---+
> | DEPTNO | JOB   | C  | D | J | X |
> ++---++---+---+---+
> | 10 | CLERK |  1 | 0 | 0 | 0 |
> | 10 | MANAGER   |  1 | 0 | 0 | 0 |
> | 10 | PRESIDENT |  1 | 0 | 0 | 0 |
> | 10 |   |  3 | 0 | 1 | 1 |
> | 20 | ANALYST   |  2 | 0 | 0 | 0 |
> | 20 | CLERK |  2 | 0 | 0 | 0 |
> | 20 | MANAGER   |  1 | 0 | 0 | 0 |
> | 20 |   |  5 | 0 | 1 | 1 |
> | 30 | CLERK |  1 | 0 | 0 | 0 |
> | 30 | MANAGER   |  1 | 0 | 0 | 0 |
> | 30 | SALESMAN  |  4 | 0 | 0 | 0 |
> | 30 |   |  6 | 0 | 1 | 1 |
> || ANALYST   |  2 | 1 | 0 | 2 |
> || CLERK |  4 | 1 | 0 | 2 |
> || MANAGER   |  3 | 1 | 0 | 2 |
> || PRESIDENT |  1 | 1 | 0 | 2 |
> || SALESMAN  |  4 | 1 | 0 | 2 |
> ||   | 14 | 1 | 1 | 3 |
> ++---++---+---+---+
> 
> IMHO, we might rectify the doc of SqlGroupingFunction as below:
> 
> * 0 if both deptno and gender are being grouped,
> * 1 if only deptno is being grouped,
> * 2 if only gender is being groped,
> * 3 if neither deptno nor gender are being grouped.
> 
> 
> - Jin
> 
> 
> Vineet G  于2020年4月22日周三 上午5:18写道:
> 
>> I expect that the user behavior for the GROUPING in both hive and calcite
>> is same. It’s just the documentation which is a bit confusing.
>> e.g.  comment line on grouping : if both deptno and gender are being
>> grouped
>> 
>> should really mean that the row which represents the grand total i.e
>> without group by expression. will return 3.
>> 
>> FYI Hive had its behavior fixed with
>> https://issues.apache.org/jira/browse/HIVE-16102 <
>> https://issues.apache.org/jira/browse/HIVE-16102> and currently uses
>> GROUPING ID function.
>> 
>> Vineet Garg
>> 
>> 
>>> On Apr 21, 2020, at 10:42 AM, Julian Hyde 
>> wrote:
>>> 
>>> Suppose we have one row that represents the total for department 10, and
>> another that represents the grand total of all departments. Which row would
>> we say that department is “grouped” (in Calcite’s parlance) or “aggregated”
>> in (Hive’s parlance)?
>>> 
>>> I find the terms confusing. It’s possible that Calcite has them “wrong”.
>>> 
>>> I would simply run a query like
>>> 
>>> SELECT deptno, job, GROUPING(deptno, job), COUNT(*)
>>> FROM emp
>>> GROUP BY CUBE (deptno, job)
>>> 
>>> and see whether Hive and Calcite return the same result.
>>> 
 On Apr 20, 2020, at 6:58 PM, ZZY  wrote:
 
 Hi, Hyde:
 It's confused me that some annotations in
 Calcite(org.apache.calcite.sql.fun.SqlGroupingFunction.java) :
 /**
 * The {@code GROUPING} function.
 *
 * Accepts 1 or more arguments.
 * Example: {@code GROUPING(deptno, gender)} returns
 * 3 if both deptno and gender are being grouped,
 * 2 if only deptno is being grouped,
 * 1 if only gender is being groped,
 * 0 if neither deptno nor gender are being grouped.
 *
 * This function is defined in the SQL standard.
 * {@code GROUPING_ID} is a non-standard synonym.
 *
 * Some examples are in {@code agg.iq}.
 */
 
 The annotations above seems conflicts with other implementations like
>> Hive(
 
>> https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup?spm=ata.13261165.0.0.528c6dfcXalQFy#EnhancedAggregation,Cube,GroupingandRollup-Groupingfunction
 )
 
 Notice that: "The grouping function indicates whether an expression in a
 GROUP BY clause is aggregated or not for a given row. The value 0
 represents a column that is part of the grouping set, while the value 1
 represents a column that is not part of the grouping set. "
 
 
 It is clearly that 0 and 1 bit have different interpretation  between
 annotations in Calcite and in Hive. And I did not figure out why...
 
 Any feedback can give me on this would be highly appreciated.
 
 Best regards!
>> 
>> 


Re: Understanding annotations of SqlGroupingFunction

2020-04-22 Thread XING JIN
Filed a JIRA: https://issues.apache.org/jira/browse/CALCITE-3950

XING JIN  于2020年4月22日周三 下午2:51写道:

> Hi Vineet ~
> +1 on your analysis.
> Checking below case in agg.iq. We can see that the behavior of GROUPING
> function in Calcite is the same as Hive.
>
> # GROUPING in SELECT clause of CUBE query
> select deptno, job, count(*) as c, grouping(deptno) as d,
>   grouping(job) j, grouping(deptno, job) as x
> from "scott".emp
> group by cube(deptno, job);
> ++---++---+---+---+
> | DEPTNO | JOB   | C  | D | J | X |
> ++---++---+---+---+
> | 10 | CLERK |  1 | 0 | 0 | 0 |
> | 10 | MANAGER   |  1 | 0 | 0 | 0 |
> | 10 | PRESIDENT |  1 | 0 | 0 | 0 |
> | 10 |   |  3 | 0 | 1 | 1 |
> | 20 | ANALYST   |  2 | 0 | 0 | 0 |
> | 20 | CLERK |  2 | 0 | 0 | 0 |
> | 20 | MANAGER   |  1 | 0 | 0 | 0 |
> | 20 |   |  5 | 0 | 1 | 1 |
> | 30 | CLERK |  1 | 0 | 0 | 0 |
> | 30 | MANAGER   |  1 | 0 | 0 | 0 |
> | 30 | SALESMAN  |  4 | 0 | 0 | 0 |
> | 30 |   |  6 | 0 | 1 | 1 |
> || ANALYST   |  2 | 1 | 0 | 2 |
> || CLERK |  4 | 1 | 0 | 2 |
> || MANAGER   |  3 | 1 | 0 | 2 |
> || PRESIDENT |  1 | 1 | 0 | 2 |
> || SALESMAN  |  4 | 1 | 0 | 2 |
> ||   | 14 | 1 | 1 | 3 |
> ++---++---+---+---+
>
> IMHO, we might rectify the doc of SqlGroupingFunction as below:
>
> * 0 if both deptno and gender are being grouped,
> * 1 if only deptno is being grouped,
> * 2 if only gender is being groped,
> * 3 if neither deptno nor gender are being grouped.
>
>
> - Jin
>
>
> Vineet G  于2020年4月22日周三 上午5:18写道:
>
>> I expect that the user behavior for the GROUPING in both hive and calcite
>> is same. It’s just the documentation which is a bit confusing.
>> e.g.  comment line on grouping : if both deptno and gender are being
>> grouped
>>
>> should really mean that the row which represents the grand total i.e
>> without group by expression. will return 3.
>>
>> FYI Hive had its behavior fixed with
>> https://issues.apache.org/jira/browse/HIVE-16102 <
>> https://issues.apache.org/jira/browse/HIVE-16102> and currently uses
>> GROUPING ID function.
>>
>> Vineet Garg
>>
>>
>> > On Apr 21, 2020, at 10:42 AM, Julian Hyde 
>> wrote:
>> >
>> > Suppose we have one row that represents the total for department 10,
>> and another that represents the grand total of all departments. Which row
>> would we say that department is “grouped” (in Calcite’s parlance) or
>> “aggregated” in (Hive’s parlance)?
>> >
>> > I find the terms confusing. It’s possible that Calcite has them
>> “wrong”.
>> >
>> > I would simply run a query like
>> >
>> >  SELECT deptno, job, GROUPING(deptno, job), COUNT(*)
>> >  FROM emp
>> >  GROUP BY CUBE (deptno, job)
>> >
>> > and see whether Hive and Calcite return the same result.
>> >
>> >> On Apr 20, 2020, at 6:58 PM, ZZY  wrote:
>> >>
>> >> Hi, Hyde:
>> >> It's confused me that some annotations in
>> >> Calcite(org.apache.calcite.sql.fun.SqlGroupingFunction.java) :
>> >> /**
>> >> * The {@code GROUPING} function.
>> >> *
>> >> * Accepts 1 or more arguments.
>> >> * Example: {@code GROUPING(deptno, gender)} returns
>> >> * 3 if both deptno and gender are being grouped,
>> >> * 2 if only deptno is being grouped,
>> >> * 1 if only gender is being groped,
>> >> * 0 if neither deptno nor gender are being grouped.
>> >> *
>> >> * This function is defined in the SQL standard.
>> >> * {@code GROUPING_ID} is a non-standard synonym.
>> >> *
>> >> * Some examples are in {@code agg.iq}.
>> >> */
>> >>
>> >> The annotations above seems conflicts with other implementations like
>> Hive(
>> >>
>> https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup?spm=ata.13261165.0.0.528c6dfcXalQFy#EnhancedAggregation,Cube,GroupingandRollup-Groupingfunction
>> >> )
>> >>
>> >> Notice that: "The grouping function indicates whether an expression in
>> a
>> >> GROUP BY clause is aggregated or not for a given row. The value 0
>> >> represents a column that is part of the grouping set, while the value 1
>> >> represents a column that is not part of the grouping set. "
>> >>
>> >>
>> >> It is clearly that 0 and 1 bit have different interpretation  between
>> >> annotations in Calcite and in Hive. And I did not figure out why...
>> >>
>> >> Any feedback can give me on this would be highly appreciated.
>> >>
>> >> Best regards!
>>
>>


Re: Understanding annotations of SqlGroupingFunction

2020-04-22 Thread XING JIN
Hi Vineet ~
+1 on your analysis.
Checking below case in agg.iq. We can see that the behavior of GROUPING
function in Calcite is the same as Hive.

# GROUPING in SELECT clause of CUBE query
select deptno, job, count(*) as c, grouping(deptno) as d,
  grouping(job) j, grouping(deptno, job) as x
from "scott".emp
group by cube(deptno, job);
++---++---+---+---+
| DEPTNO | JOB   | C  | D | J | X |
++---++---+---+---+
| 10 | CLERK |  1 | 0 | 0 | 0 |
| 10 | MANAGER   |  1 | 0 | 0 | 0 |
| 10 | PRESIDENT |  1 | 0 | 0 | 0 |
| 10 |   |  3 | 0 | 1 | 1 |
| 20 | ANALYST   |  2 | 0 | 0 | 0 |
| 20 | CLERK |  2 | 0 | 0 | 0 |
| 20 | MANAGER   |  1 | 0 | 0 | 0 |
| 20 |   |  5 | 0 | 1 | 1 |
| 30 | CLERK |  1 | 0 | 0 | 0 |
| 30 | MANAGER   |  1 | 0 | 0 | 0 |
| 30 | SALESMAN  |  4 | 0 | 0 | 0 |
| 30 |   |  6 | 0 | 1 | 1 |
|| ANALYST   |  2 | 1 | 0 | 2 |
|| CLERK |  4 | 1 | 0 | 2 |
|| MANAGER   |  3 | 1 | 0 | 2 |
|| PRESIDENT |  1 | 1 | 0 | 2 |
|| SALESMAN  |  4 | 1 | 0 | 2 |
||   | 14 | 1 | 1 | 3 |
++---++---+---+---+

IMHO, we might rectify the doc of SqlGroupingFunction as below:

* 0 if both deptno and gender are being grouped,
* 1 if only deptno is being grouped,
* 2 if only gender is being groped,
* 3 if neither deptno nor gender are being grouped.


- Jin


Vineet G  于2020年4月22日周三 上午5:18写道:

> I expect that the user behavior for the GROUPING in both hive and calcite
> is same. It’s just the documentation which is a bit confusing.
> e.g.  comment line on grouping : if both deptno and gender are being
> grouped
>
> should really mean that the row which represents the grand total i.e
> without group by expression. will return 3.
>
> FYI Hive had its behavior fixed with
> https://issues.apache.org/jira/browse/HIVE-16102 <
> https://issues.apache.org/jira/browse/HIVE-16102> and currently uses
> GROUPING ID function.
>
> Vineet Garg
>
>
> > On Apr 21, 2020, at 10:42 AM, Julian Hyde 
> wrote:
> >
> > Suppose we have one row that represents the total for department 10, and
> another that represents the grand total of all departments. Which row would
> we say that department is “grouped” (in Calcite’s parlance) or “aggregated”
> in (Hive’s parlance)?
> >
> > I find the terms confusing. It’s possible that Calcite has them “wrong”.
> >
> > I would simply run a query like
> >
> >  SELECT deptno, job, GROUPING(deptno, job), COUNT(*)
> >  FROM emp
> >  GROUP BY CUBE (deptno, job)
> >
> > and see whether Hive and Calcite return the same result.
> >
> >> On Apr 20, 2020, at 6:58 PM, ZZY  wrote:
> >>
> >> Hi, Hyde:
> >> It's confused me that some annotations in
> >> Calcite(org.apache.calcite.sql.fun.SqlGroupingFunction.java) :
> >> /**
> >> * The {@code GROUPING} function.
> >> *
> >> * Accepts 1 or more arguments.
> >> * Example: {@code GROUPING(deptno, gender)} returns
> >> * 3 if both deptno and gender are being grouped,
> >> * 2 if only deptno is being grouped,
> >> * 1 if only gender is being groped,
> >> * 0 if neither deptno nor gender are being grouped.
> >> *
> >> * This function is defined in the SQL standard.
> >> * {@code GROUPING_ID} is a non-standard synonym.
> >> *
> >> * Some examples are in {@code agg.iq}.
> >> */
> >>
> >> The annotations above seems conflicts with other implementations like
> Hive(
> >>
> https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup?spm=ata.13261165.0.0.528c6dfcXalQFy#EnhancedAggregation,Cube,GroupingandRollup-Groupingfunction
> >> )
> >>
> >> Notice that: "The grouping function indicates whether an expression in a
> >> GROUP BY clause is aggregated or not for a given row. The value 0
> >> represents a column that is part of the grouping set, while the value 1
> >> represents a column that is not part of the grouping set. "
> >>
> >>
> >> It is clearly that 0 and 1 bit have different interpretation  between
> >> annotations in Calcite and in Hive. And I did not figure out why...
> >>
> >> Any feedback can give me on this would be highly appreciated.
> >>
> >> Best regards!
>
>


Re: Understanding annotations of SqlGroupingFunction

2020-04-21 Thread Vineet G
I expect that the user behavior for the GROUPING in both hive and calcite is 
same. It’s just the documentation which is a bit confusing.
e.g.  comment line on grouping : if both deptno and gender are being grouped

should really mean that the row which represents the grand total i.e  without 
group by expression. will return 3.

FYI Hive had its behavior fixed with 
https://issues.apache.org/jira/browse/HIVE-16102 
 and currently uses GROUPING 
ID function.

Vineet Garg


> On Apr 21, 2020, at 10:42 AM, Julian Hyde  wrote:
> 
> Suppose we have one row that represents the total for department 10, and 
> another that represents the grand total of all departments. Which row would 
> we say that department is “grouped” (in Calcite’s parlance) or “aggregated” 
> in (Hive’s parlance)?
> 
> I find the terms confusing. It’s possible that Calcite has them “wrong”. 
> 
> I would simply run a query like
> 
>  SELECT deptno, job, GROUPING(deptno, job), COUNT(*)
>  FROM emp
>  GROUP BY CUBE (deptno, job)
> 
> and see whether Hive and Calcite return the same result. 
> 
>> On Apr 20, 2020, at 6:58 PM, ZZY  wrote:
>> 
>> Hi, Hyde:
>> It's confused me that some annotations in
>> Calcite(org.apache.calcite.sql.fun.SqlGroupingFunction.java) :
>> /**
>> * The {@code GROUPING} function.
>> *
>> * Accepts 1 or more arguments.
>> * Example: {@code GROUPING(deptno, gender)} returns
>> * 3 if both deptno and gender are being grouped,
>> * 2 if only deptno is being grouped,
>> * 1 if only gender is being groped,
>> * 0 if neither deptno nor gender are being grouped.
>> *
>> * This function is defined in the SQL standard.
>> * {@code GROUPING_ID} is a non-standard synonym.
>> *
>> * Some examples are in {@code agg.iq}.
>> */
>> 
>> The annotations above seems conflicts with other implementations like Hive(
>> https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup?spm=ata.13261165.0.0.528c6dfcXalQFy#EnhancedAggregation,Cube,GroupingandRollup-Groupingfunction
>> )
>> 
>> Notice that: "The grouping function indicates whether an expression in a
>> GROUP BY clause is aggregated or not for a given row. The value 0
>> represents a column that is part of the grouping set, while the value 1
>> represents a column that is not part of the grouping set. "
>> 
>> 
>> It is clearly that 0 and 1 bit have different interpretation  between
>> annotations in Calcite and in Hive. And I did not figure out why...
>> 
>> Any feedback can give me on this would be highly appreciated.
>> 
>> Best regards!



Re: Understanding annotations of SqlGroupingFunction

2020-04-21 Thread Julian Hyde
Suppose we have one row that represents the total for department 10, and 
another that represents the grand total of all departments. Which row would we 
say that department is “grouped” (in Calcite’s parlance) or “aggregated” in 
(Hive’s parlance)?

I find the terms confusing. It’s possible that Calcite has them “wrong”. 

I would simply run a query like

  SELECT deptno, job, GROUPING(deptno, job), COUNT(*)
  FROM emp
  GROUP BY CUBE (deptno, job)

and see whether Hive and Calcite return the same result. 

> On Apr 20, 2020, at 6:58 PM, ZZY  wrote:
> 
> Hi, Hyde:
> It's confused me that some annotations in
> Calcite(org.apache.calcite.sql.fun.SqlGroupingFunction.java) :
> /**
> * The {@code GROUPING} function.
> *
> * Accepts 1 or more arguments.
> * Example: {@code GROUPING(deptno, gender)} returns
> * 3 if both deptno and gender are being grouped,
> * 2 if only deptno is being grouped,
> * 1 if only gender is being groped,
> * 0 if neither deptno nor gender are being grouped.
> *
> * This function is defined in the SQL standard.
> * {@code GROUPING_ID} is a non-standard synonym.
> *
> * Some examples are in {@code agg.iq}.
> */
> 
> The annotations above seems conflicts with other implementations like Hive(
> https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup?spm=ata.13261165.0.0.528c6dfcXalQFy#EnhancedAggregation,Cube,GroupingandRollup-Groupingfunction
> )
> 
> Notice that: "The grouping function indicates whether an expression in a
> GROUP BY clause is aggregated or not for a given row. The value 0
> represents a column that is part of the grouping set, while the value 1
> represents a column that is not part of the grouping set. "
> 
> 
> It is clearly that 0 and 1 bit have different interpretation  between
> annotations in Calcite and in Hive. And I did not figure out why...
> 
> Any feedback can give me on this would be highly appreciated.
> 
> Best regards!