Re: Understanding annotations of SqlGroupingFunction

Julian Hyde Thu, 23 Apr 2020 09:27:20 -0700

Doesn’t help much. It’s not obvious to me what “being grouped” means.


Julian

> On Apr 21, 2020, at 23:52, XING JIN <jinxing.co...@gmail.com> wrote:
> 
> Hi Vineet ~
> +1 on your analysis.
> Checking below case in agg.iq. We can see that the behavior of GROUPING
> function in Calcite is the same as Hive.
> 
> # GROUPING in SELECT clause of CUBE query
> select deptno, job, count(*) as c, grouping(deptno) as d,
>  grouping(job) j, grouping(deptno, job) as x
> from "scott".emp
> group by cube(deptno, job);
> +--------+-----------+----+---+---+---+
> | DEPTNO | JOB       | C  | D | J | X |
> +--------+-----------+----+---+---+---+
> |     10 | CLERK     |  1 | 0 | 0 | 0 |
> |     10 | MANAGER   |  1 | 0 | 0 | 0 |
> |     10 | PRESIDENT |  1 | 0 | 0 | 0 |
> |     10 |           |  3 | 0 | 1 | 1 |
> |     20 | ANALYST   |  2 | 0 | 0 | 0 |
> |     20 | CLERK     |  2 | 0 | 0 | 0 |
> |     20 | MANAGER   |  1 | 0 | 0 | 0 |
> |     20 |           |  5 | 0 | 1 | 1 |
> |     30 | CLERK     |  1 | 0 | 0 | 0 |
> |     30 | MANAGER   |  1 | 0 | 0 | 0 |
> |     30 | SALESMAN  |  4 | 0 | 0 | 0 |
> |     30 |           |  6 | 0 | 1 | 1 |
> |        | ANALYST   |  2 | 1 | 0 | 2 |
> |        | CLERK     |  4 | 1 | 0 | 2 |
> |        | MANAGER   |  3 | 1 | 0 | 2 |
> |        | PRESIDENT |  1 | 1 | 0 | 2 |
> |        | SALESMAN  |  4 | 1 | 0 | 2 |
> |        |           | 14 | 1 | 1 | 3 |
> +--------+-----------+----+---+---+---+
> 
> IMHO, we might rectify the doc of SqlGroupingFunction as below:
> 
> * 0 if both deptno and gender are being grouped,
> * 1 if only deptno is being grouped,
> * 2 if only gender is being groped,
> * 3 if neither deptno nor gender are being grouped.
> 
> 
> - Jin
> 
> 
> Vineet G <vgargapa...@gmail.com> 于2020年4月22日周三 上午5:18写道：
> 
>> I expect that the user behavior for the GROUPING in both hive and calcite
>> is same. It’s just the documentation which is a bit confusing.
>> e.g.  comment line on grouping : if both deptno and gender are being
>> grouped
>> 
>> should really mean that the row which represents the grand total i.e
>> without group by expression. will return 3.
>> 
>> FYI Hive had its behavior fixed with
>> https://issues.apache.org/jira/browse/HIVE-16102 <
>> https://issues.apache.org/jira/browse/HIVE-16102> and currently uses
>> GROUPING ID function.
>> 
>> Vineet Garg
>> 
>> 
>>> On Apr 21, 2020, at 10:42 AM, Julian Hyde <jhyde.apa...@gmail.com>
>> wrote:
>>> 
>>> Suppose we have one row that represents the total for department 10, and
>> another that represents the grand total of all departments. Which row would
>> we say that department is “grouped” (in Calcite’s parlance) or “aggregated”
>> in (Hive’s parlance)?
>>> 
>>> I find the terms confusing. It’s possible that Calcite has them “wrong”.
>>> 
>>> I would simply run a query like
>>> 
>>> SELECT deptno, job, GROUPING(deptno, job), COUNT(*)
>>> FROM emp
>>> GROUP BY CUBE (deptno, job)
>>> 
>>> and see whether Hive and Calcite return the same result.
>>> 
>>>> On Apr 20, 2020, at 6:58 PM, ZZY <joa...@gmail.com> wrote:
>>>> 
>>>> Hi, Hyde:
>>>> It's confused me that some annotations in
>>>> Calcite(org.apache.calcite.sql.fun.SqlGroupingFunction.java) :
>>>> /**
>>>> * The {@code GROUPING} function.
>>>> *
>>>> * <p>Accepts 1 or more arguments.
>>>> * Example: {@code GROUPING(deptno, gender)} returns
>>>> * 3 if both deptno and gender are being grouped,
>>>> * 2 if only deptno is being grouped,
>>>> * 1 if only gender is being groped,
>>>> * 0 if neither deptno nor gender are being grouped.
>>>> *
>>>> * <p>This function is defined in the SQL standard.
>>>> * {@code GROUPING_ID} is a non-standard synonym.
>>>> *
>>>> * <p>Some examples are in {@code agg.iq}.
>>>> */
>>>> 
>>>> The annotations above seems conflicts with other implementations like
>> Hive(
>>>> 
>> https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup?spm=ata.13261165.0.0.528c6dfcXalQFy#EnhancedAggregation,Cube,GroupingandRollup-Groupingfunction
>>>> )
>>>> 
>>>> Notice that: "The grouping function indicates whether an expression in a
>>>> GROUP BY clause is aggregated or not for a given row. The value 0
>>>> represents a column that is part of the grouping set, while the value 1
>>>> represents a column that is not part of the grouping set. "
>>>> 
>>>> 
>>>> It is clearly that 0 and 1 bit have different interpretation  between
>>>> annotations in Calcite and in Hive. And I did not figure out why...
>>>> 
>>>> Any feedback can give me on this would be highly appreciated.
>>>> 
>>>> Best regards!
>> 
>>

Re: Understanding annotations of SqlGroupingFunction

Reply via email to