Re: Understanding annotations of SqlGroupingFunction

XING JIN Wed, 22 Apr 2020 00:17:14 -0700

Filed a JIRA: https://issues.apache.org/jira/browse/CALCITE-3950


XING JIN <jinxing.co...@gmail.com> 于2020年4月22日周三 下午2:51写道：

> Hi Vineet ~
> +1 on your analysis.
> Checking below case in agg.iq. We can see that the behavior of GROUPING
> function in Calcite is the same as Hive.
>
> # GROUPING in SELECT clause of CUBE query
> select deptno, job, count(*) as c, grouping(deptno) as d,
>   grouping(job) j, grouping(deptno, job) as x
> from "scott".emp
> group by cube(deptno, job);
> +--------+-----------+----+---+---+---+
> | DEPTNO | JOB       | C  | D | J | X |
> +--------+-----------+----+---+---+---+
> |     10 | CLERK     |  1 | 0 | 0 | 0 |
> |     10 | MANAGER   |  1 | 0 | 0 | 0 |
> |     10 | PRESIDENT |  1 | 0 | 0 | 0 |
> |     10 |           |  3 | 0 | 1 | 1 |
> |     20 | ANALYST   |  2 | 0 | 0 | 0 |
> |     20 | CLERK     |  2 | 0 | 0 | 0 |
> |     20 | MANAGER   |  1 | 0 | 0 | 0 |
> |     20 |           |  5 | 0 | 1 | 1 |
> |     30 | CLERK     |  1 | 0 | 0 | 0 |
> |     30 | MANAGER   |  1 | 0 | 0 | 0 |
> |     30 | SALESMAN  |  4 | 0 | 0 | 0 |
> |     30 |           |  6 | 0 | 1 | 1 |
> |        | ANALYST   |  2 | 1 | 0 | 2 |
> |        | CLERK     |  4 | 1 | 0 | 2 |
> |        | MANAGER   |  3 | 1 | 0 | 2 |
> |        | PRESIDENT |  1 | 1 | 0 | 2 |
> |        | SALESMAN  |  4 | 1 | 0 | 2 |
> |        |           | 14 | 1 | 1 | 3 |
> +--------+-----------+----+---+---+---+
>
> IMHO, we might rectify the doc of SqlGroupingFunction as below:
>
> * 0 if both deptno and gender are being grouped,
> * 1 if only deptno is being grouped,
> * 2 if only gender is being groped,
> * 3 if neither deptno nor gender are being grouped.
>
>
> - Jin
>
>
> Vineet G <vgargapa...@gmail.com> 于2020年4月22日周三 上午5:18写道：
>
>> I expect that the user behavior for the GROUPING in both hive and calcite
>> is same. It’s just the documentation which is a bit confusing.
>> e.g.  comment line on grouping : if both deptno and gender are being
>> grouped
>>
>> should really mean that the row which represents the grand total i.e
>> without group by expression. will return 3.
>>
>> FYI Hive had its behavior fixed with
>> https://issues.apache.org/jira/browse/HIVE-16102 <
>> https://issues.apache.org/jira/browse/HIVE-16102> and currently uses
>> GROUPING ID function.
>>
>> Vineet Garg
>>
>>
>> > On Apr 21, 2020, at 10:42 AM, Julian Hyde <jhyde.apa...@gmail.com>
>> wrote:
>> >
>> > Suppose we have one row that represents the total for department 10,
>> and another that represents the grand total of all departments. Which row
>> would we say that department is “grouped” (in Calcite’s parlance) or
>> “aggregated” in (Hive’s parlance)?
>> >
>> > I find the terms confusing. It’s possible that Calcite has them
>> “wrong”.
>> >
>> > I would simply run a query like
>> >
>> >  SELECT deptno, job, GROUPING(deptno, job), COUNT(*)
>> >  FROM emp
>> >  GROUP BY CUBE (deptno, job)
>> >
>> > and see whether Hive and Calcite return the same result.
>> >
>> >> On Apr 20, 2020, at 6:58 PM, ZZY <joa...@gmail.com> wrote:
>> >>
>> >> Hi, Hyde:
>> >> It's confused me that some annotations in
>> >> Calcite(org.apache.calcite.sql.fun.SqlGroupingFunction.java) :
>> >> /**
>> >> * The {@code GROUPING} function.
>> >> *
>> >> * <p>Accepts 1 or more arguments.
>> >> * Example: {@code GROUPING(deptno, gender)} returns
>> >> * 3 if both deptno and gender are being grouped,
>> >> * 2 if only deptno is being grouped,
>> >> * 1 if only gender is being groped,
>> >> * 0 if neither deptno nor gender are being grouped.
>> >> *
>> >> * <p>This function is defined in the SQL standard.
>> >> * {@code GROUPING_ID} is a non-standard synonym.
>> >> *
>> >> * <p>Some examples are in {@code agg.iq}.
>> >> */
>> >>
>> >> The annotations above seems conflicts with other implementations like
>> Hive(
>> >>
>> https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup?spm=ata.13261165.0.0.528c6dfcXalQFy#EnhancedAggregation,Cube,GroupingandRollup-Groupingfunction
>> >> )
>> >>
>> >> Notice that: "The grouping function indicates whether an expression in
>> a
>> >> GROUP BY clause is aggregated or not for a given row. The value 0
>> >> represents a column that is part of the grouping set, while the value 1
>> >> represents a column that is not part of the grouping set. "
>> >>
>> >>
>> >> It is clearly that 0 and 1 bit have different interpretation  between
>> >> annotations in Calcite and in Hive. And I did not figure out why...
>> >>
>> >> Any feedback can give me on this would be highly appreciated.
>> >>
>> >> Best regards!
>>
>>

Re: Understanding annotations of SqlGroupingFunction

Reply via email to