Re: Understanding annotations of SqlGroupingFunction

XING JIN Tue, 21 Apr 2020 23:52:25 -0700

Hi Vineet ~
+1 on your analysis.
Checking below case in agg.iq. We can see that the behavior of GROUPING
function in Calcite is the same as Hive.


# GROUPING in SELECT clause of CUBE query
select deptno, job, count(*) as c, grouping(deptno) as d,
  grouping(job) j, grouping(deptno, job) as x
from "scott".emp
group by cube(deptno, job);
+--------+-----------+----+---+---+---+
| DEPTNO | JOB       | C  | D | J | X |
+--------+-----------+----+---+---+---+
|     10 | CLERK     |  1 | 0 | 0 | 0 |
|     10 | MANAGER   |  1 | 0 | 0 | 0 |
|     10 | PRESIDENT |  1 | 0 | 0 | 0 |
|     10 |           |  3 | 0 | 1 | 1 |
|     20 | ANALYST   |  2 | 0 | 0 | 0 |
|     20 | CLERK     |  2 | 0 | 0 | 0 |
|     20 | MANAGER   |  1 | 0 | 0 | 0 |
|     20 |           |  5 | 0 | 1 | 1 |
|     30 | CLERK     |  1 | 0 | 0 | 0 |
|     30 | MANAGER   |  1 | 0 | 0 | 0 |
|     30 | SALESMAN  |  4 | 0 | 0 | 0 |
|     30 |           |  6 | 0 | 1 | 1 |
|        | ANALYST   |  2 | 1 | 0 | 2 |
|        | CLERK     |  4 | 1 | 0 | 2 |
|        | MANAGER   |  3 | 1 | 0 | 2 |
|        | PRESIDENT |  1 | 1 | 0 | 2 |
|        | SALESMAN  |  4 | 1 | 0 | 2 |
|        |           | 14 | 1 | 1 | 3 |
+--------+-----------+----+---+---+---+

IMHO, we might rectify the doc of SqlGroupingFunction as below:

* 0 if both deptno and gender are being grouped,
* 1 if only deptno is being grouped,
* 2 if only gender is being groped,
* 3 if neither deptno nor gender are being grouped.


- Jin


Vineet G <vgargapa...@gmail.com> 于2020年4月22日周三 上午5:18写道：

> I expect that the user behavior for the GROUPING in both hive and calcite
> is same. It’s just the documentation which is a bit confusing.
> e.g.  comment line on grouping : if both deptno and gender are being
> grouped
>
> should really mean that the row which represents the grand total i.e
> without group by expression. will return 3.
>
> FYI Hive had its behavior fixed with
> https://issues.apache.org/jira/browse/HIVE-16102 <
> https://issues.apache.org/jira/browse/HIVE-16102> and currently uses
> GROUPING ID function.
>
> Vineet Garg
>
>
> > On Apr 21, 2020, at 10:42 AM, Julian Hyde <jhyde.apa...@gmail.com>
> wrote:
> >
> > Suppose we have one row that represents the total for department 10, and
> another that represents the grand total of all departments. Which row would
> we say that department is “grouped” (in Calcite’s parlance) or “aggregated”
> in (Hive’s parlance)?
> >
> > I find the terms confusing. It’s possible that Calcite has them “wrong”.
> >
> > I would simply run a query like
> >
> >  SELECT deptno, job, GROUPING(deptno, job), COUNT(*)
> >  FROM emp
> >  GROUP BY CUBE (deptno, job)
> >
> > and see whether Hive and Calcite return the same result.
> >
> >> On Apr 20, 2020, at 6:58 PM, ZZY <joa...@gmail.com> wrote:
> >>
> >> Hi, Hyde:
> >> It's confused me that some annotations in
> >> Calcite(org.apache.calcite.sql.fun.SqlGroupingFunction.java) :
> >> /**
> >> * The {@code GROUPING} function.
> >> *
> >> * <p>Accepts 1 or more arguments.
> >> * Example: {@code GROUPING(deptno, gender)} returns
> >> * 3 if both deptno and gender are being grouped,
> >> * 2 if only deptno is being grouped,
> >> * 1 if only gender is being groped,
> >> * 0 if neither deptno nor gender are being grouped.
> >> *
> >> * <p>This function is defined in the SQL standard.
> >> * {@code GROUPING_ID} is a non-standard synonym.
> >> *
> >> * <p>Some examples are in {@code agg.iq}.
> >> */
> >>
> >> The annotations above seems conflicts with other implementations like
> Hive(
> >>
> https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup?spm=ata.13261165.0.0.528c6dfcXalQFy#EnhancedAggregation,Cube,GroupingandRollup-Groupingfunction
> >> )
> >>
> >> Notice that: "The grouping function indicates whether an expression in a
> >> GROUP BY clause is aggregated or not for a given row. The value 0
> >> represents a column that is part of the grouping set, while the value 1
> >> represents a column that is not part of the grouping set. "
> >>
> >>
> >> It is clearly that 0 and 1 bit have different interpretation  between
> >> annotations in Calcite and in Hive. And I did not figure out why...
> >>
> >> Any feedback can give me on this would be highly appreciated.
> >>
> >> Best regards!
>
>

Re: Understanding annotations of SqlGroupingFunction

Reply via email to