Hi Vineet ~ +1 on your analysis. Checking below case in agg.iq. We can see that the behavior of GROUPING function in Calcite is the same as Hive.
# GROUPING in SELECT clause of CUBE query select deptno, job, count(*) as c, grouping(deptno) as d, grouping(job) j, grouping(deptno, job) as x from "scott".emp group by cube(deptno, job); +--------+-----------+----+---+---+---+ | DEPTNO | JOB | C | D | J | X | +--------+-----------+----+---+---+---+ | 10 | CLERK | 1 | 0 | 0 | 0 | | 10 | MANAGER | 1 | 0 | 0 | 0 | | 10 | PRESIDENT | 1 | 0 | 0 | 0 | | 10 | | 3 | 0 | 1 | 1 | | 20 | ANALYST | 2 | 0 | 0 | 0 | | 20 | CLERK | 2 | 0 | 0 | 0 | | 20 | MANAGER | 1 | 0 | 0 | 0 | | 20 | | 5 | 0 | 1 | 1 | | 30 | CLERK | 1 | 0 | 0 | 0 | | 30 | MANAGER | 1 | 0 | 0 | 0 | | 30 | SALESMAN | 4 | 0 | 0 | 0 | | 30 | | 6 | 0 | 1 | 1 | | | ANALYST | 2 | 1 | 0 | 2 | | | CLERK | 4 | 1 | 0 | 2 | | | MANAGER | 3 | 1 | 0 | 2 | | | PRESIDENT | 1 | 1 | 0 | 2 | | | SALESMAN | 4 | 1 | 0 | 2 | | | | 14 | 1 | 1 | 3 | +--------+-----------+----+---+---+---+ IMHO, we might rectify the doc of SqlGroupingFunction as below: * 0 if both deptno and gender are being grouped, * 1 if only deptno is being grouped, * 2 if only gender is being groped, * 3 if neither deptno nor gender are being grouped. - Jin Vineet G <vgargapa...@gmail.com> 于2020年4月22日周三 上午5:18写道: > I expect that the user behavior for the GROUPING in both hive and calcite > is same. It’s just the documentation which is a bit confusing. > e.g. comment line on grouping : if both deptno and gender are being > grouped > > should really mean that the row which represents the grand total i.e > without group by expression. will return 3. > > FYI Hive had its behavior fixed with > https://issues.apache.org/jira/browse/HIVE-16102 < > https://issues.apache.org/jira/browse/HIVE-16102> and currently uses > GROUPING ID function. > > Vineet Garg > > > > On Apr 21, 2020, at 10:42 AM, Julian Hyde <jhyde.apa...@gmail.com> > wrote: > > > > Suppose we have one row that represents the total for department 10, and > another that represents the grand total of all departments. Which row would > we say that department is “grouped” (in Calcite’s parlance) or “aggregated” > in (Hive’s parlance)? > > > > I find the terms confusing. It’s possible that Calcite has them “wrong”. > > > > I would simply run a query like > > > > SELECT deptno, job, GROUPING(deptno, job), COUNT(*) > > FROM emp > > GROUP BY CUBE (deptno, job) > > > > and see whether Hive and Calcite return the same result. > > > >> On Apr 20, 2020, at 6:58 PM, ZZY <joa...@gmail.com> wrote: > >> > >> Hi, Hyde: > >> It's confused me that some annotations in > >> Calcite(org.apache.calcite.sql.fun.SqlGroupingFunction.java) : > >> /** > >> * The {@code GROUPING} function. > >> * > >> * <p>Accepts 1 or more arguments. > >> * Example: {@code GROUPING(deptno, gender)} returns > >> * 3 if both deptno and gender are being grouped, > >> * 2 if only deptno is being grouped, > >> * 1 if only gender is being groped, > >> * 0 if neither deptno nor gender are being grouped. > >> * > >> * <p>This function is defined in the SQL standard. > >> * {@code GROUPING_ID} is a non-standard synonym. > >> * > >> * <p>Some examples are in {@code agg.iq}. > >> */ > >> > >> The annotations above seems conflicts with other implementations like > Hive( > >> > https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup?spm=ata.13261165.0.0.528c6dfcXalQFy#EnhancedAggregation,Cube,GroupingandRollup-Groupingfunction > >> ) > >> > >> Notice that: "The grouping function indicates whether an expression in a > >> GROUP BY clause is aggregated or not for a given row. The value 0 > >> represents a column that is part of the grouping set, while the value 1 > >> represents a column that is not part of the grouping set. " > >> > >> > >> It is clearly that 0 and 1 bit have different interpretation between > >> annotations in Calcite and in Hive. And I did not figure out why... > >> > >> Any feedback can give me on this would be highly appreciated. > >> > >> Best regards! > >