Filed a JIRA: https://issues.apache.org/jira/browse/CALCITE-3950
XING JIN <jinxing.co...@gmail.com> 于2020年4月22日周三 下午2:51写道: > Hi Vineet ~ > +1 on your analysis. > Checking below case in agg.iq. We can see that the behavior of GROUPING > function in Calcite is the same as Hive. > > # GROUPING in SELECT clause of CUBE query > select deptno, job, count(*) as c, grouping(deptno) as d, > grouping(job) j, grouping(deptno, job) as x > from "scott".emp > group by cube(deptno, job); > +--------+-----------+----+---+---+---+ > | DEPTNO | JOB | C | D | J | X | > +--------+-----------+----+---+---+---+ > | 10 | CLERK | 1 | 0 | 0 | 0 | > | 10 | MANAGER | 1 | 0 | 0 | 0 | > | 10 | PRESIDENT | 1 | 0 | 0 | 0 | > | 10 | | 3 | 0 | 1 | 1 | > | 20 | ANALYST | 2 | 0 | 0 | 0 | > | 20 | CLERK | 2 | 0 | 0 | 0 | > | 20 | MANAGER | 1 | 0 | 0 | 0 | > | 20 | | 5 | 0 | 1 | 1 | > | 30 | CLERK | 1 | 0 | 0 | 0 | > | 30 | MANAGER | 1 | 0 | 0 | 0 | > | 30 | SALESMAN | 4 | 0 | 0 | 0 | > | 30 | | 6 | 0 | 1 | 1 | > | | ANALYST | 2 | 1 | 0 | 2 | > | | CLERK | 4 | 1 | 0 | 2 | > | | MANAGER | 3 | 1 | 0 | 2 | > | | PRESIDENT | 1 | 1 | 0 | 2 | > | | SALESMAN | 4 | 1 | 0 | 2 | > | | | 14 | 1 | 1 | 3 | > +--------+-----------+----+---+---+---+ > > IMHO, we might rectify the doc of SqlGroupingFunction as below: > > * 0 if both deptno and gender are being grouped, > * 1 if only deptno is being grouped, > * 2 if only gender is being groped, > * 3 if neither deptno nor gender are being grouped. > > > - Jin > > > Vineet G <vgargapa...@gmail.com> 于2020年4月22日周三 上午5:18写道: > >> I expect that the user behavior for the GROUPING in both hive and calcite >> is same. It’s just the documentation which is a bit confusing. >> e.g. comment line on grouping : if both deptno and gender are being >> grouped >> >> should really mean that the row which represents the grand total i.e >> without group by expression. will return 3. >> >> FYI Hive had its behavior fixed with >> https://issues.apache.org/jira/browse/HIVE-16102 < >> https://issues.apache.org/jira/browse/HIVE-16102> and currently uses >> GROUPING ID function. >> >> Vineet Garg >> >> >> > On Apr 21, 2020, at 10:42 AM, Julian Hyde <jhyde.apa...@gmail.com> >> wrote: >> > >> > Suppose we have one row that represents the total for department 10, >> and another that represents the grand total of all departments. Which row >> would we say that department is “grouped” (in Calcite’s parlance) or >> “aggregated” in (Hive’s parlance)? >> > >> > I find the terms confusing. It’s possible that Calcite has them >> “wrong”. >> > >> > I would simply run a query like >> > >> > SELECT deptno, job, GROUPING(deptno, job), COUNT(*) >> > FROM emp >> > GROUP BY CUBE (deptno, job) >> > >> > and see whether Hive and Calcite return the same result. >> > >> >> On Apr 20, 2020, at 6:58 PM, ZZY <joa...@gmail.com> wrote: >> >> >> >> Hi, Hyde: >> >> It's confused me that some annotations in >> >> Calcite(org.apache.calcite.sql.fun.SqlGroupingFunction.java) : >> >> /** >> >> * The {@code GROUPING} function. >> >> * >> >> * <p>Accepts 1 or more arguments. >> >> * Example: {@code GROUPING(deptno, gender)} returns >> >> * 3 if both deptno and gender are being grouped, >> >> * 2 if only deptno is being grouped, >> >> * 1 if only gender is being groped, >> >> * 0 if neither deptno nor gender are being grouped. >> >> * >> >> * <p>This function is defined in the SQL standard. >> >> * {@code GROUPING_ID} is a non-standard synonym. >> >> * >> >> * <p>Some examples are in {@code agg.iq}. >> >> */ >> >> >> >> The annotations above seems conflicts with other implementations like >> Hive( >> >> >> https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup?spm=ata.13261165.0.0.528c6dfcXalQFy#EnhancedAggregation,Cube,GroupingandRollup-Groupingfunction >> >> ) >> >> >> >> Notice that: "The grouping function indicates whether an expression in >> a >> >> GROUP BY clause is aggregated or not for a given row. The value 0 >> >> represents a column that is part of the grouping set, while the value 1 >> >> represents a column that is not part of the grouping set. " >> >> >> >> >> >> It is clearly that 0 and 1 bit have different interpretation between >> >> annotations in Calcite and in Hive. And I did not figure out why... >> >> >> >> Any feedback can give me on this would be highly appreciated. >> >> >> >> Best regards! >> >>