I expect that the user behavior for the GROUPING in both hive and calcite is 
same. It’s just the documentation which is a bit confusing.
e.g.  comment line on grouping : if both deptno and gender are being grouped

should really mean that the row which represents the grand total i.e  without 
group by expression. will return 3.

FYI Hive had its behavior fixed with 
https://issues.apache.org/jira/browse/HIVE-16102 
<https://issues.apache.org/jira/browse/HIVE-16102> and currently uses GROUPING 
ID function.

Vineet Garg


> On Apr 21, 2020, at 10:42 AM, Julian Hyde <jhyde.apa...@gmail.com> wrote:
> 
> Suppose we have one row that represents the total for department 10, and 
> another that represents the grand total of all departments. Which row would 
> we say that department is “grouped” (in Calcite’s parlance) or “aggregated” 
> in (Hive’s parlance)?
> 
> I find the terms confusing. It’s possible that Calcite has them “wrong”. 
> 
> I would simply run a query like
> 
>  SELECT deptno, job, GROUPING(deptno, job), COUNT(*)
>  FROM emp
>  GROUP BY CUBE (deptno, job)
> 
> and see whether Hive and Calcite return the same result. 
> 
>> On Apr 20, 2020, at 6:58 PM, ZZY <joa...@gmail.com> wrote:
>> 
>> Hi, Hyde:
>> It's confused me that some annotations in
>> Calcite(org.apache.calcite.sql.fun.SqlGroupingFunction.java) :
>> /**
>> * The {@code GROUPING} function.
>> *
>> * <p>Accepts 1 or more arguments.
>> * Example: {@code GROUPING(deptno, gender)} returns
>> * 3 if both deptno and gender are being grouped,
>> * 2 if only deptno is being grouped,
>> * 1 if only gender is being groped,
>> * 0 if neither deptno nor gender are being grouped.
>> *
>> * <p>This function is defined in the SQL standard.
>> * {@code GROUPING_ID} is a non-standard synonym.
>> *
>> * <p>Some examples are in {@code agg.iq}.
>> */
>> 
>> The annotations above seems conflicts with other implementations like Hive(
>> https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup?spm=ata.13261165.0.0.528c6dfcXalQFy#EnhancedAggregation,Cube,GroupingandRollup-Groupingfunction
>> )
>> 
>> Notice that: "The grouping function indicates whether an expression in a
>> GROUP BY clause is aggregated or not for a given row. The value 0
>> represents a column that is part of the grouping set, while the value 1
>> represents a column that is not part of the grouping set. "
>> 
>> 
>> It is clearly that 0 and 1 bit have different interpretation  between
>> annotations in Calcite and in Hive. And I did not figure out why...
>> 
>> Any feedback can give me on this would be highly appreciated.
>> 
>> Best regards!

Reply via email to