[jira] [Created] (CALCITE-3952) Improve SortRemoveRule to remove Sort based on rowcount

2020-04-22 Thread Vineet Garg (Jira)
Vineet Garg created CALCITE-3952:


 Summary: Improve SortRemoveRule to remove Sort based on rowcount
 Key: CALCITE-3952
 URL: https://issues.apache.org/jira/browse/CALCITE-3952
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Vineet Garg
Assignee: Vineet Garg


If a query is guaranteed to produce maximum one row it is safe to remove Sort 
(along with limit). 

Example:
{code:sql}
select count(*) cs from store_sales where ss_ext_sales_price > 100.00 order by 
cs limit 100
{code}

Although logically equivalent this can greatly benefit physical plans by 
removing extra operator and avoiding unnecessary data transfer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Deprecate grouped window functions

2020-04-22 Thread Rui Wang
Made a mistake on the example above, and update it as follows:

// Table function windowing syntax.
SELECT
product_id, count(*), window_start
FROM TABLE(TUMBLE(order, DESCRIPTOR(rowtime), INTERVAL '1' hour))
GROUP BY product_id, window_start

On Wed, Apr 22, 2020 at 2:31 PM Rui Wang  wrote:

> Hi community,
>
> I want to kick off a discussion about deprecating grouped window functions
> (GROUP BY TUMBLE/HOP/SESSION) as the table function windowing support
> becomes a thing [1] (FROM TABLE(TUMBLE/HOP/SESSION)). The current stage of
> table function windowing is TUMBLE support is checked in. HOP and SESSION
> support is likely to be merged in 1.23.0.
>
> A briefly example of two different windowing syntax:
>
> // Grouped window functions.
> SELECT
>product_id, count(*), TUMBLE_START() as window_start
> FROM order
> GROUP BY product_id, TUMBLE(rowtime, INTERVAL '1' hour); // an hour long
> fixed window size.
>
> // Table function windowing syntax.
> SELECT
> product_id, count(*), window_start
> FROM TABLE(TUMBLE(order, DESCRIPTOR(.rowtime), INTERVAL '1' hour)
> GROUP BY product_id
>
> I am giving a short, selective comparison as the following:
>
> The places that table function windowing behaves better
> 1) no GROUPING/GROUP BY enforced. It becomes a problem in streaming JOIN.
> For example, one use case is for each hour, apply a JOIN on two streams. In
> this case, no GROUP BY is needed.
> 2) grouped window functions allow multiple calls in GROUP BY. For example,
> from SQL syntax perspective, GROUP BY TUMBLE(...), HOP(...), SESSION(...)
> is not wrong, but it is an illegal query.
> 3) Calcite includes an Enumerable implementation of table function
> windowing, while grouped window functions do not have that.
>
>
> The places that table function windowing behaves worse
> 1) table function windowing adds "window_start", "window_end" into table
> directly, which increases the volume of data (number of rows *
> sizeof(timestamp) * 2).
>
>
> I want to focus on discussing two questions in this thread:
> 1) Do people support deprecating grouped window functions?
> 2) By which version people prefer to make grouped window functions
> completely removed?(if 1) is yes).
>
>
>
> [1]: https://jira.apache.org/jira/browse/CALCITE-3271
>
>
> -Rui
>


[DISCUSS] Deprecate grouped window functions

2020-04-22 Thread Rui Wang
Hi community,

I want to kick off a discussion about deprecating grouped window functions
(GROUP BY TUMBLE/HOP/SESSION) as the table function windowing support
becomes a thing [1] (FROM TABLE(TUMBLE/HOP/SESSION)). The current stage of
table function windowing is TUMBLE support is checked in. HOP and SESSION
support is likely to be merged in 1.23.0.

A briefly example of two different windowing syntax:

// Grouped window functions.
SELECT
   product_id, count(*), TUMBLE_START() as window_start
FROM order
GROUP BY product_id, TUMBLE(rowtime, INTERVAL '1' hour); // an hour long
fixed window size.

// Table function windowing syntax.
SELECT
product_id, count(*), window_start
FROM TABLE(TUMBLE(order, DESCRIPTOR(.rowtime), INTERVAL '1' hour)
GROUP BY product_id

I am giving a short, selective comparison as the following:

The places that table function windowing behaves better
1) no GROUPING/GROUP BY enforced. It becomes a problem in streaming JOIN.
For example, one use case is for each hour, apply a JOIN on two streams. In
this case, no GROUP BY is needed.
2) grouped window functions allow multiple calls in GROUP BY. For example,
from SQL syntax perspective, GROUP BY TUMBLE(...), HOP(...), SESSION(...)
is not wrong, but it is an illegal query.
3) Calcite includes an Enumerable implementation of table function
windowing, while grouped window functions do not have that.


The places that table function windowing behaves worse
1) table function windowing adds "window_start", "window_end" into table
directly, which increases the volume of data (number of rows *
sizeof(timestamp) * 2).


I want to focus on discussing two questions in this thread:
1) Do people support deprecating grouped window functions?
2) By which version people prefer to make grouped window functions
completely removed?(if 1) is yes).



[1]: https://jira.apache.org/jira/browse/CALCITE-3271


-Rui


Re: Stored Proc to Relational Expression

2020-04-22 Thread Stamatis Zampetakis
I assume that by GitHub page you mean [1].
For quite some time linq4j is part of calcite [2] so I think that [1] is
not maintained anymore.
The backlog in [1] also looks a bit obsolete.
Julian can correct me if I am wrong.

In my previous email, I was talking about relational expressions (i.e.,
RelNode [3]) and not Expression [4] and Statement [5].
Furthermore, when I mentioned translating expressions to SQL what I had in
mind is RelToSqlConverter [6].
Apologies for the confusion.

I believe LINQ was introduced to bring procedural code and SQL closer and
not use the one to replace the other.

Best,
Stamatis

[1] https://github.com/julianhyde/linq4j
[2] https://github.com/apache/calcite/tree/master/linq4j
[3]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/RelNode.java
[4]
https://github.com/apache/calcite/blob/master/linq4j/src/main/java/org/apache/calcite/linq4j/tree/Expression.java
[5]
https://github.com/apache/calcite/blob/master/linq4j/src/main/java/org/apache/calcite/linq4j/tree/Statement.java
[6]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rel2sql/RelToSqlConverter.java

On Wed, Apr 22, 2020 at 8:54 AM Ravi Kapoor  wrote:

> Yeah Stamatis, I know that as I can see the github page for linq4j that
> lots of task is in Backlog and Link-To-SQL Provider is one of them
> But the intention of the project is to make java objects as Queryable talk
> to DB.
>
> Expressions and various Statement used in linq4j is to generate the
> different constructs like where filter, join to talk the back-end systems
> as Sql Queries.
>
> But what about the normal While loop and If then else statements which is
> not used in the Query-able objects .
> Example could be below:
>
>
> LOOP
>   SET heads = RAND() < 0.5;
>   IF heads THEN
> SELECT 'Heads!';
> SET heads_count = heads_count + 1;
>   ELSE
> SELECT 'Tails!';
> BREAK;
>   END IF;
> END LOOP;
> SELECT CONCAT(CAST(heads_count AS STRING), ' heads in a row');
>
>
> How  linq4j going to convert normal while and If then else java expressions
> to sql query if they are not involved in computation of sql constructs like
> in lambda expression of filter.
>
> In one of the task in backlog I could see:
> *In the prototype LINQ-to-SQL provider, write a simple rule to recognize a
> select list and where clause and push them down to SQL.*
>
> I believe it was never intended for procedural code right?
>
> Thanks,
> Ravi
>
>
> On Wed, Apr 22, 2020 at 2:43 AM Stamatis Zampetakis 
> wrote:
>
> > Hi Ravi,
> >
> > As Julian already mentioned, we are quite far from what you would like to
> > achieve.
> >
> > Nevertheless, I would like to mention that you can model certain kind of
> > loops and recursion using the RepeatUnion [1] and Spool [2] relational
> > expressions. Note that these APIs are experimental.
> >
> > Moreover, if your end goal is to push the computation in a DBMS, I have
> to
> > warn you that the code to translate these expressions back to SQL is not
> > there yet.
> >
> > Best,
> > Stamatis
> >
> > [1]
> >
> >
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/core/RepeatUnion.java
> > [2]
> >
> >
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/core/Spool.java
> >
> > On Mon, Apr 20, 2020 at 8:08 PM Julian Hyde  wrote:
> >
> > > Calcite relational expressions can represent SELECT, INSERT etc. but
> not
> > > procedural code. It’s a direction we could consider going.
> > >
> > > RexProgram is the closest thing we currently have to procedural code in
> > > the algebra - single assignment of variables, use of variables in
> > > expressions assigning to other variables - but it is a long way short
> > > because there are no loops.
> > >
> > > > On Apr 19, 2020, at 12:47 PM, Ravi Kapoor 
> > > wrote:
> > > >
> > > > Hi Team,
> > > >
> > > > I have my use where I need to convert my dialect specific stored
> > > procedure
> > > > constructs like while loop, If then else to Rel expression
> > > >
> > > > Basically this can contain control flow statements like below
> > > >
> > > > DECLARE heads BOOL;
> > > > DECLARE heads_count INT64 DEFAULT 0;
> > > > LOOP
> > > >  SET heads = RAND() < 0.5;
> > > >  IF heads THEN
> > > >SELECT 'Heads!';
> > > >SET heads_count = heads_count + 1;
> > > >  ELSE
> > > >SELECT 'Tails!';
> > > >BREAK;
> > > >  END IF;
> > > > END LOOP;
> > > > SELECT CONCAT(CAST(heads_count AS STRING), ' heads in a row');
> > > >
> > > >
> > > > I can create a Java AST model from the linq4j provided by calcite
> > however
> > > > this is only going to generate Java Result and I believe its only
> used
> > by
> > > > the calcite for relational expressions of enumerable calling
> convention
> > > > which is used by adapters which does not support core relational
> > > operations
> > > > right?
> > > >
> > > > Is there a way I can convert the stored proc constructs into some
> 

[jira] [Created] (CALCITE-3951) Support different comparison based on collation

2020-04-22 Thread Ruben Q L (Jira)
Ruben Q L created CALCITE-3951:
--

 Summary: Support different comparison based on collation
 Key: CALCITE-3951
 URL: https://issues.apache.org/jira/browse/CALCITE-3951
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Ruben Q L
Assignee: Ruben Q L


Currently SqlCollation defines concepts like Coercibility, Charset, Locale, 
etc. However, we cannot specify on a certain collation that e.g. a string field 
should use case insensitive comparison. The goal of this ticket is to evolve 
SqlCollation to support that, and adapt the corresponding classes to use that 
(optional) "non-standard" comparison.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Understanding annotations of SqlGroupingFunction

2020-04-22 Thread XING JIN
Filed a JIRA: https://issues.apache.org/jira/browse/CALCITE-3950

XING JIN  于2020年4月22日周三 下午2:51写道:

> Hi Vineet ~
> +1 on your analysis.
> Checking below case in agg.iq. We can see that the behavior of GROUPING
> function in Calcite is the same as Hive.
>
> # GROUPING in SELECT clause of CUBE query
> select deptno, job, count(*) as c, grouping(deptno) as d,
>   grouping(job) j, grouping(deptno, job) as x
> from "scott".emp
> group by cube(deptno, job);
> ++---++---+---+---+
> | DEPTNO | JOB   | C  | D | J | X |
> ++---++---+---+---+
> | 10 | CLERK |  1 | 0 | 0 | 0 |
> | 10 | MANAGER   |  1 | 0 | 0 | 0 |
> | 10 | PRESIDENT |  1 | 0 | 0 | 0 |
> | 10 |   |  3 | 0 | 1 | 1 |
> | 20 | ANALYST   |  2 | 0 | 0 | 0 |
> | 20 | CLERK |  2 | 0 | 0 | 0 |
> | 20 | MANAGER   |  1 | 0 | 0 | 0 |
> | 20 |   |  5 | 0 | 1 | 1 |
> | 30 | CLERK |  1 | 0 | 0 | 0 |
> | 30 | MANAGER   |  1 | 0 | 0 | 0 |
> | 30 | SALESMAN  |  4 | 0 | 0 | 0 |
> | 30 |   |  6 | 0 | 1 | 1 |
> || ANALYST   |  2 | 1 | 0 | 2 |
> || CLERK |  4 | 1 | 0 | 2 |
> || MANAGER   |  3 | 1 | 0 | 2 |
> || PRESIDENT |  1 | 1 | 0 | 2 |
> || SALESMAN  |  4 | 1 | 0 | 2 |
> ||   | 14 | 1 | 1 | 3 |
> ++---++---+---+---+
>
> IMHO, we might rectify the doc of SqlGroupingFunction as below:
>
> * 0 if both deptno and gender are being grouped,
> * 1 if only deptno is being grouped,
> * 2 if only gender is being groped,
> * 3 if neither deptno nor gender are being grouped.
>
>
> - Jin
>
>
> Vineet G  于2020年4月22日周三 上午5:18写道:
>
>> I expect that the user behavior for the GROUPING in both hive and calcite
>> is same. It’s just the documentation which is a bit confusing.
>> e.g.  comment line on grouping : if both deptno and gender are being
>> grouped
>>
>> should really mean that the row which represents the grand total i.e
>> without group by expression. will return 3.
>>
>> FYI Hive had its behavior fixed with
>> https://issues.apache.org/jira/browse/HIVE-16102 <
>> https://issues.apache.org/jira/browse/HIVE-16102> and currently uses
>> GROUPING ID function.
>>
>> Vineet Garg
>>
>>
>> > On Apr 21, 2020, at 10:42 AM, Julian Hyde 
>> wrote:
>> >
>> > Suppose we have one row that represents the total for department 10,
>> and another that represents the grand total of all departments. Which row
>> would we say that department is “grouped” (in Calcite’s parlance) or
>> “aggregated” in (Hive’s parlance)?
>> >
>> > I find the terms confusing. It’s possible that Calcite has them
>> “wrong”.
>> >
>> > I would simply run a query like
>> >
>> >  SELECT deptno, job, GROUPING(deptno, job), COUNT(*)
>> >  FROM emp
>> >  GROUP BY CUBE (deptno, job)
>> >
>> > and see whether Hive and Calcite return the same result.
>> >
>> >> On Apr 20, 2020, at 6:58 PM, ZZY  wrote:
>> >>
>> >> Hi, Hyde:
>> >> It's confused me that some annotations in
>> >> Calcite(org.apache.calcite.sql.fun.SqlGroupingFunction.java) :
>> >> /**
>> >> * The {@code GROUPING} function.
>> >> *
>> >> * Accepts 1 or more arguments.
>> >> * Example: {@code GROUPING(deptno, gender)} returns
>> >> * 3 if both deptno and gender are being grouped,
>> >> * 2 if only deptno is being grouped,
>> >> * 1 if only gender is being groped,
>> >> * 0 if neither deptno nor gender are being grouped.
>> >> *
>> >> * This function is defined in the SQL standard.
>> >> * {@code GROUPING_ID} is a non-standard synonym.
>> >> *
>> >> * Some examples are in {@code agg.iq}.
>> >> */
>> >>
>> >> The annotations above seems conflicts with other implementations like
>> Hive(
>> >>
>> https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup?spm=ata.13261165.0.0.528c6dfcXalQFy#EnhancedAggregation,Cube,GroupingandRollup-Groupingfunction
>> >> )
>> >>
>> >> Notice that: "The grouping function indicates whether an expression in
>> a
>> >> GROUP BY clause is aggregated or not for a given row. The value 0
>> >> represents a column that is part of the grouping set, while the value 1
>> >> represents a column that is not part of the grouping set. "
>> >>
>> >>
>> >> It is clearly that 0 and 1 bit have different interpretation  between
>> >> annotations in Calcite and in Hive. And I did not figure out why...
>> >>
>> >> Any feedback can give me on this would be highly appreciated.
>> >>
>> >> Best regards!
>>
>>


Re: Stored Proc to Relational Expression

2020-04-22 Thread Ravi Kapoor
Yeah Stamatis, I know that as I can see the github page for linq4j that
lots of task is in Backlog and Link-To-SQL Provider is one of them
But the intention of the project is to make java objects as Queryable talk
to DB.

Expressions and various Statement used in linq4j is to generate the
different constructs like where filter, join to talk the back-end systems
as Sql Queries.

But what about the normal While loop and If then else statements which is
not used in the Query-able objects .
Example could be below:


LOOP
  SET heads = RAND() < 0.5;
  IF heads THEN
SELECT 'Heads!';
SET heads_count = heads_count + 1;
  ELSE
SELECT 'Tails!';
BREAK;
  END IF;
END LOOP;
SELECT CONCAT(CAST(heads_count AS STRING), ' heads in a row');


How  linq4j going to convert normal while and If then else java expressions
to sql query if they are not involved in computation of sql constructs like
in lambda expression of filter.

In one of the task in backlog I could see:
*In the prototype LINQ-to-SQL provider, write a simple rule to recognize a
select list and where clause and push them down to SQL.*

I believe it was never intended for procedural code right?

Thanks,
Ravi


On Wed, Apr 22, 2020 at 2:43 AM Stamatis Zampetakis 
wrote:

> Hi Ravi,
>
> As Julian already mentioned, we are quite far from what you would like to
> achieve.
>
> Nevertheless, I would like to mention that you can model certain kind of
> loops and recursion using the RepeatUnion [1] and Spool [2] relational
> expressions. Note that these APIs are experimental.
>
> Moreover, if your end goal is to push the computation in a DBMS, I have to
> warn you that the code to translate these expressions back to SQL is not
> there yet.
>
> Best,
> Stamatis
>
> [1]
>
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/core/RepeatUnion.java
> [2]
>
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/core/Spool.java
>
> On Mon, Apr 20, 2020 at 8:08 PM Julian Hyde  wrote:
>
> > Calcite relational expressions can represent SELECT, INSERT etc. but not
> > procedural code. It’s a direction we could consider going.
> >
> > RexProgram is the closest thing we currently have to procedural code in
> > the algebra - single assignment of variables, use of variables in
> > expressions assigning to other variables - but it is a long way short
> > because there are no loops.
> >
> > > On Apr 19, 2020, at 12:47 PM, Ravi Kapoor 
> > wrote:
> > >
> > > Hi Team,
> > >
> > > I have my use where I need to convert my dialect specific stored
> > procedure
> > > constructs like while loop, If then else to Rel expression
> > >
> > > Basically this can contain control flow statements like below
> > >
> > > DECLARE heads BOOL;
> > > DECLARE heads_count INT64 DEFAULT 0;
> > > LOOP
> > >  SET heads = RAND() < 0.5;
> > >  IF heads THEN
> > >SELECT 'Heads!';
> > >SET heads_count = heads_count + 1;
> > >  ELSE
> > >SELECT 'Tails!';
> > >BREAK;
> > >  END IF;
> > > END LOOP;
> > > SELECT CONCAT(CAST(heads_count AS STRING), ' heads in a row');
> > >
> > >
> > > I can create a Java AST model from the linq4j provided by calcite
> however
> > > this is only going to generate Java Result and I believe its only used
> by
> > > the calcite for relational expressions of enumerable calling convention
> > > which is used by adapters which does not support core relational
> > operations
> > > right?
> > >
> > > Is there a way I can convert the stored proc constructs into some
> > canonical
> > > form like Rel Tree and back to Stored proc of target dialect.
> > > --
> > >
> > > Thanks,
> > > Ravi
> >
> >
>


-- 
Thanks,
Ravi Kapoor
+91-9818764564
kapoorrav...@gmail.com


Re: [DISCUSS] Towards Cascades Optimizer

2020-04-22 Thread Danny Chan
> Is there any recommended approach to make that happen smoothly besides
coding and testing work? We need to be aware that the new planner might be
co-exist with VolcanoPlanner for 5 or more years, or even never replace
VolcanoPlanner.

If that is true, i might say the new planner is probably with a not that
good design, we expect to see in advance for what cases/reasons user has
the reason to keep the old VolcanoPlanner and we *must* give a solution for
those problems in the new design.

I was expecting that migrating to a new planner would at least take 1 year
for developing, if that is true, modifying directly based on current
planner means for the near future 3~4 versions Calcite, there would bring
in huge plan changes/bugs for each release which i believe all the users of
Calcite don't want to see. And on one can guarantee that modifying directly
can keep good stability and compatibility, only the test set do.

>From the experience of Alibaba Blink planner which has contributed to
Apache Flink, yes, the old/new planner would co-exist at least for 2 years.
For the reasons that the new and old planner has different ability in some
corner cases.

>From my point of view, we should at least:
- Give a convincing test set for the new planner that makes us believe the
new planner is stable and powerful enough. I mean obviously the current
rule tests are far away from enough to support the new planner
- We should give a more detailed design doc about the new planner,
especially about the interfaces changes and any change that would bring in
the compatibility problem. Then we can make more accurate decision how much
work the new planner would bring in, until then, we can decide if switch to
a pure new planner development is a good idea or modify the existing one.


Haisheng Yuan  于2020年4月22日周三 上午9:45写道:

> Hi Andrii,
>
> > Obviously, from what is written here, I could guess that this would
> require me to change my physical planning rules, even if only by
> implementing a marker interface.
> You don't need to change your physical rules, it will be treated as equal
> as logical rules and be applied together with the real logical rules, no
> more logical/physical rules difference. This is also how current
> VolcanoPlanner works.
>
> > I don't want you to think that I somehow resent the changes you are
> pushing.
> Don't get me wrong. I am seriously thinking of revert these changes, since
> most people like the idea of adding new planner, why don't we make all the
> plan changes in the new planner, instead of forcing people changing test
> cases for the code changes that they might not need in VolcanoPlanner
> during upgrade.
>
> I didn't intend to replace VolcanoPlanner, thought just change the search
> strategy and add trait derivation mechanism, because most of the code in
> VolcanoPlanner can be reused. But since many agree to add new planner and
> replace VolcanoPlanner as the final goal, I won't be against most people's
> decision.
>
> Is there any recommended approach to make that happen smoothly besides
> coding and testing work? We need to be aware that the new planner might be
> co-exist with VolcanoPlanner for 5 or more years, or even never replace
> VolcanoPlanner.
>
> More thoughts are welcome.
>
> Haisheng
>
> On 2020/04/21 19:56:25, Андрей Цвелодуб  wrote:
> > Hello Haisheng,
> >
> > > To keep backward compatibility, all the un-marked rules will be treated
> > as logical rules, except rules that uses AbstractConverter as rule
> operand,
> > these rules still need to applied top-down, or random order.
> > Obviously, from what is written here, I could guess that this would
> require
> > me to change my physical planning rules, even if only by implementing a
> > marker interface. I am not saying this is a bad thing, but this is a
> thing
> > that should be communicated and planned ahead in case the VolcanoPlanner
> is
> > modified.
> >
> > > Looks like I have to revert changes in CALCITE-2970 and CALCITE-3753,
> > because they will cause another tons of plan changes.
> > I see you are still bitter due to all the discussions on this list
> lately,
> > I'm sorry. I don't want you to think that I somehow resent the changes
> you
> > are pushing, au contraire I support them and would be happy to help if I
> > can. I just want the process of these changes to be executed in the best
> > possible way.
> > As I see there are already several opinions in this thread that basically
> > align with what I am saying, so I guess I am not the crazy guy running
> > around and yelling "the end is nigh!".
> >
> > Thank you for taking these mumbled thoughts into account.
> >
> > Bestest Regards,
> > Andrii Tsvielodub
> >
> > On Tue, 21 Apr 2020 at 21:08, Haisheng Yuan  wrote:
> >
> > > Hi Andrii,
> > >
> > > > I guess changing the planner would lead to changes in tons of rules
> and
> > > even more tests.
> > > Obviously you didn't read through my email. You are not required to do
> any
> > > changes to your rule if you 

Re: Understanding annotations of SqlGroupingFunction

2020-04-22 Thread XING JIN
Hi Vineet ~
+1 on your analysis.
Checking below case in agg.iq. We can see that the behavior of GROUPING
function in Calcite is the same as Hive.

# GROUPING in SELECT clause of CUBE query
select deptno, job, count(*) as c, grouping(deptno) as d,
  grouping(job) j, grouping(deptno, job) as x
from "scott".emp
group by cube(deptno, job);
++---++---+---+---+
| DEPTNO | JOB   | C  | D | J | X |
++---++---+---+---+
| 10 | CLERK |  1 | 0 | 0 | 0 |
| 10 | MANAGER   |  1 | 0 | 0 | 0 |
| 10 | PRESIDENT |  1 | 0 | 0 | 0 |
| 10 |   |  3 | 0 | 1 | 1 |
| 20 | ANALYST   |  2 | 0 | 0 | 0 |
| 20 | CLERK |  2 | 0 | 0 | 0 |
| 20 | MANAGER   |  1 | 0 | 0 | 0 |
| 20 |   |  5 | 0 | 1 | 1 |
| 30 | CLERK |  1 | 0 | 0 | 0 |
| 30 | MANAGER   |  1 | 0 | 0 | 0 |
| 30 | SALESMAN  |  4 | 0 | 0 | 0 |
| 30 |   |  6 | 0 | 1 | 1 |
|| ANALYST   |  2 | 1 | 0 | 2 |
|| CLERK |  4 | 1 | 0 | 2 |
|| MANAGER   |  3 | 1 | 0 | 2 |
|| PRESIDENT |  1 | 1 | 0 | 2 |
|| SALESMAN  |  4 | 1 | 0 | 2 |
||   | 14 | 1 | 1 | 3 |
++---++---+---+---+

IMHO, we might rectify the doc of SqlGroupingFunction as below:

* 0 if both deptno and gender are being grouped,
* 1 if only deptno is being grouped,
* 2 if only gender is being groped,
* 3 if neither deptno nor gender are being grouped.


- Jin


Vineet G  于2020年4月22日周三 上午5:18写道:

> I expect that the user behavior for the GROUPING in both hive and calcite
> is same. It’s just the documentation which is a bit confusing.
> e.g.  comment line on grouping : if both deptno and gender are being
> grouped
>
> should really mean that the row which represents the grand total i.e
> without group by expression. will return 3.
>
> FYI Hive had its behavior fixed with
> https://issues.apache.org/jira/browse/HIVE-16102 <
> https://issues.apache.org/jira/browse/HIVE-16102> and currently uses
> GROUPING ID function.
>
> Vineet Garg
>
>
> > On Apr 21, 2020, at 10:42 AM, Julian Hyde 
> wrote:
> >
> > Suppose we have one row that represents the total for department 10, and
> another that represents the grand total of all departments. Which row would
> we say that department is “grouped” (in Calcite’s parlance) or “aggregated”
> in (Hive’s parlance)?
> >
> > I find the terms confusing. It’s possible that Calcite has them “wrong”.
> >
> > I would simply run a query like
> >
> >  SELECT deptno, job, GROUPING(deptno, job), COUNT(*)
> >  FROM emp
> >  GROUP BY CUBE (deptno, job)
> >
> > and see whether Hive and Calcite return the same result.
> >
> >> On Apr 20, 2020, at 6:58 PM, ZZY  wrote:
> >>
> >> Hi, Hyde:
> >> It's confused me that some annotations in
> >> Calcite(org.apache.calcite.sql.fun.SqlGroupingFunction.java) :
> >> /**
> >> * The {@code GROUPING} function.
> >> *
> >> * Accepts 1 or more arguments.
> >> * Example: {@code GROUPING(deptno, gender)} returns
> >> * 3 if both deptno and gender are being grouped,
> >> * 2 if only deptno is being grouped,
> >> * 1 if only gender is being groped,
> >> * 0 if neither deptno nor gender are being grouped.
> >> *
> >> * This function is defined in the SQL standard.
> >> * {@code GROUPING_ID} is a non-standard synonym.
> >> *
> >> * Some examples are in {@code agg.iq}.
> >> */
> >>
> >> The annotations above seems conflicts with other implementations like
> Hive(
> >>
> https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup?spm=ata.13261165.0.0.528c6dfcXalQFy#EnhancedAggregation,Cube,GroupingandRollup-Groupingfunction
> >> )
> >>
> >> Notice that: "The grouping function indicates whether an expression in a
> >> GROUP BY clause is aggregated or not for a given row. The value 0
> >> represents a column that is part of the grouping set, while the value 1
> >> represents a column that is not part of the grouping set. "
> >>
> >>
> >> It is clearly that 0 and 1 bit have different interpretation  between
> >> annotations in Calcite and in Hive. And I did not figure out why...
> >>
> >> Any feedback can give me on this would be highly appreciated.
> >>
> >> Best regards!
>
>