[jira] [Created] (CALCITE-3952) Improve SortRemoveRule to remove Sort based on rowcount

2020-04-22 Thread Vineet Garg (Jira)
Vineet Garg created CALCITE-3952:


 Summary: Improve SortRemoveRule to remove Sort based on rowcount
 Key: CALCITE-3952
 URL: https://issues.apache.org/jira/browse/CALCITE-3952
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Vineet Garg
Assignee: Vineet Garg


If a query is guaranteed to produce maximum one row it is safe to remove Sort 
(along with limit). 

Example:
{code:sql}
select count(*) cs from store_sales where ss_ext_sales_price > 100.00 order by 
cs limit 100
{code}

Although logically equivalent this can greatly benefit physical plans by 
removing extra operator and avoiding unnecessary data transfer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Deprecate grouped window functions

2020-04-22 Thread Rui Wang
Made a mistake on the example above, and update it as follows:

// Table function windowing syntax.
SELECT
product_id, count(*), window_start
FROM TABLE(TUMBLE(order, DESCRIPTOR(rowtime), INTERVAL '1' hour))
GROUP BY product_id, window_start

On Wed, Apr 22, 2020 at 2:31 PM Rui Wang  wrote:

> Hi community,
>
> I want to kick off a discussion about deprecating grouped window functions
> (GROUP BY TUMBLE/HOP/SESSION) as the table function windowing support
> becomes a thing [1] (FROM TABLE(TUMBLE/HOP/SESSION)). The current stage of
> table function windowing is TUMBLE support is checked in. HOP and SESSION
> support is likely to be merged in 1.23.0.
>
> A briefly example of two different windowing syntax:
>
> // Grouped window functions.
> SELECT
>product_id, count(*), TUMBLE_START() as window_start
> FROM order
> GROUP BY product_id, TUMBLE(rowtime, INTERVAL '1' hour); // an hour long
> fixed window size.
>
> // Table function windowing syntax.
> SELECT
> product_id, count(*), window_start
> FROM TABLE(TUMBLE(order, DESCRIPTOR(.rowtime), INTERVAL '1' hour)
> GROUP BY product_id
>
> I am giving a short, selective comparison as the following:
>
> The places that table function windowing behaves better
> 1) no GROUPING/GROUP BY enforced. It becomes a problem in streaming JOIN.
> For example, one use case is for each hour, apply a JOIN on two streams. In
> this case, no GROUP BY is needed.
> 2) grouped window functions allow multiple calls in GROUP BY. For example,
> from SQL syntax perspective, GROUP BY TUMBLE(...), HOP(...), SESSION(...)
> is not wrong, but it is an illegal query.
> 3) Calcite includes an Enumerable implementation of table function
> windowing, while grouped window functions do not have that.
>
>
> The places that table function windowing behaves worse
> 1) table function windowing adds "window_start", "window_end" into table
> directly, which increases the volume of data (number of rows *
> sizeof(timestamp) * 2).
>
>
> I want to focus on discussing two questions in this thread:
> 1) Do people support deprecating grouped window functions?
> 2) By which version people prefer to make grouped window functions
> completely removed?(if 1) is yes).
>
>
>
> [1]: https://jira.apache.org/jira/browse/CALCITE-3271
>
>
> -Rui
>


[DISCUSS] Deprecate grouped window functions

2020-04-22 Thread Rui Wang
Hi community,

I want to kick off a discussion about deprecating grouped window functions
(GROUP BY TUMBLE/HOP/SESSION) as the table function windowing support
becomes a thing [1] (FROM TABLE(TUMBLE/HOP/SESSION)). The current stage of
table function windowing is TUMBLE support is checked in. HOP and SESSION
support is likely to be merged in 1.23.0.

A briefly example of two different windowing syntax:

// Grouped window functions.
SELECT
   product_id, count(*), TUMBLE_START() as window_start
FROM order
GROUP BY product_id, TUMBLE(rowtime, INTERVAL '1' hour); // an hour long
fixed window size.

// Table function windowing syntax.
SELECT
product_id, count(*), window_start
FROM TABLE(TUMBLE(order, DESCRIPTOR(.rowtime), INTERVAL '1' hour)
GROUP BY product_id

I am giving a short, selective comparison as the following:

The places that table function windowing behaves better
1) no GROUPING/GROUP BY enforced. It becomes a problem in streaming JOIN.
For example, one use case is for each hour, apply a JOIN on two streams. In
this case, no GROUP BY is needed.
2) grouped window functions allow multiple calls in GROUP BY. For example,
from SQL syntax perspective, GROUP BY TUMBLE(...), HOP(...), SESSION(...)
is not wrong, but it is an illegal query.
3) Calcite includes an Enumerable implementation of table function
windowing, while grouped window functions do not have that.


The places that table function windowing behaves worse
1) table function windowing adds "window_start", "window_end" into table
directly, which increases the volume of data (number of rows *
sizeof(timestamp) * 2).


I want to focus on discussing two questions in this thread:
1) Do people support deprecating grouped window functions?
2) By which version people prefer to make grouped window functions
completely removed?(if 1) is yes).



[1]: https://jira.apache.org/jira/browse/CALCITE-3271


-Rui


Re: Stored Proc to Relational Expression

2020-04-22 Thread Stamatis Zampetakis
I assume that by GitHub page you mean [1].
For quite some time linq4j is part of calcite [2] so I think that [1] is
not maintained anymore.
The backlog in [1] also looks a bit obsolete.
Julian can correct me if I am wrong.

In my previous email, I was talking about relational expressions (i.e.,
RelNode [3]) and not Expression [4] and Statement [5].
Furthermore, when I mentioned translating expressions to SQL what I had in
mind is RelToSqlConverter [6].
Apologies for the confusion.

I believe LINQ was introduced to bring procedural code and SQL closer and
not use the one to replace the other.

Best,
Stamatis

[1] https://github.com/julianhyde/linq4j
[2] https://github.com/apache/calcite/tree/master/linq4j
[3]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/RelNode.java
[4]
https://github.com/apache/calcite/blob/master/linq4j/src/main/java/org/apache/calcite/linq4j/tree/Expression.java
[5]
https://github.com/apache/calcite/blob/master/linq4j/src/main/java/org/apache/calcite/linq4j/tree/Statement.java
[6]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rel2sql/RelToSqlConverter.java

On Wed, Apr 22, 2020 at 8:54 AM Ravi Kapoor  wrote:

> Yeah Stamatis, I know that as I can see the github page for linq4j that
> lots of task is in Backlog and Link-To-SQL Provider is one of them
> But the intention of the project is to make java objects as Queryable talk
> to DB.
>
> Expressions and various Statement used in linq4j is to generate the
> different constructs like where filter, join to talk the back-end systems
> as Sql Queries.
>
> But what about the normal While loop and If then else statements which is
> not used in the Query-able objects .
> Example could be below:
>
>
> LOOP
>   SET heads = RAND() < 0.5;
>   IF heads THEN
> SELECT 'Heads!';
> SET heads_count = heads_count + 1;
>   ELSE
> SELECT 'Tails!';
> BREAK;
>   END IF;
> END LOOP;
> SELECT CONCAT(CAST(heads_count AS STRING), ' heads in a row');
>
>
> How  linq4j going to convert normal while and If then else java expressions
> to sql query if they are not involved in computation of sql constructs like
> in lambda expression of filter.
>
> In one of the task in backlog I could see:
> *In the prototype LINQ-to-SQL provider, write a simple rule to recognize a
> select list and where clause and push them down to SQL.*
>
> I believe it was never intended for procedural code right?
>
> Thanks,
> Ravi
>
>
> On Wed, Apr 22, 2020 at 2:43 AM Stamatis Zampetakis 
> wrote:
>
> > Hi Ravi,
> >
> > As Julian already mentioned, we are quite far from what you would like to
> > achieve.
> >
> > Nevertheless, I would like to mention that you can model certain kind of
> > loops and recursion using the RepeatUnion [1] and Spool [2] relational
> > expressions. Note that these APIs are experimental.
> >
> > Moreover, if your end goal is to push the computation in a DBMS, I have
> to
> > warn you that the code to translate these expressions back to SQL is not
> > there yet.
> >
> > Best,
> > Stamatis
> >
> > [1]
> >
> >
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/core/RepeatUnion.java
> > [2]
> >
> >
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/core/Spool.java
> >
> > On Mon, Apr 20, 2020 at 8:08 PM Julian Hyde  wrote:
> >
> > > Calcite relational expressions can represent SELECT, INSERT etc. but
> not
> > > procedural code. It’s a direction we could consider going.
> > >
> > > RexProgram is the closest thing we currently have to procedural code in
> > > the algebra - single assignment of variables, use of variables in
> > > expressions assigning to other variables - but it is a long way short
> > > because there are no loops.
> > >
> > > > On Apr 19, 2020, at 12:47 PM, Ravi Kapoor 
> > > wrote:
> > > >
> > > > Hi Team,
> > > >
> > > > I have my use where I need to convert my dialect specific stored
> > > procedure
> > > > constructs like while loop, If then else to Rel expression
> > > >
> > > > Basically this can contain control flow statements like below
> > > >
> > > > DECLARE heads BOOL;
> > > > DECLARE heads_count INT64 DEFAULT 0;
> > > > LOOP
> > > >  SET heads = RAND() < 0.5;
> > > >  IF heads THEN
> > > >SELECT 'Heads!';
> > > >SET heads_count = heads_count + 1;
> > > >  ELSE
> > > >SELECT 'Tails!';
> > > >BREAK;
> > > >  END IF;
> > > > END LOOP;
> > > > SELECT CONCAT(CAST(heads_count AS STRING), ' heads in a row');
> > > >
> > > >
> > > > I can create a Java AST model from the linq4j provided by calcite
> > however
> > > > this is only going to generate Java Result and I believe its only
> used
> > by
> > > > the calcite for relational expressions of enumerable calling
> convention
> > > > which is used by adapters which does not support core relational
> > > operations
> > > > right?
> > > >
> > > > Is there a way I can convert the stored proc constructs into some
> 

[jira] [Created] (CALCITE-3951) Support different comparison based on collation

2020-04-22 Thread Ruben Q L (Jira)
Ruben Q L created CALCITE-3951:
--

 Summary: Support different comparison based on collation
 Key: CALCITE-3951
 URL: https://issues.apache.org/jira/browse/CALCITE-3951
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Ruben Q L
Assignee: Ruben Q L


Currently SqlCollation defines concepts like Coercibility, Charset, Locale, 
etc. However, we cannot specify on a certain collation that e.g. a string field 
should use case insensitive comparison. The goal of this ticket is to evolve 
SqlCollation to support that, and adapt the corresponding classes to use that 
(optional) "non-standard" comparison.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Understanding annotations of SqlGroupingFunction

2020-04-22 Thread XING JIN
Filed a JIRA: https://issues.apache.org/jira/browse/CALCITE-3950

XING JIN  于2020年4月22日周三 下午2:51写道:

> Hi Vineet ~
> +1 on your analysis.
> Checking below case in agg.iq. We can see that the behavior of GROUPING
> function in Calcite is the same as Hive.
>
> # GROUPING in SELECT clause of CUBE query
> select deptno, job, count(*) as c, grouping(deptno) as d,
>   grouping(job) j, grouping(deptno, job) as x
> from "scott".emp
> group by cube(deptno, job);
> ++---++---+---+---+
> | DEPTNO | JOB   | C  | D | J | X |
> ++---++---+---+---+
> | 10 | CLERK |  1 | 0 | 0 | 0 |
> | 10 | MANAGER   |  1 | 0 | 0 | 0 |
> | 10 | PRESIDENT |  1 | 0 | 0 | 0 |
> | 10 |   |  3 | 0 | 1 | 1 |
> | 20 | ANALYST   |  2 | 0 | 0 | 0 |
> | 20 | CLERK |  2 | 0 | 0 | 0 |
> | 20 | MANAGER   |  1 | 0 | 0 | 0 |
> | 20 |   |  5 | 0 | 1 | 1 |
> | 30 | CLERK |  1 | 0 | 0 | 0 |
> | 30 | MANAGER   |  1 | 0 | 0 | 0 |
> | 30 | SALESMAN  |  4 | 0 | 0 | 0 |
> | 30 |   |  6 | 0 | 1 | 1 |
> || ANALYST   |  2 | 1 | 0 | 2 |
> || CLERK |  4 | 1 | 0 | 2 |
> || MANAGER   |  3 | 1 | 0 | 2 |
> || PRESIDENT |  1 | 1 | 0 | 2 |
> || SALESMAN  |  4 | 1 | 0 | 2 |
> ||   | 14 | 1 | 1 | 3 |
> ++---++---+---+---+
>
> IMHO, we might rectify the doc of SqlGroupingFunction as below:
>
> * 0 if both deptno and gender are being grouped,
> * 1 if only deptno is being grouped,
> * 2 if only gender is being groped,
> * 3 if neither deptno nor gender are being grouped.
>
>
> - Jin
>
>
> Vineet G  于2020年4月22日周三 上午5:18写道:
>
>> I expect that the user behavior for the GROUPING in both hive and calcite
>> is same. It’s just the documentation which is a bit confusing.
>> e.g.  comment line on grouping : if both deptno and gender are being
>> grouped
>>
>> should really mean that the row which represents the grand total i.e
>> without group by expression. will return 3.
>>
>> FYI Hive had its behavior fixed with
>> https://issues.apache.org/jira/browse/HIVE-16102 <
>> https://issues.apache.org/jira/browse/HIVE-16102> and currently uses
>> GROUPING ID function.
>>
>> Vineet Garg
>>
>>
>> > On Apr 21, 2020, at 10:42 AM, Julian Hyde 
>> wrote:
>> >
>> > Suppose we have one row that represents the total for department 10,
>> and another that represents the grand total of all departments. Which row
>> would we say that department is “grouped” (in Calcite’s parlance) or
>> “aggregated” in (Hive’s parlance)?
>> >
>> > I find the terms confusing. It’s possible that Calcite has them
>> “wrong”.
>> >
>> > I would simply run a query like
>> >
>> >  SELECT deptno, job, GROUPING(deptno, job), COUNT(*)
>> >  FROM emp
>> >  GROUP BY CUBE (deptno, job)
>> >
>> > and see whether Hive and Calcite return the same result.
>> >
>> >> On Apr 20, 2020, at 6:58 PM, ZZY  wrote:
>> >>
>> >> Hi, Hyde:
>> >> It's confused me that some annotations in
>> >> Calcite(org.apache.calcite.sql.fun.SqlGroupingFunction.java) :
>> >> /**
>> >> * The {@code GROUPING} function.
>> >> *
>> >> * Accepts 1 or more arguments.
>> >> * Example: {@code GROUPING(deptno, gender)} returns
>> >> * 3 if both deptno and gender are being grouped,
>> >> * 2 if only deptno is being grouped,
>> >> * 1 if only gender is being groped,
>> >> * 0 if neither deptno nor gender are being grouped.
>> >> *
>> >> * This function is defined in the SQL standard.
>> >> * {@code GROUPING_ID} is a non-standard synonym.
>> >> *
>> >> * Some examples are in {@code agg.iq}.
>> >> */
>> >>
>> >> The annotations above seems conflicts with other implementations like
>> Hive(
>> >>
>> https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup?spm=ata.13261165.0.0.528c6dfcXalQFy#EnhancedAggregation,Cube,GroupingandRollup-Groupingfunction
>> >> )
>> >>
>> >> Notice that: "The grouping function indicates whether an expression in
>> a
>> >> GROUP BY clause is aggregated or not for a given row. The value 0
>> >> represents a column that is part of the grouping set, while the value 1
>> >> represents a column that is not part of the grouping set. "
>> >>
>> >>
>> >> It is clearly that 0 and 1 bit have different interpretation  between
>> >> annotations in Calcite and in Hive. And I did not figure out why...
>> >>
>> >> Any feedback can give me on this would be highly appreciated.
>> >>
>> >> Best regards!
>>
>>


[jira] [Created] (CALCITE-3950) Doc of SqlGroupingFunction contradicts with its behavior

2020-04-22 Thread Jin Xing (Jira)
Jin Xing created CALCITE-3950:
-

 Summary: Doc of SqlGroupingFunction contradicts with its behavior
 Key: CALCITE-3950
 URL: https://issues.apache.org/jira/browse/CALCITE-3950
 Project: Calcite
  Issue Type: Bug
Reporter: Jin Xing






--
This message was sent by Atlassian Jira
(v8.3.4#803005)