[DISCUSS] Deprecate grouped window functions

2020-04-22 Thread Rui Wang
Hi community,

I want to kick off a discussion about deprecating grouped window functions
(GROUP BY TUMBLE/HOP/SESSION) now that table function windowing
(FROM TABLE(TUMBLE/HOP/SESSION)) is becoming available [1]. Current status of
table function windowing: TUMBLE support is checked in, and HOP and SESSION
support is likely to be merged in 1.23.0.

A brief example of the two windowing syntaxes:

// Grouped window functions.
SELECT
   product_id, count(*), TUMBLE_START() as window_start
FROM order
GROUP BY product_id, TUMBLE(rowtime, INTERVAL '1' hour); // a fixed window size of one hour

// Table function windowing syntax.
SELECT
product_id, count(*), window_start
FROM TABLE(TUMBLE(order, DESCRIPTOR(.rowtime), INTERVAL '1' hour)
GROUP BY product_id

Here is a short, selective comparison:

Where table function windowing behaves better:
1) No GROUP BY is enforced. Requiring GROUP BY is a problem for streaming
JOINs. For example, one use case is to apply a JOIN on two streams for each
hour. In this case, no GROUP BY is needed.
2) Grouped window functions syntactically allow multiple window calls in
GROUP BY. For example, GROUP BY TUMBLE(...), HOP(...), SESSION(...) is
valid SQL syntax, but it is an illegal query.
3) Calcite includes an Enumerable implementation of table function
windowing, while grouped window functions do not have one.
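
To make point 1) concrete, here is a minimal Python sketch of an hourly join on two streams without any aggregation. It is only a model of the semantics, not Calcite code: the helper names (tumble, join_per_window), the column names (order_time, ship_time), and the choice to align windows to the Unix epoch are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed alignment origin for windows


def tumble(rows, time_key, size):
    """Append window_start/window_end to each row, the way a
    tumbling-window table function would (hypothetical helper)."""
    out = []
    for row in rows:
        offset = (row[time_key] - EPOCH) % size
        start = row[time_key] - offset
        out.append({**row, "window_start": start, "window_end": start + size})
    return out


def join_per_window(left, right, key):
    """Join rows that share the same key and the same window.
    No GROUP BY or aggregation is involved."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if l[key] == r[key] and l["window_start"] == r["window_start"]
    ]


hour = timedelta(hours=1)
orders = [
    {"product_id": 1, "order_time": datetime(2020, 4, 22, 14, 5)},
    {"product_id": 2, "order_time": datetime(2020, 4, 22, 14, 40)},
]
shipments = [
    {"product_id": 1, "ship_time": datetime(2020, 4, 22, 14, 50)},
    {"product_id": 2, "ship_time": datetime(2020, 4, 22, 15, 10)},
]

joined = join_per_window(
    tumble(orders, "order_time", hour),
    tumble(shipments, "ship_time", hour),
    "product_id",
)
# Only product 1 matches: its order and shipment fall in the same hour.
print(joined)
```

Because the window bounds are ordinary columns, the join condition can refer to them directly, which is what the grouped syntax cannot express.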


Where table function windowing behaves worse:
1) Table function windowing adds "window_start" and "window_end" columns to
the table directly, which increases the volume of data (number of rows *
sizeof(timestamp) * 2).
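
As a rough feel for the size of that overhead, the formula above can be evaluated directly. This is a minimal sketch; the 8-byte timestamp size is an assumption, since the actual width depends on the engine and its row format.

```python
def window_column_overhead_bytes(num_rows, timestamp_size_bytes=8):
    """Extra bytes from adding window_start and window_end to every row.
    The default of 8 bytes per timestamp is an assumed value."""
    return num_rows * timestamp_size_bytes * 2


overhead = window_column_overhead_bytes(1_000_000)
print(overhead)  # 16000000, i.e. about 16 MB for one million rows
```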


I want to focus on two questions in this thread:
1) Do people support deprecating grouped window functions?
2) If the answer to 1) is yes, by which version should grouped window
functions be completely removed?



[1]: https://jira.apache.org/jira/browse/CALCITE-3271


-Rui


Re: [DISCUSS] Deprecate grouped window functions

2020-04-22 Thread Rui Wang
I made a mistake in the example above. The corrected version is:

// Table function windowing syntax.
SELECT
product_id, count(*), window_start
FROM TABLE(TUMBLE(order, DESCRIPTOR(rowtime), INTERVAL '1' hour))
GROUP BY product_id, window_start
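
To show why window_start belongs in the GROUP BY, here is a minimal Python sketch of what the corrected query computes. It models the semantics only and is not Calcite code; the function names and the epoch-aligned windows are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed alignment origin for windows


def window_start(ts, size):
    """Start of the tumbling window that contains ts."""
    return ts - (ts - EPOCH) % size


def count_per_product_and_window(rows, size):
    """Equivalent of count(*) ... GROUP BY product_id, window_start."""
    return Counter(
        (r["product_id"], window_start(r["rowtime"], size)) for r in rows
    )


rows = [
    {"product_id": 1, "rowtime": datetime(2020, 4, 22, 14, 5)},
    {"product_id": 1, "rowtime": datetime(2020, 4, 22, 14, 55)},
    {"product_id": 1, "rowtime": datetime(2020, 4, 22, 15, 10)},
    {"product_id": 2, "rowtime": datetime(2020, 4, 22, 14, 30)},
]

counts = count_per_product_and_window(rows, timedelta(hours=1))
for (product_id, start), n in sorted(counts.items()):
    print(product_id, start, n)
```

Grouping by product_id alone would merge both hours of product 1 into a single count of 3, which is why the corrected query also groups by window_start.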



Re: [DISCUSS] Deprecate grouped window functions

2020-04-24 Thread Julian Hyde
+1

Let’s remove TUMBLE etc from the GROUP BY clause. Since this is a SQL change, 
not an API change, I don’t think we need to give notice. Let’s just do it.

Julian



Re: [DISCUSS] Deprecate grouped window functions

2020-04-24 Thread Timo Walther

Hi everyone,

So far, Apache Flink depends on this feature. We are fine with improving 
the SQL compliance and eventually dropping GROUP BY TUMBLE/HOP/SESSION 
in the future. However, we would like to give our users some time to 
migrate their existing pipelines.


What does dropping mean for Calcite? Will users of Calcite still be able to 
support this syntax? In particular, are you intending to also drop 
concepts such as SqlGroupedWindowFunction and auxiliary group functions? 
Or are you intending to just remove entries from Calcite's default 
operator table?


Regards,
Timo



Re: [DISCUSS] Deprecate grouped window functions

2020-04-24 Thread Rui Wang
Hi Timo,

My intention is to fully drop concepts such as SqlGroupedWindowFunction and
auxiliary group functions, which include relevant code in parser/syntax,
operator, planner, etc.

Since you mentioned the need for more time to migrate: how many Calcite
releases do you think would leave enough buffer time? (Calcite schedules
4 releases a year, so 2 releases would give about 6 months.)


-Rui



Re: [DISCUSS] Deprecate grouped window functions

2020-04-27 Thread Julian Hyde
Changing my +1 to +0. We have to make reasonable accommodations for our users. 
Glad we had this discussion.




Re: [DISCUSS] Deprecate grouped window functions

2020-04-28 Thread Rui Wang
Agreed. I would like to get more feedback to have a
reasonable accommodation for users.


-Rui



Re: [DISCUSS] Deprecate grouped window functions

2020-04-30 Thread Timo Walther

Thanks for considering our needs.

I'm pretty sure that windows are in almost every streaming pipeline with 
aggregations. Unlike a regular Java API, SQL syntax is very difficult to 
deprecate.

We usually give Flink users 1-2 releases to update their code. Once 
Calcite supports polymorphic table functions, I think 6 months would be 
helpful; otherwise we would need to maintain our own fork, which we have 
mostly been able to avoid so far.


Regards,
Timo


Re: [DISCUSS] Deprecate grouped window functions

2020-04-30 Thread Viliam Durina
What is the status of polymorphic table functions? We'd like to use them.

Viliam



Re: [DISCUSS] Deprecate grouped window functions

2020-04-30 Thread Julian Hyde



> On Apr 30, 2020, at 8:16 AM, Viliam Durina  wrote:
> 
> What is the status of polymorphic table functions? We'd like to use them.

Off topic. Can you start this discussion in a new thread?

Julian



Re: [DISCUSS] Deprecate grouped window functions

2020-04-30 Thread Julian Hyde
I understand that you need to continue to support the SQL syntax while your 
users still want it. I suggest that Calcite continues to support the SQL syntax 
in the parser (and the SqlNode AST) but deprecates and removes the support in 
the algebra (RelNode) within one or two releases (3 - 6 months).

I’m not familiar with a requirement for polymorphic table functions. Is there a 
JIRA case logged? Is it possible to do this feature without them?

Julian


> On Apr 30, 2020, at 7:16 AM, Timo Walther  wrote:
> 
> Thanks for considering our needs.
> 
> I'm pretty sure that windows are in almost every streaming pipeline with 
> aggregations. Unlike regular Java API, SQL syntax is very difficult to 
> deprecate.
> 
> We usually give Flink user 1-2 releases time to update their code. Once 
> Calcite supports polymorphic table functions, I think 6 months would be 
> helpful otherwise we need to maintain our own fork which we could mostly 
> prevent so far.
> 
> Regards,
> Timo
> 
> On 29.04.20 00:49, Rui Wang wrote:
>> Agreed. I would like to get more feedback to have a
>> reasonable accommodation for users.
>> -Rui
>> On Mon, Apr 27, 2020 at 11:50 AM Julian Hyde  wrote:
>>> Changing my +1 to +0. We have to make reasonable accommodations for our
>>> users. Glad we had this discussion.
>>> 
>>>> On Apr 24, 2020, at 11:10 AM, Rui Wang wrote:
>>>>
>>>> Hi Timo,
>>>>
>>>> My intention is to fully drop concepts such as SqlGroupedWindowFunction and
>>>> auxiliary group functions, which include relevant code in parser/syntax,
>>>> operator, planner, etc.
>>>>
>>>> Since you mentioned the need for more time to migrate. How many Calcite
>>>> releases that you think can probably leave enough buffer time? (Calcite
>>>> schedules 4 releases a year. So say 2 releases will give 6 months)
>>>>
>>>>
>>>> -Rui
>>>>
>>>> On Fri, Apr 24, 2020 at 1:50 AM Timo Walther wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> so far Apache Flink depends on this feature. We are fine with improving
>>>>> the SQL compliance and eventually dropping GROUP BY TUMBLE/HOP/SESSION
>>>>> in the future. However, we would like to give our users some time to
>>>>> migrate their existing pipelines.
>>>>>
>>>>> What does dropping mean for Calcite? Will users of Calcite be able to
>>>>> still support this syntax? In particular, are you intending to also drop
>>>>> concepts such as SqlGroupedWindowFunction and auxiliary group functions?
>>>>> Or are you intending to just remove entries from Calcite's default
>>>>> operator table?
>>>>>
>>>>> Regards,
>>>>> Timo
>>>>>
>>>>>
>>>>> On 24.04.20 10:30, Julian Hyde wrote:
>>>>>> +1
>>>>>>
>>>>>> Let’s remove TUMBLE etc from the GROUP BY clause. Since this is a SQL
>>>>>> change, not an API change, I don’t we need to give notice. Let’s just
>>>>>> do it.
>>>>>>
>>>>>> Julian
>>>>>>
>>>>>>> On Apr 22, 2020, at 4:05 PM, Rui Wang wrote:
>>>>>>>
>>>>>>> Made a mistake on the example above, and update it as follows:
>>>>>>>
>>>>>>> // Table function windowing syntax.
>>>>>>> SELECT
>>>>>>>    product_id, count(*), window_start
>>>>>>> FROM TABLE(TUMBLE(order, DESCRIPTOR(rowtime), INTERVAL '1' hour))
>>>>>>> GROUP BY product_id, window_start
>>>>>>>
>>>>>>>> On Wed, Apr 22, 2020 at 2:31 PM Rui Wang wrote:
>>>>>>>>
>>>>>>>> Hi community,
>>>>>>>>
>>>>>>>> I want to kick off a discussion about deprecating grouped window
>>>>>>>> functions (GROUP BY TUMBLE/HOP/SESSION) as the table function
>>>>>>>> windowing support becomes a thing [1] (FROM
>>>>>>>> TABLE(TUMBLE/HOP/SESSION)). The current stage of table function
>>>>>>>> windowing is TUMBLE support is checked in. HOP and SESSION
>>>>>>>> support is likely to be merged in 1.23.0.
>>>>>>>>
>>>>>>>> A briefly example of two different windowing syntax:
>>>>>>>>
>>>>>>>> // Grouped window functions.
>>>>>>>> SELECT
>>>>>>>>    product_id, count(*), TUMBLE_START() as window_start
>>>>>>>> FROM order
>>>>>>>> GROUP BY product_id, TUMBLE(rowtime, INTERVAL '1' hour); // an hour
>>>>>>>> long fixed window size.
>>>>>>>>
>>>>>>>> // Table function windowing syntax.
>>>>>>>> SELECT
>>>>>>>> product_id, count(*), window_start
>>>>>>>> FROM TABLE(TUMBLE(order, DESCRIPTOR(.rowtime), INTERVAL '1' hour)
>>>>>>>> GROUP BY product_id
>>>>>>>>
>>>>>>>> I am giving a short, selective comparison as the following:
>>>>>>>>
>>>>>>>> The places that table function windowing behaves better
>>>>>>>> 1) no GROUPING/GROUP BY enforced. It becomes a problem in streaming
>>>>>>>> JOIN. For example, one use case is for each hour, apply a JOIN on
>>>>>>>> two streams. In this case, no GROUP BY is needed.
>>>>>>>> 2) grouped window functions allow multiple calls in GROUP BY. For
>>>>>>>> example, from SQL syntax perspective, GROUP BY TUMBLE(...),
>>>>>>>> HOP(...), SESSION(...) is not wrong, but it is an illegal query.
>>>>>>>> 3) Calcite includes an Enumerable implementation of table function
>>>>>>>> windowing, while grouped w

Re: [DISCUSS] Deprecate grouped window functions

2020-04-30 Thread Rui Wang
The polymorphic table function (PTF) work is logged at [1]. I assigned it to
myself because I have started implementing DESCRIPTOR (a part of PTF in the
SQL standard). Anyone who wants to help accelerate the implementation is
very welcome (but please use another thread to discuss it).


Back to this thread's topic.

Timo:
>Once Calcite supports polymorphic table functions
I am guessing you are actually referring to TUMBLE/HOP/SESSION working as
PTFs in Calcite? PTF itself has many more features than what is needed for
table function windowing.
Regarding TUMBLE/HOP/SESSION as PTFs in Calcite, I believe the basic but
most important functionality will very likely be in the 1.23.0 release. A
few small patches will be added after that (in 1.24.0).

Julian:
>I suggest that Calcite continues to support the SQL syntax in the parser
(and the SqlNode AST) but deprecates and removes the support in the algebra
(RelNode) within one or two releases (3 - 6 months).

+1 on this compromise. We could leave the syntax support in place for a
longer time so that users' queries keep running (and leave downstream
projects the option of translating the syntax themselves if they cannot
migrate off it in the near or mid term). Deprecation starting from the
algebra can happen faster (in 1.25.0 or 1.26.0).
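
To illustrate why such a downstream translation looks feasible, here is a
minimal Python sketch. It is not Calcite code: the function names and the
sample orders are made up for illustration, and it assumes tumbling windows
aligned to the Unix epoch. It checks that grouping by the window_start column
added by a TUMBLE-like table function gives the same counts as grouping
directly by a TUMBLE-like key.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed window alignment origin


def tumble(rows, size):
    """Table-function style: append window_start and window_end to every row."""
    out = []
    for row in rows:
        offset = (row["rowtime"] - EPOCH) % size
        start = row["rowtime"] - offset
        out.append({**row, "window_start": start, "window_end": start + size})
    return out


def count_table_function_style(rows, size):
    """Like GROUP BY product_id, window_start over the widened rows."""
    return Counter((r["product_id"], r["window_start"]) for r in tumble(rows, size))


def count_grouped_window_style(rows, size):
    """Like GROUP BY product_id, TUMBLE(rowtime, size): the window is only a key."""
    counts = Counter()
    for r in rows:
        start = r["rowtime"] - (r["rowtime"] - EPOCH) % size
        counts[(r["product_id"], start)] += 1
    return counts


# Hypothetical sample data, one-hour tumbling windows.
orders = [
    {"product_id": 1, "rowtime": datetime(2020, 4, 22, 10, 5)},
    {"product_id": 1, "rowtime": datetime(2020, 4, 22, 10, 55)},
    {"product_id": 1, "rowtime": datetime(2020, 4, 22, 11, 0)},
    {"product_id": 2, "rowtime": datetime(2020, 4, 22, 10, 30)},
]
hour = timedelta(hours=1)

same = count_table_function_style(orders, hour) == count_grouped_window_style(orders, hour)
print("both styles agree:", same)
```

The sketch also shows the cost mentioned earlier in the thread: tumble()
widens every row by two timestamps, whereas the grouped style only computes
a grouping key.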


[1]: https://jira.apache.org/jira/browse/CALCITE-2270


-Rui
