Hi Jark,

2 & 3 sounds good to me.

Regarding time attribute,
I still have some questions, I knew it's easy to support cascaded window
aggregate using new TVFs.
However there are some other places where need time attribute:
- CEP
- interval join
- order by
- over window
If there is no time attribute column, how do we integrate these old
features with the new TVFs.
E.g.
StreamA -> new window aggregate -> interval join -> Sink
                                                         /
StreamB -----------------------------------


Jark Wu <imj...@gmail.com> 于2020年10月9日周五 下午11:51写道:

> Hi Benchao,
>
> 1) time attribute
> Yes. We don't need time attribute auxiliary function. Because the new
> window operations are all based on the
>  window_start and window_end columns instead of on the time attributes. So
> we don't need to propagate time attributes.
> Cascaded window aggregate can be expressed by simply GROUP BY the
> window_start and window_end of the previous window result.
> I have added a cascaded window aggregate example in the Tumbling Window
> section in the FLIP.
> If you want to define proctime window aggregate, the time column in TVF
> should be a proctime attribute field (or PROCTIME() function).
>
> 2) batch support
> Yes. The proposed syntax/API are unified for batch and streaming. Batch
> support is in the plan, but may not have enough time to catch up 1.12.
>
> 3) support `grouping sets`
> This is not included in the FLIP, but I think it's great if we can support
> `grouping sets`.
> The existing window impl doesn't support this because we convert the
> LogicalAggregate into WindowAggregate in the beginning,
> the expand grouping sets rule can't be applied in this situation.
> Fortunately, with the new window impl, the conversion to WindowAggregate
> will happen at the end, so I think the expand rule can be
>  applied and support this feature naturally.
> Therefore, IMO, we don't need to include this feature in this FLIP to avoid
> the FLIP being too large.
> This can be a follow-up issue (maybe just add tests and docs) after the
> FLIP.
>
> Best,
> Jark
>
>
> On Fri, 9 Oct 2020 at 19:09, 刘 芃成 <pengchengliucr...@gmail.com> wrote:
>
> > Hi,Benchao,
> >         Welcome to join the discussion, yes, this new syntax can make SQL
> > more clear and simpler.
> >         For your first question, the `window_start` and `window_end`
> > columns will be added automatically,
> >         so we don't need to use auxiliary group functions to infer or
> > access the window properties.
> >
> >         For the `grouping sets` on TVFs, I think it's interesting if we
> > can support it, as we already supported `grouping sets`
> >         on streaming aggregates in blink planner. But I'm not sure if it
> > will be included into this FLIP.
> >
> >         cc @Jark Wu
> >
> > Best,
> > Pengcheng
> >
> >
> > 在 2020/10/9 下午5:25,“Benchao Li”<libenc...@apache.org> 写入:
> >
> >     Thanks Jark for bringing this discussion, I like this FLIP very much.
> >
> >     Especially the cumulate window, it's much like the current TUMBLE
> > window +
> >     Fast Emit (which is an undocumented experimental feature), however,
> > it's
> >     more powerful.
> >
> >     And This will make the SQL semantic more standard, especially for the
> >     HOPPING window.
> >
> >     Regarding time attribute,
> >     It seems that we don't need a specific function to infer the time
> > attribute
> >     like
> >     `TUMBLE_ROWTIME` / `TUMBLE_PROCTIME`. Then are `window_start` and
> >     `window_end`
> >     column a time attribute column automatically?
> >     - If not, what will be the time attribute of the result relation of
> > these
> >     TVFs?
> >       Especially after the window aggregation.
> >     - If yes, then how do we handle proctime?
> >
> >     Regarding batch operators,
> >     It's great to hear that we can reuse the batch operators in
> continuous
> >     batch mode
> >     as you mentioned in the FLIP.
> >     Current window aggregate could also be used in batch mode with
> > rowtime. Do
> >     you plan
> >     to support these TVFs for batch mode in this FLIP? Hence the
> Table/SQL
> > is a
> >     unified
> >     API, it's great if we can keep the features complete both in
> streaming
> > and
> >     batch mode.
> >
> >     There is one more question, I don't know whether it should be
> > considered in
> >     this FLIP.
> >     Does the new window support `grouping sets`? (It's not supported in
> old
> >     window impl).
> >
> >     Jark Wu <imj...@gmail.com> 于2020年10月9日周五 下午4:14写道:
> >
> >     > Hi all,
> >     >
> >     > I know we have a lot of discussion and development on going right
> > now but
> >     > it would be great if we can get FLIP-145 into a votable state.
> >     > If there are no objections, I would like to start voting in the
> next
> > days.
> >     >
> >     > Best,
> >     > Jark
> >     >
> >     > On Thu, 1 Oct 2020 at 14:29, Jark Wu <imj...@gmail.com> wrote:
> >     >
> >     > > Hi everyone,
> >     > >
> >     > > I have added a section for Performance Optimization to describe
> > how to
> >     > > improve the performance in the short-term and long-term
> >     > > and sketch the future performance potential under the new window
> > API.
> >     > > Introducing the window API is just the first step, we will
> >     > > continuously improve the performance to make it powerful and
> > useful.
> >     > >
> >     > > Best,
> >     > > Jark
> >     > >
> >     > > On Thu, 1 Oct 2020 at 14:28, Jark Wu <imj...@gmail.com> wrote:
> >     > >
> >     > >> Hi Pengcheng,
> >     > >>
> >     > >> Yes, the window TVF is part of the FLIP. Welcome to contribute
> > and join
> >     > >> the discussion.
> >     > >> Regarding the SESSION window aggregation, users can use the
> > existing
> >     > >> grouped session window function.
> >     > >>
> >     > >> Best,
> >     > >> Jark
> >     > >>
> >     > >> On Sun, 27 Sep 2020 at 21:24, liupengcheng <
> > pengchengliucr...@gmail.com
> >     > >
> >     > >> wrote:
> >     > >>
> >     > >>> Hi Jark,
> >     > >>>         Thanks for reply, yes, I think it's a good feature, it
> > can
> >     > >>> improve the NRT scenarios
> >     > >>>         as you mentioned in the FLIP. Also, I think it can
> > improve the
> >     > >>> streaming SQL greatly,
> >     > >>>         it can support richer window operations in flink SQL
> and
> > bring
> >     > >>> great convenience to users.
> >     > >>>         (we are now only supported group window in flink).
> >     > >>>
> >     > >>>         Regarding the SESSION window, I think it's especially
> > useful
> >     > for
> >     > >>> user behavior analysis(e.g.
> >     > >>>         counting user visits on a news website or social
> > platform), but
> >     > >>> I agree that we can keep it
> >     > >>>         out of the FLIP now to catch up 1.12.
> >     > >>>
> >     > >>>         Recently, I've done some work on the stream planner
> with
> > the
> >     > >>> TVFs, and I'm willing to contribute
> >     > >>>         to this part. Is it in the plan of this FLIP?
> >     > >>>
> >     > >>>         Best,
> >     > >>>         PengchengLiu
> >     > >>>
> >     > >>>
> >     > >>> 在 2020/9/26 下午11:09,“Jark Wu”<imj...@gmail.com> 写入:
> >     > >>>
> >     > >>>     Hi pengcheng,
> >     > >>>
> >     > >>>     That's great to see you also have the need of window join.
> >     > >>>     You are right, the windowing TVF is a powerful feature
> which
> > can
> >     > >>> support
> >     > >>>     more operations in the future.
> >     > >>>     I think it as of the date time "partition" selection in
> > batch SQL
> >     > >>> jobs,
> >     > >>>     with this new syntax, I think it is possible
> >     > >>>      to migrate traditional batch SQL jobs to Flink SQL by
> > changing a
> >     > >>> few lines.
> >     > >>>
> >     > >>>     Regarding the SESSION window, this is on purpose to keep it
> > out of
> >     > >>> the
> >     > >>>     FLIP, because we want to keep the
> >     > >>>     FLIP small to catch up 1.12 and SESSION TVF is rarely
> useful
> > (e.g.
> >     > >>> session
> >     > >>>     window join?).
> >     > >>>
> >     > >>>     Best,
> >     > >>>     Jark
> >     > >>>
> >     > >>>     On Fri, 25 Sep 2020 at 22:59, liupengcheng <
> >     > >>> pengchengliucr...@gmail.com>
> >     > >>>     wrote:
> >     > >>>
> >     > >>>     > Hi, Jark,
> >     > >>>     >         I'm very interested in this feature, and I'm also
> > working
> >     > >>> on this
> >     > >>>     > recently.
> >     > >>>     >         I just have a glance at the FLIP, it's good, but
> I
> > found
> >     > >>> that
> >     > >>>     > there is no plan to add SESSION windows.
> >     > >>>     >         Also, I think there can be more things we can do
> > based on
> >     > >>> this new
> >     > >>>     > syntax. For example,
> >     > >>>     >         - window sort support
> >     > >>>     >         - window union/intersect/minus support
> >     > >>>     >         - Improve dimension table join
> >     > >>>     >         We can have more deep discussion on this new
> > feature
> >     > later
> >     > >>> .
> >     > >>>     >         I've also opened an jira that is related to this
> > feature
> >     > >>> recently:
> >     > >>>     > https://issues.apache.org/jira/browse/FLINK-18830
> >     > >>>     >
> >     > >>>     > Best!
> >     > >>>     > PengchengLiu
> >     > >>>     >
> >     > >>>     > 在 2020/9/25 下午10:30,“Jark Wu”<imj...@gmail.com> 写入:
> >     > >>>     >
> >     > >>>     >     Hi everyone,
> >     > >>>     >
> >     > >>>     >     I want to start a FLIP about supporting windowing
> >     > table-valued
> >     > >>>     > functions
> >     > >>>     >     (TVF).
> >     > >>>     >     The main purpose of this FLIP is to improve the near
> >     > real-time
> >     > >>> (NRT)
> >     > >>>     >     experience of Flink.
> >     > >>>     >
> >     > >>>     >     FLIP-145:
> >     > >>>     >
> >     > >>>     >
> >     > >>>
> >     >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function
> >     > >>>     >
> >     > >>>     >     We want to introduce TUMBLE, HOP, CUMULATE windowing
> > TVFs,
> >     > the
> >     > >>>     > CUMULATE is
> >     > >>>     >     a new kind of window.
> >     > >>>     >     With the windowing TVFs, we can support richer
> > operations on
> >     > >>> windows,
> >     > >>>     >     including window join, window TopN and so on.
> >     > >>>     >     This makes things simple: we only need to assign
> > windows at
> >     > the
> >     > >>>     > beginning
> >     > >>>     >     of the query, and then apply operations after that
> like
> >     > >>> traditional
> >     > >>>     > batch
> >     > >>>     >     SQL.
> >     > >>>     >     We hope it can help to reduce the learning curve of
> > windows,
> >     > >>> improve
> >     > >>>     > NRT
> >     > >>>     >     for Flink, and attract more batch users.
> >     > >>>     >
> >     > >>>     >     A simple code snippet for 10 minutes tumbling window
> >     > aggregate:
> >     > >>>     >
> >     > >>>     >     SELECT window_start, window_end, SUM(price)
> >     > >>>     >     FROM TABLE(
> >     > >>>     >         TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL
> > '10'
> >     > >>> MINUTES))
> >     > >>>     >     GROUP BY window_start, window_end;
> >     > >>>     >
> >     > >>>     >     I'm looking forward to your feedback.
> >     > >>>     >
> >     > >>>     >     Best,
> >     > >>>     >     Jark
> >     > >>>     >
> >     > >>>     >
> >     > >>>     >
> >     > >>>
> >     > >>>
> >     > >>>
> >     >
> >
> >
> >     --
> >
> >     Best,
> >     Benchao Li
> >
>


-- 

Best,
Benchao Li

Reply via email to