Re: [DISCUSS] FLIP-145: Support SQL windowing table-valued function

Jark Wu Sat, 10 Oct 2020 03:00:32 -0700

Hi everyone,

Thanks everyone for this healthy discussion. I think we have addressed all
the concerns. I would continue with a voting.
If you have any new objections, feel free to let me know.


Best,
Jark

On Sat, 10 Oct 2020 at 17:54, Jark Wu <imj...@gmail.com> wrote:

> Hi Jingsong,
>
> That's a good question. I did have searched a lot and didn't find any
> system that provides such an out-of-box function.
> I guess the reason is that in the traditional batch systems, this feature
> is supported by the over window and they don't need to invent a
> new function/syntax for this.
> For streaming systems, we are the first one to propose this new window.
>
> However, I think CUMULATE is a good name. Because almost all the databases
> call such scenarios as "cumulative window", e.g. Snowflake[1], SQL Server
> [2], Postgres [3].
> Thus we choose "cumulative" as the base name, but use the verb form
> "cumulate" because other window function names are also verbs, e.g. tumble,
> hop.
>
> I hope this can address your concern.
>
> Best,
> Jark
>
> [1]:
> https://docs.snowflake.com/en/sql-reference/functions-analytic.html#cumulative-window-frame-examples
> [2]:
> https://docs.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql?view=sql-server-ver15#c-producing-a-moving-average-and-cumulative-total
> [3]:
> https://popsql.com/learn-sql/postgresql/how-to-calculate-cumulative-sum-running-total-in-postgresql
>
> On Sat, 10 Oct 2020 at 17:26, Jingsong Li <jingsongl...@gmail.com> wrote:
>
>> +1 for voting. Thanks Jark for driving.
>>
>> +1 for TVF, It has been put forward by theory and supported by calcite. It
>> will greatly enhance the window related operations.
>>
>> My personal feeling is that after TVF, the following operations can be
>> similar to the traditional batch SQL, as long as the window related
>> attributes are included in the key.
>>
>> I am not sure about the CUMULATE window, yes, It's a common requirement,
>> Is
>> there any more evidence (other systems) to prove this word ("CUMULATE") is
>> appropriate.
>>
>> Best,
>> Jingsong
>>
>> On Sat, Oct 10, 2020 at 3:43 PM Jark Wu <imj...@gmail.com> wrote:
>>
>> > Hi Pengcheng,
>> >
>> > IIUC, the "stream operators" you mean is the non-time operators or
>> called
>> > regular operators, such as regular join, regular aggregate.
>> > But you may misunderstand me, only the time operators can't be applied
>> > after the new window operators, because of missing time attributes.
>> > The regular operators can still be applied after the new window
>> operators.
>> >
>> > Regarding using window TVFs to re-assign event-time and watermarks, I'm
>> not
>> > sure about this.
>> > Because assigning watermark requires to define the watermark strategy,
>> > however, the window TVF doesn't provide such ability.
>> > Polymorphic table functions are table functions which just append
>> > additional columns and convert N rows into M rows, it can't touch meta
>> > information.
>> >
>> > Best,
>> > Jark
>> >
>> > On Sat, 10 Oct 2020 at 15:41, Jark Wu <imj...@gmail.com> wrote:
>> >
>> > > Hi Danny,
>> > >
>> > > Thanks for the hint about named params syntax, I added examples with
>> > named
>> > > params in the FLIP.
>> > >
>> > > Best,
>> > > Jark
>> > >
>> > >
>> > > On Sat, 10 Oct 2020 at 15:03, Pengcheng Liu <
>> pengchengliucr...@gmail.com
>> > >
>> > > wrote:
>> > >
>> > >> Hi, Jark,
>> > >>
>> > >>    I've got some different opinions there, I think it's a very common
>> > use
>> > >> case to use
>> > >>    window operators in combination with streaming operators(even
>> those
>> > >> time operators).
>> > >>    (e.g. for some tables, users only care data within a period, but
>> for
>> > >> other tables, they may
>> > >>    want the whole historical data).
>> > >>    The pipeline may looks like this:
>> > >>    window join -> dimension table join -> stream aggregate -> stream
>> > sort
>> > >>
>> > >>    Just as what you said, the key clause can be used to distinguish
>> > >> whether a operator should
>> > >>    be translated to a window operator or a streaming operator.
>> > >>
>> > >>    Also, as I've mentioned before, 1) for time operator after window
>> > >> aggregation, the auxiliary function
>> > >>    which is used to access time attribute column can be actually
>> > replaced
>> > >> with (window_end -1).
>> > >>    Actually, we only just need to make the results of the upstream
>> > >> contains a time column whose
>> > >>    range is within (window_start, window_end), and thus the
>> downstream
>> > >> time operators can work on it
>> > >>    (driving by the original watermark in the source). 2) for time
>> > >> operator after other window operators,
>> > >>    the downstream time operators can access the time column directly
>> > from
>> > >> it's input.
>> > >>
>> > >>    One more thoughts there, maybe the window TVFs can re-assign
>> > >> timestamps and watermarks, so
>> > >>    that in some case when the watermark can not be retrieved from
>> source
>> > >> directly(may needs some
>> > >>    conversions), the watermark can still be assigned dynamically in
>> the
>> > >> SQL(use the time column as
>> > >>    the watermark column) and thus make it work. I think this can save
>> > >> much time to revise the event
>> > >>    time column in some cases(this is a real demand in our production
>> > >> environment).
>> > >>
>> > >>    I strongly suggest that we should support the combination usage of
>> > >> window operators and
>> > >>    streaming operators. And I think we can achieve this with little
>> > work.
>> > >>
>> > >> Best,
>> > >> Pengcheng
>> > >>
>> > >>
>> > >> Jark Wu <imj...@gmail.com> 于2020年10月10日周六 下午1:45写道：
>> > >>
>> > >>> Hi Benchao,
>> > >>>
>> > >>> That's a good question.
>> > >>>
>> > >>> IMO, the new windowed operators and the current time operators are
>> two
>> > >>> different sets of functions,
>> > >>> just like time operators and non-time operators are two different
>> sets
>> > of
>> > >>> functions.
>> > >>> I think it's fine if we don't support integrating them, just like
>> time
>> > >>> operators can't be applied on non-windowed aggregate.
>> > >>> If users want to use time operators in the whole pipeline, then
>> he/she
>> > >>> can
>> > >>> use the grouped window aggregates instead of the window TVFs.
>> > >>>
>> > >>> The key idea of window TVF is that all the operators in the pipeline
>> > are
>> > >>> based on the **windows**.
>> > >>> In terms of syntax, if the key clause (e.g. group by, partitioned
>> by,
>> > >>> join
>> > >>> on, order by) contains window_start and window_end,
>> > >>> it can be translated into windowed operators.
>> > >>> Thus, we will have windowed CEP, windowed sort, windowed over
>> aggregate
>> > >>> in
>> > >>> the future to make it possible to build a windowed pipeline.
>> > >>>
>> > >>> But I think we can elaborate the integration more in the future if
>> > users
>> > >>> need it. Actually, I don't fully understand the scenario of
>> integrating
>> > >>> window TVF and time operators at this point.
>> > >>> For example, interval join an input stream and a window join
>> result. I
>> > >>> don't see why it can't be expressed by nested window join and why
>> users
>> > >>> have to use interval join here.
>> > >>> Maybe we can wait for more inputs from users when the window TVF is
>> > >>> released and we can elaborate it again.
>> > >>>
>> > >>> Best,
>> > >>> Jark
>> > >>>
>> > >>> On Sat, 10 Oct 2020 at 12:01, 刘 芃成 <pengchengliucr...@gmail.com>
>> > wrote:
>> > >>>
>> > >>> > Hi, Benchao,
>> > >>> >        I think I got your point, actually, in current
>> implementation
>> > >>> for
>> > >>> > group window aggregation, the value of time attributes(e.g.
>> > >>> > TUMBLE_ROWTIME/TUMBLE_PROCTIME) is calculated as (window_end – 1),
>> > so I
>> > >>> > think we can just use it directly if you need this. But I think
>> this
>> > >>> time
>> > >>> > attributes is mainly suggested to use in case of cascaded window
>> > >>> operations.
>> > >>> > Regarding the example you provided, I think the semantics of the
>> SQL
>> > in
>> > >>> > your example which doing interval join(e.g. with TUMBLE_ROWTIME)
>> > after
>> > >>> > window aggregation is not clear in the current implementation,
>> and I
>> > >>> think
>> > >>> > that’s a strong reason why we need the new TVFs syntax.
>> > >>> >       With the new syntax, users should understand which time
>> column
>> > to
>> > >>> > use and how to generate it when doing interval join and etc.
>> > >>> >
>> > >>> > Best,
>> > >>> > Pengcheng
>> > >>> >
>> > >>> > 发件人: Benchao Li <libenc...@apache.org>
>> > >>> > 日期: 2020年10月10日 星期六 上午11:02
>> > >>> > 收件人: pengcheng Liu <pengchengliucr...@gmail.com>
>> > >>> > 抄送: dev <dev@flink.apache.org>
>> > >>> > 主题: Re: [DISCUSS] FLIP-145: Support SQL windowing table-valued
>> > function
>> > >>> >
>> > >>> > Hi pengcheng,
>> > >>> >
>> > >>> > Thanks for your response.
>> > >>> > I knew that the original time attribute column will be retained
>> after
>> > >>> the
>> > >>> > TVF,
>> > >>> > what I'm questioning is how do we get the time attribute column
>> after
>> > >>> > Aggregation.
>> > >>> > Your answer did not remove my doubts about this.
>> > >>> >
>> > >>> > It's ok if we did not plan to integrate new TVF aggregate with old
>> > >>> "time
>> > >>> > attribute scenarios"
>> > >>> > listed in my previous email in this FLIP. However it's good to
>> > >>> elaborate
>> > >>> > more on this, and
>> > >>> > leave it to the future plan.
>> > >>> >
>> > >>> > pengcheng Liu <pengchengliucr...@gmail.com<mailto:
>> > >>> > pengchengliucr...@gmail.com>> 于2020年10月10日周六 上午10:45写道：
>> > >>> > Hi，Benchao,
>> > >>> >     In TVFs, the time attributes is just passed through from
>> parent
>> > >>> rels,
>> > >>> > and the TVFs just add two
>> > >>> >     additional window attributes(i.e. window_start & window_end).
>> > >>> Also, I
>> > >>> > think the time columns can be not only a time attribute
>> > >>> >     with type of `TimeIndicatorType` but also a regular column
>> with
>> > >>> type
>> > >>> > of `Timestamp`.
>> > >>> >
>> > >>> >     For cascaded window operations, we can use
>> > window_start/window_end
>> > >>> of
>> > >>> > the previous window result directly to
>> > >>> >     indicate operating on the same window, or use  new DESCRIPTOR
>> > >>> column
>> > >>> > to assign new windows, in case of the change of
>> > >>> >     the time column(e.g. in some case, the original timestamp is
>> > >>> > inaccurate and need some conversion to be used).
>> > >>> >
>> > >>> >     You can check the definition or signature of these TVFs in the
>> > >>> FLIP.
>> > >>> >      e.g.
>> > >>> >   SELECT * FROM TABLE(
>> > >>> >    TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
>> > >>> >     In the example, the `bidtime` is the time attribute column,
>> which
>> > >>> is
>> > >>> > the first operand of the DESCRIPTOR function.
>> > >>> >
>> > >>> >     +1 start voting.
>> > >>> >
>> > >>> > Benchao Li <libenc...@apache.org<mailto:libenc...@apache.org>>
>> > >>> > 于2020年10月10日周六 上午10:08写道：
>> > >>> > Hi Jark,
>> > >>> >
>> > >>> > 2 & 3 sounds good to me.
>> > >>> >
>> > >>> > Regarding time attribute,
>> > >>> > I still have some questions, I knew it's easy to support cascaded
>> > >>> window
>> > >>> > aggregate using new TVFs.
>> > >>> > However there are some other places where need time attribute:
>> > >>> > - CEP
>> > >>> > - interval join
>> > >>> > - order by
>> > >>> > - over window
>> > >>> > If there is no time attribute column, how do we integrate these
>> old
>> > >>> > features with the new TVFs.
>> > >>> > E.g.
>> > >>> > StreamA -> new window aggregate -> interval join -> Sink
>> > >>> >                                                          /
>> > >>> > StreamB -----------------------------------
>> > >>> >
>> > >>> >
>> > >>> > Jark Wu <imj...@gmail.com<mailto:imj...@gmail.com>> 于2020年10月9日周五
>> > >>> > 下午11:51写道：
>> > >>> > Hi Benchao,
>> > >>> >
>> > >>> > 1) time attribute
>> > >>> > Yes. We don't need time attribute auxiliary function. Because the
>> new
>> > >>> > window operations are all based on the
>> > >>> >  window_start and window_end columns instead of on the time
>> > >>> attributes. So
>> > >>> > we don't need to propagate time attributes.
>> > >>> > Cascaded window aggregate can be expressed by simply GROUP BY the
>> > >>> > window_start and window_end of the previous window result.
>> > >>> > I have added a cascaded window aggregate example in the Tumbling
>> > Window
>> > >>> > section in the FLIP.
>> > >>> > If you want to define proctime window aggregate, the time column
>> in
>> > TVF
>> > >>> > should be a proctime attribute field (or PROCTIME() function).
>> > >>> >
>> > >>> > 2) batch support
>> > >>> > Yes. The proposed syntax/API are unified for batch and streaming.
>> > Batch
>> > >>> > support is in the plan, but may not have enough time to catch up
>> > 1.12.
>> > >>> >
>> > >>> > 3) support `grouping sets`
>> > >>> > This is not included in the FLIP, but I think it's great if we can
>> > >>> support
>> > >>> > `grouping sets`.
>> > >>> > The existing window impl doesn't support this because we convert
>> the
>> > >>> > LogicalAggregate into WindowAggregate in the beginning,
>> > >>> > the expand grouping sets rule can't be applied in this situation.
>> > >>> > Fortunately, with the new window impl, the conversion to
>> > >>> WindowAggregate
>> > >>> > will happen at the end, so I think the expand rule can be
>> > >>> >  applied and support this feature naturally.
>> > >>> > Therefore, IMO, we don't need to include this feature in this
>> FLIP to
>> > >>> avoid
>> > >>> > the FLIP being too large.
>> > >>> > This can be a follow-up issue (maybe just add tests and docs)
>> after
>> > the
>> > >>> > FLIP.
>> > >>> >
>> > >>> > Best,
>> > >>> > Jark
>> > >>> >
>> > >>> >
>> > >>> > On Fri, 9 Oct 2020 at 19:09, 刘 芃成 <pengchengliucr...@gmail.com
>> > <mailto:
>> > >>> > pengchengliucr...@gmail.com>> wrote:
>> > >>> >
>> > >>> > > Hi，Benchao,
>> > >>> > >         Welcome to join the discussion, yes, this new syntax can
>> > >>> make SQL
>> > >>> > > more clear and simpler.
>> > >>> > >         For your first question, the `window_start` and
>> > `window_end`
>> > >>> > > columns will be added automatically,
>> > >>> > >         so we don't need to use auxiliary group functions to
>> infer
>> > or
>> > >>> > > access the window properties.
>> > >>> > >
>> > >>> > >         For the `grouping sets` on TVFs, I think it's
>> interesting
>> > if
>> > >>> we
>> > >>> > > can support it, as we already supported `grouping sets`
>> > >>> > >         on streaming aggregates in blink planner. But I'm not
>> sure
>> > >>> if it
>> > >>> > > will be included into this FLIP.
>> > >>> > >
>> > >>> > >         cc @Jark Wu
>> > >>> > >
>> > >>> > > Best,
>> > >>> > > Pengcheng
>> > >>> > >
>> > >>> > >
>> > >>> > > 在 2020/10/9 下午5:25，“Benchao Li”<libenc...@apache.org<mailto:
>> > >>> > libenc...@apache.org>> 写入:
>> > >>> > >
>> > >>> > >     Thanks Jark for bringing this discussion, I like this FLIP
>> very
>> > >>> much.
>> > >>> > >
>> > >>> > >     Especially the cumulate window, it's much like the current
>> > TUMBLE
>> > >>> > > window +
>> > >>> > >     Fast Emit (which is an undocumented experimental feature),
>> > >>> however,
>> > >>> > > it's
>> > >>> > >     more powerful.
>> > >>> > >
>> > >>> > >     And This will make the SQL semantic more standard,
>> especially
>> > >>> for the
>> > >>> > >     HOPPING window.
>> > >>> > >
>> > >>> > >     Regarding time attribute,
>> > >>> > >     It seems that we don't need a specific function to infer the
>> > time
>> > >>> > > attribute
>> > >>> > >     like
>> > >>> > >     `TUMBLE_ROWTIME` / `TUMBLE_PROCTIME`. Then are
>> `window_start`
>> > and
>> > >>> > >     `window_end`
>> > >>> > >     column a time attribute column automatically?
>> > >>> > >     - If not, what will be the time attribute of the result
>> > relation
>> > >>> of
>> > >>> > > these
>> > >>> > >     TVFs?
>> > >>> > >       Especially after the window aggregation.
>> > >>> > >     - If yes, then how do we handle proctime?
>> > >>> > >
>> > >>> > >     Regarding batch operators,
>> > >>> > >     It's great to hear that we can reuse the batch operators in
>> > >>> > continuous
>> > >>> > >     batch mode
>> > >>> > >     as you mentioned in the FLIP.
>> > >>> > >     Current window aggregate could also be used in batch mode
>> with
>> > >>> > > rowtime. Do
>> > >>> > >     you plan
>> > >>> > >     to support these TVFs for batch mode in this FLIP? Hence the
>> > >>> > Table/SQL
>> > >>> > > is a
>> > >>> > >     unified
>> > >>> > >     API, it's great if we can keep the features complete both in
>> > >>> > streaming
>> > >>> > > and
>> > >>> > >     batch mode.
>> > >>> > >
>> > >>> > >     There is one more question, I don't know whether it should
>> be
>> > >>> > > considered in
>> > >>> > >     this FLIP.
>> > >>> > >     Does the new window support `grouping sets`? (It's not
>> > supported
>> > >>> in
>> > >>> > old
>> > >>> > >     window impl).
>> > >>> > >
>> > >>> > >     Jark Wu <imj...@gmail.com<mailto:imj...@gmail.com>>
>> > >>> 于2020年10月9日周五
>> > >>> > 下午4:14写道：
>> > >>> > >
>> > >>> > >     > Hi all,
>> > >>> > >     >
>> > >>> > >     > I know we have a lot of discussion and development on
>> going
>> > >>> right
>> > >>> > > now but
>> > >>> > >     > it would be great if we can get FLIP-145 into a votable
>> > state.
>> > >>> > >     > If there are no objections, I would like to start voting
>> in
>> > the
>> > >>> > next
>> > >>> > > days.
>> > >>> > >     >
>> > >>> > >     > Best,
>> > >>> > >     > Jark
>> > >>> > >     >
>> > >>> > >     > On Thu, 1 Oct 2020 at 14:29, Jark Wu <imj...@gmail.com
>> > <mailto:
>> > >>> > imj...@gmail.com>> wrote:
>> > >>> > >     >
>> > >>> > >     > > Hi everyone,
>> > >>> > >     > >
>> > >>> > >     > > I have added a section for Performance Optimization to
>> > >>> describe
>> > >>> > > how to
>> > >>> > >     > > improve the performance in the short-term and long-term
>> > >>> > >     > > and sketch the future performance potential under the
>> new
>> > >>> window
>> > >>> > > API.
>> > >>> > >     > > Introducing the window API is just the first step, we
>> will
>> > >>> > >     > > continuously improve the performance to make it powerful
>> > and
>> > >>> > > useful.
>> > >>> > >     > >
>> > >>> > >     > > Best,
>> > >>> > >     > > Jark
>> > >>> > >     > >
>> > >>> > >     > > On Thu, 1 Oct 2020 at 14:28, Jark Wu <imj...@gmail.com
>> > >>> <mailto:
>> > >>> > imj...@gmail.com>> wrote:
>> > >>> > >     > >
>> > >>> > >     > >> Hi Pengcheng,
>> > >>> > >     > >>
>> > >>> > >     > >> Yes, the window TVF is part of the FLIP. Welcome to
>> > >>> contribute
>> > >>> > > and join
>> > >>> > >     > >> the discussion.
>> > >>> > >     > >> Regarding the SESSION window aggregation, users can use
>> > the
>> > >>> > > existing
>> > >>> > >     > >> grouped session window function.
>> > >>> > >     > >>
>> > >>> > >     > >> Best,
>> > >>> > >     > >> Jark
>> > >>> > >     > >>
>> > >>> > >     > >> On Sun, 27 Sep 2020 at 21:24, liupengcheng <
>> > >>> > > pengchengliucr...@gmail.com<mailto:pengchengliucr...@gmail.com>
>> > >>> > >     > >
>> > >>> > >     > >> wrote:
>> > >>> > >     > >>
>> > >>> > >     > >>> Hi Jark,
>> > >>> > >     > >>>         Thanks for reply, yes, I think it's a good
>> > >>> feature, it
>> > >>> > > can
>> > >>> > >     > >>> improve the NRT scenarios
>> > >>> > >     > >>>         as you mentioned in the FLIP. Also, I think it
>> > can
>> > >>> > > improve the
>> > >>> > >     > >>> streaming SQL greatly,
>> > >>> > >     > >>>         it can support richer window operations in
>> flink
>> > >>> SQL
>> > >>> > and
>> > >>> > > bring
>> > >>> > >     > >>> great convenience to users.
>> > >>> > >     > >>>         (we are now only supported group window in
>> > flink).
>> > >>> > >     > >>>
>> > >>> > >     > >>>         Regarding the SESSION window, I think it's
>> > >>> especially
>> > >>> > > useful
>> > >>> > >     > for
>> > >>> > >     > >>> user behavior analysis(e.g.
>> > >>> > >     > >>>         counting user visits on a news website or
>> social
>> > >>> > > platform), but
>> > >>> > >     > >>> I agree that we can keep it
>> > >>> > >     > >>>         out of the FLIP now to catch up 1.12.
>> > >>> > >     > >>>
>> > >>> > >     > >>>         Recently, I've done some work on the stream
>> > planner
>> > >>> > with
>> > >>> > > the
>> > >>> > >     > >>> TVFs, and I'm willing to contribute
>> > >>> > >     > >>>         to this part. Is it in the plan of this FLIP?
>> > >>> > >     > >>>
>> > >>> > >     > >>>         Best,
>> > >>> > >     > >>>         PengchengLiu
>> > >>> > >     > >>>
>> > >>> > >     > >>>
>> > >>> > >     > >>> 在 2020/9/26 下午11:09，“Jark Wu”<imj...@gmail.com
>> <mailto:
>> > >>> > imj...@gmail.com>> 写入:
>> > >>> > >     > >>>
>> > >>> > >     > >>>     Hi pengcheng,
>> > >>> > >     > >>>
>> > >>> > >     > >>>     That's great to see you also have the need of
>> window
>> > >>> join.
>> > >>> > >     > >>>     You are right, the windowing TVF is a powerful
>> > feature
>> > >>> > which
>> > >>> > > can
>> > >>> > >     > >>> support
>> > >>> > >     > >>>     more operations in the future.
>> > >>> > >     > >>>     I think it as of the date time "partition"
>> selection
>> > in
>> > >>> > > batch SQL
>> > >>> > >     > >>> jobs,
>> > >>> > >     > >>>     with this new syntax, I think it is possible
>> > >>> > >     > >>>      to migrate traditional batch SQL jobs to Flink
>> SQL
>> > by
>> > >>> > > changing a
>> > >>> > >     > >>> few lines.
>> > >>> > >     > >>>
>> > >>> > >     > >>>     Regarding the SESSION window, this is on purpose
>> to
>> > >>> keep it
>> > >>> > > out of
>> > >>> > >     > >>> the
>> > >>> > >     > >>>     FLIP, because we want to keep the
>> > >>> > >     > >>>     FLIP small to catch up 1.12 and SESSION TVF is
>> rarely
>> > >>> > useful
>> > >>> > > (e.g.
>> > >>> > >     > >>> session
>> > >>> > >     > >>>     window join?).
>> > >>> > >     > >>>
>> > >>> > >     > >>>     Best,
>> > >>> > >     > >>>     Jark
>> > >>> > >     > >>>
>> > >>> > >     > >>>     On Fri, 25 Sep 2020 at 22:59, liupengcheng <
>> > >>> > >     > >>> pengchengliucr...@gmail.com<mailto:
>> > >>> pengchengliucr...@gmail.com
>> > >>> > >>
>> > >>> > >     > >>>     wrote:
>> > >>> > >     > >>>
>> > >>> > >     > >>>     > Hi, Jark,
>> > >>> > >     > >>>     >         I'm very interested in this feature, and
>> > I'm
>> > >>> also
>> > >>> > > working
>> > >>> > >     > >>> on this
>> > >>> > >     > >>>     > recently.
>> > >>> > >     > >>>     >         I just have a glance at the FLIP, it's
>> > good,
>> > >>> but
>> > >>> > I
>> > >>> > > found
>> > >>> > >     > >>> that
>> > >>> > >     > >>>     > there is no plan to add SESSION windows.
>> > >>> > >     > >>>     >         Also, I think there can be more things
>> we
>> > >>> can do
>> > >>> > > based on
>> > >>> > >     > >>> this new
>> > >>> > >     > >>>     > syntax. For example,
>> > >>> > >     > >>>     >         - window sort support
>> > >>> > >     > >>>     >         - window union/intersect/minus support
>> > >>> > >     > >>>     >         - Improve dimension table join
>> > >>> > >     > >>>     >         We can have more deep discussion on this
>> > new
>> > >>> > > feature
>> > >>> > >     > later
>> > >>> > >     > >>> .
>> > >>> > >     > >>>     >         I've also opened an jira that is
>> related to
>> > >>> this
>> > >>> > > feature
>> > >>> > >     > >>> recently:
>> > >>> > >     > >>>     >
>> https://issues.apache.org/jira/browse/FLINK-18830
>> > >>> > >     > >>>     >
>> > >>> > >     > >>>     > Best!
>> > >>> > >     > >>>     > PengchengLiu
>> > >>> > >     > >>>     >
>> > >>> > >     > >>>     > 在 2020/9/25 下午10:30，“Jark Wu”<imj...@gmail.com
>> > >>> <mailto:
>> > >>> > imj...@gmail.com>> 写入:
>> > >>> > >     > >>>     >
>> > >>> > >     > >>>     >     Hi everyone,
>> > >>> > >     > >>>     >
>> > >>> > >     > >>>     >     I want to start a FLIP about supporting
>> > windowing
>> > >>> > >     > table-valued
>> > >>> > >     > >>>     > functions
>> > >>> > >     > >>>     >     (TVF).
>> > >>> > >     > >>>     >     The main purpose of this FLIP is to improve
>> the
>> > >>> near
>> > >>> > >     > real-time
>> > >>> > >     > >>> (NRT)
>> > >>> > >     > >>>     >     experience of Flink.
>> > >>> > >     > >>>     >
>> > >>> > >     > >>>     >     FLIP-145:
>> > >>> > >     > >>>     >
>> > >>> > >     > >>>     >
>> > >>> > >     > >>>
>> > >>> > >     >
>> > >>> > >
>> > >>> >
>> > >>>
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function
>> > >>> > >     > >>>     >
>> > >>> > >     > >>>     >     We want to introduce TUMBLE, HOP, CUMULATE
>> > >>> windowing
>> > >>> > > TVFs,
>> > >>> > >     > the
>> > >>> > >     > >>>     > CUMULATE is
>> > >>> > >     > >>>     >     a new kind of window.
>> > >>> > >     > >>>     >     With the windowing TVFs, we can support
>> richer
>> > >>> > > operations on
>> > >>> > >     > >>> windows,
>> > >>> > >     > >>>     >     including window join, window TopN and so
>> on.
>> > >>> > >     > >>>     >     This makes things simple: we only need to
>> > assign
>> > >>> > > windows at
>> > >>> > >     > the
>> > >>> > >     > >>>     > beginning
>> > >>> > >     > >>>     >     of the query, and then apply operations
>> after
>> > >>> that
>> > >>> > like
>> > >>> > >     > >>> traditional
>> > >>> > >     > >>>     > batch
>> > >>> > >     > >>>     >     SQL.
>> > >>> > >     > >>>     >     We hope it can help to reduce the learning
>> > curve
>> > >>> of
>> > >>> > > windows,
>> > >>> > >     > >>> improve
>> > >>> > >     > >>>     > NRT
>> > >>> > >     > >>>     >     for Flink, and attract more batch users.
>> > >>> > >     > >>>     >
>> > >>> > >     > >>>     >     A simple code snippet for 10 minutes
>> tumbling
>> > >>> window
>> > >>> > >     > aggregate:
>> > >>> > >     > >>>     >
>> > >>> > >     > >>>     >     SELECT window_start, window_end, SUM(price)
>> > >>> > >     > >>>     >     FROM TABLE(
>> > >>> > >     > >>>     >         TUMBLE(TABLE Bid, DESCRIPTOR(bidtime),
>> > >>> INTERVAL
>> > >>> > > '10'
>> > >>> > >     > >>> MINUTES))
>> > >>> > >     > >>>     >     GROUP BY window_start, window_end;
>> > >>> > >     > >>>     >
>> > >>> > >     > >>>     >     I'm looking forward to your feedback.
>> > >>> > >     > >>>     >
>> > >>> > >     > >>>     >     Best,
>> > >>> > >     > >>>     >     Jark
>> > >>> > >     > >>>     >
>> > >>> > >     > >>>     >
>> > >>> > >     > >>>     >
>> > >>> > >     > >>>
>> > >>> > >     > >>>
>> > >>> > >     > >>>
>> > >>> > >     >
>> > >>> > >
>> > >>> > >
>> > >>> > >     --
>> > >>> > >
>> > >>> > >     Best,
>> > >>> > >     Benchao Li
>> > >>> > >
>> > >>> >
>> > >>> >
>> > >>> > --
>> > >>> >
>> > >>> > Best,
>> > >>> > Benchao Li
>> > >>> >
>> > >>> >
>> > >>> > --
>> > >>> >
>> > >>> > Best,
>> > >>> > Benchao Li
>> > >>> >
>> > >>>
>> > >>
>> >
>>
>>
>> --
>> Best, Jingsong Lee
>>
>

Re: [DISCUSS] FLIP-145: Support SQL windowing table-valued function

Reply via email to