Hi all,

After some offline discussion and investigation with Timo and Danny, I have
updated FLIP-145.

FLIP-145:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function

Here are the updates:
1. Add SESSION window syntax and examples.
2. Time attribute: the window TVF now returns three window columns, the
   window start and end plus an additional window_time column, which is a
   time attribute. Add a "Time Attribute Propagate" section explaining how
   time attributes are propagated, with examples.
3. The old window syntax will be deprecated. We may drop it in the future,
   but that needs a separate discussion.
4. Add future work: simplifying the TABLE() keyword (we have already started
   a discussion in Calcite [1]) and supporting COUNT windows.
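To make the three-column output concrete, here is a rough Python sketch (not
Flink code) of what TUMBLE and SESSION conceptually append to each input row.
The helper names `tumble`, `session` and `WindowedRow` are hypothetical.
Timestamps are assumed to be epoch milliseconds, window_time is assumed to be
window_end minus 1 ms, and a session is assumed to close once the gap between
consecutive events of the same key reaches the gap size.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

ONE_MS = 1  # assumption: timestamps are epoch milliseconds


@dataclass(frozen=True)
class WindowedRow:
    """An input row plus the three flat window columns (hypothetical type)."""
    row: Tuple
    window_start: int
    window_end: int
    window_time: int  # assumption: window_end - 1 ms, the last instant inside the window


def tumble(rows: List[Tuple], time_index: int, size_ms: int) -> List[WindowedRow]:
    """Assign every row to the fixed-size window containing its timestamp."""
    out = []
    for row in rows:
        ts = row[time_index]
        start = ts - ts % size_ms  # align down to a multiple of the window size
        end = start + size_ms
        out.append(WindowedRow(row, start, end, end - ONE_MS))
    return out


def _close_session(group: List[Tuple], time_index: int, gap_ms: int) -> List[WindowedRow]:
    """Stamp every row of one finished session with the same window columns."""
    start = group[0][time_index]
    end = group[-1][time_index] + gap_ms
    return [WindowedRow(r, start, end, end - ONE_MS) for r in group]


def session(rows: List[Tuple], key_index: int, time_index: int, gap_ms: int) -> List[WindowedRow]:
    """Group rows per key into sessions separated by inactivity gaps of at least gap_ms."""
    by_key: Dict[object, List[Tuple]] = {}
    for row in rows:
        by_key.setdefault(row[key_index], []).append(row)

    out: List[WindowedRow] = []
    for key_rows in by_key.values():
        key_rows.sort(key=lambda r: r[time_index])
        group = [key_rows[0]]
        for row in key_rows[1:]:
            if row[time_index] - group[-1][time_index] < gap_ms:
                group.append(row)  # still within the gap: extend the current session
            else:
                out.extend(_close_session(group, time_index, gap_ms))
                group = [row]  # gap reached: start a new session
        out.extend(_close_session(group, time_index, gap_ms))
    return out
```

Each row keeps its original columns and only gains three flat window
columns, so the result can be fed into joins, aggregations or TopN like any
other table.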

We also investigated whether a nested type "window(start, end, time)" could
be used instead of three columns. However, this approach has problems that we
cannot solve for now:
- `window.start` can't be selected in a GROUP BY query, because it is not
  grouped. Postgres supports selecting nested fields of grouped ROW columns.
  We could fix this in Calcite, but it isn't trivial work.
- WINDOW is a token in the parser and can't be used as a column name;
  otherwise, parsing OVER WINDOW would fail.
- Apache Beam also considered putting wstart and wend in a separate nested
  row [2]. However, that would limit these extensions to engines that
  support nested rows, and many systems don't support nested rows well.

Therefore, we still prefer to keep three separate fields.

I would like to start a new VOTE for the updated FLIP-145 if there are no
objections.

Best,
Jark

[1] https://lists.apache.org/x/thread.html/ra98db08e280ddd9adeef62f456f61aedfdf7756e215cb4d66e2a52c9@%3Cdev.calcite.apache.org%3E
[2] https://docs.google.com/document/d/138uA7VTpbF84CFrd--cz3YVe0-AQ9ALnsavaSE2JeE4/edit?disco=AAAAHJ0EnGI


On Thu, 15 Oct 2020 at 21:03, Danny Chan <yuzhao....@gmail.com> wrote:

> Hi, Timo ~
>
> > We are not forced by the standard to do it as stated in the
> > `One SQL to Rule Them All` paper
>
> No, sticking to the SQL standard is always better. I think this is a common
> practice in Flink SQL now: without a standard, everyone has their own
> preference and the discussion easily drifts too far apart.
>
> > We can align the SQL windows more towards our regular DataStream API
> > windows, where you keyBy first and then apply a window operator.
>
> I don't think the current DataStream window join implements window
> semantics correctly: it joins the data sets first and then windows the LHS
> and RHS data together, whereas each input should window its own data set
> separately.
>
> As for "keying by the data set first": the current window TVF only appends
> window attributes, so it is very lightweight and orthogonal; we can combine
> window TVFs with joins, aggregations, TopN and so on.
>
> In SQL, people usually express "KEY BY" with a "GROUP BY" clause, which
> would mean strongly binding the window TVF to the aggregate operator. I
> would definitely vote -1 on that.
>
> As for PARTITION BY, there are specific use cases for the SESSION window,
> because a session often has a logical key. We can extend the PARTITION BY
> syntax since it is already in the SQL standard, but I'm confused why a
> TUMBLE window would have a PARTITION key. What is the real use case?
>
> -1 for "ORDER BY", because sorting an unbounded data set is not meaningful.
> For unbounded data sets we already have watermarks to handle out-of-order
> data, and for bounded data sets we can use a regular sort, because the
> current table argument actually accepts any query.
>
> Best,
> Danny Chan
> On Oct 15, 2020, 5:16 PM +0800, dev@flink.apache.org wrote:
> >
> > Personally, I find this easier to explain to users than telling them
> > why a session window has SET-semantic input tables while tumble/sliding
> > windows have ROW-semantic input tables.
>