I don't think it's a problem with table functions in general. And besides, we can't change the semantics of table functions. A table function must not produce duplicate column names.
The problem is with the semantics of these particular table functions - HOP, TUMBLE, SESSION - and what semantics are desirable depends on how people will typically use them. Is it common to follow TUMBLE with TUMBLE? What would a user expect to be the output columns? On Wed, Sep 23, 2020 at 11:21 AM Rui Wang <amaliu...@apache.org> wrote: > > >Is it reasonable to apply TUMBLE to TUMBLE? If so, would people > > generally want two sets of window_start, window_end columns? > > I think it is reasonable to apply TUMBLE to TUMBLE or even TUMBLE to HOP > join, as long as there is a real requirement there. The window starts/ends > are not duplicates. For example TUMBLE as L JOIN HOP as R, L offers a > window start and a window end, same for R. This is no different from a > normal JOIN case where both JOIN sides have the same column names (but they > are not considered duplicates). The SQL rule is still applying: within a > scope there shouldn't be ambiguous column names (e.g. duplicated column > name). For JOIN duplicate names from JOIN inputs are differentiated by > table alias. > > > Regarding https://issues.apache.org/jira/browse/CALCITE-4274, this is an > interesting case that is different from the JOIN case, and I also think > this is a general case (not limited to TUMBLE). > > Think about that for any query that uses table function of the pattern > in CALCITE-4274. The first table function generates column A and then it > becomes the input for the second table function, which also wants to append > a column named "A". How should Calcite handle this case? > > > -Rui > > > > On Wed, Sep 23, 2020 at 9:13 AM Julian Hyde <jh...@apache.org> wrote: > > > I think we should also discuss > > https://issues.apache.org/jira/browse/CALCITE-4274 here. > > > > We've never discussed what should happen if you apply TUMBLE to TUMBLE > > (or TUMBLE to HOP, etc.). What happens now is that you get duplicate > > columns. > > > > Is it reasonable to apply TUMBLE to TUMBLE? If so, would people > > generally want two sets of window_start, window_end columns? > > > > Julian > > > > On Wed, Sep 23, 2020 at 2:41 AM Danny Chan <yuzhao....@gmail.com> wrote: > > > > > > Thanks for the feedback, I agree we should keep the verbose part > > > > > > **L.window_start = R.window_start AND L.window_end =R.window_end** > > > > > > Which would make the semantic more clear ~ > > > > > > Best, > > > Danny Chan > > > 在 2020年9月23日 +0800 PM3:24,Viliam Durina <vil...@hazelcast.com>,写道: > > > > You can also use > > > > > > > > SELECT L.f0, R.f2, L.window_start, L.window_end > > > > FROM > > > > Tumble(table T1, descriptor(T1.ts), INTERVAL ‘5’ MINUTE) L > > > > JOIN > > > > Tumble(table T2, descriptor(T2.ts), INTERVAL ‘5’ MINUTE) R > > > > USING (f0, window_start) > > > > > > > > Viliam > > > > > > > > On Wed, 23 Sep 2020 at 08:02, Rui Wang <amaliu...@apache.org> wrote: > > > > > > > > > Regarding to **L.window_start = R.window_start AND L.window_end = > > > > > R.window_end**: > > > > > > > > > > In general, the current table function windowing model is to append > > window > > > > > metadata to table directly, thus window metadata becomes a part of > > table > > > > > (or call it data). So as a part of table, these two columns should be > > > > > treated as normal columns thus they should be in the join on > > condition. > > > > > > > > > > If you want to make it optional, it makes window start/end columns > > special > > > > > and has a semantic binding with special table functions (TUMBLE, HOP, > > > > > SESSION), which then becomes really not a SQL thing. For example, we > > can > > > > > allow users to define their own windowing table function. In that > > case, how > > > > > will you utilize window start/end produced by a customized windowing > > table > > > > > function? What if users produce wired windows that have overlapped > > window > > > > > starts or window ends? > > > > > > > > > > Keeping windows start/end as a part of the table, treating them no > > > > > different from other columns, could give a consistent behavior for > > either > > > > > built-in table function or user-defined table function. > > > > > > > > > > If you think it is too verbose, there are two options to optimize: > > > > > > > > > > 1. for TUMBLE/HOP/SESSION, to identify a unique window, you will > > only need > > > > > either window start or end, so you can simplify it, for example, to > > > > > L.window_start = R.window_start only. > > > > > 2. (not recommended), you can cut off **L.window_start = > > R.window_start AND > > > > > L.window_end = R.window_end**, but add window metadata comparison to > > join > > > > > implicitly by execution engine. E.g. you can make up the join > > condition in > > > > > your JoinRel if two inputs are TUMBLE. > > > > > > > > > > > > > > > > > > > > -Rui > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Sep 22, 2020 at 10:27 PM Danny Chan <yuzhao....@gmail.com> > > wrote: > > > > > > > > > > > Yes, the red part is **L.window_start = R.window_start AND > > L.window_end = > > > > > > R.window_end** > > > > > > > > > > > > > Is this a limitation for "triggered by the watermark of the > > stream”? > > > > > > > > > > > > No, because in most of the cases, there is no need to output the > > > > > > intermediate/partial join records then send retractions. > > > > > > > > > > > > > > > > > > So, how do you think about the condition syntax **L.window_start = > > > > > > R.window_start AND L.window_end = R.window_end** ? > > > > > > > > > > > > Best, > > > > > > Danny Chan > > > > > > 在 2020年9月23日 +0800 PM12:47,dev@calcite.apache.org,写道: > > > > > > > > > > > > > > L.window_start = R.window_start AND L.window_end = R.window_end > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Viliam Durina > > > > Jet Developer > > > > hazelcast® > > > > > > > > <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA > > 94402 | > > > > USA > > > > +1 (650) 521-5453 <(650)%20521-5453> | hazelcast.com < > > https://www.hazelcast.com> > > > > > > > > -- > > > > This message contains confidential information and is intended only > > for the > > > > individuals named. If you are not the named addressee you should not > > > > disseminate, distribute or copy this e-mail. Please notify the sender > > > > immediately by e-mail if you have received this e-mail by mistake and > > > > delete this e-mail from your system. E-mail transmission cannot be > > > > guaranteed to be secure or error-free as information could be > > intercepted, > > > > corrupted, lost, destroyed, arrive late or incomplete, or contain > > viruses. > > > > The sender therefore does not accept liability for any errors or > > omissions > > > > in the contents of this message, which arise as a result of e-mail > > > > transmission. If verification is required, please request a hard-copy > > > > version. -Hazelcast > >