I don't think it's a problem with table functions in general. And
besides, we can't change the semantics of table functions. A table
function must not produce duplicate column names.

The problem is with the semantics of these particular table functions
- HOP, TUMBLE, SESSION - and what semantics are desirable depends on
how people will typically use them. Is it common to follow TUMBLE with
TUMBLE? What would a user expect to be the output columns?
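
To make the question concrete, here is a minimal Python sketch of what happens when a windowing table function blindly appends window_start and window_end to its input. This is not Calcite code: `tumble` and `duplicate_names` are hypothetical helpers, and aligning windows to the Unix epoch is an assumption made only for illustration.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed window alignment origin


def tumble(columns, rows, ts_column, size):
    """Hypothetical stand-in for TUMBLE: append window_start and
    window_end to every row, without checking for name clashes."""
    ts_index = columns.index(ts_column)
    out_columns = columns + ["window_start", "window_end"]
    out_rows = []
    for row in rows:
        # Distance from the timestamp back to the start of its window.
        offset = (row[ts_index] - EPOCH) % size
        start = row[ts_index] - offset
        out_rows.append(row + [start, start + size])
    return out_columns, out_rows


def duplicate_names(columns):
    """Return the column names that occur more than once, in order."""
    seen, dups = set(), []
    for name in columns:
        if name in seen and name not in dups:
            dups.append(name)
        seen.add(name)
    return dups


columns = ["f0", "ts"]
rows = [["a", datetime(2020, 9, 23, 11, 21)]]

# TUMBLE once: the output column names are unique.
once_columns, once_rows = tumble(columns, rows, "ts", timedelta(minutes=5))

# TUMBLE applied to the output of TUMBLE: the names now clash.
twice_columns, twice_rows = tumble(
    once_columns, once_rows, "ts", timedelta(minutes=10))

print(duplicate_names(once_columns))
print(duplicate_names(twice_columns))
```

The second call is the TUMBLE-on-TUMBLE case: the output carries two window_start and two window_end columns, so the function would have to replace the existing pair, rename one pair, or reject the query.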

On Wed, Sep 23, 2020 at 11:21 AM Rui Wang <amaliu...@apache.org> wrote:
>
> >Is it reasonable to apply TUMBLE to TUMBLE? If so, would people
> > generally want two sets of window_start, window_end columns?
>
> I think it is reasonable to apply TUMBLE to TUMBLE, or even to join TUMBLE
> with HOP, as long as there is a real requirement for it. The window starts
> and ends are not duplicates. For example, in TUMBLE AS L JOIN HOP AS R, L
> offers a window start and a window end, and so does R. This is no different
> from a normal JOIN where both sides have the same column names (they are not
> considered duplicates). The SQL rule still applies: within a scope there
> must be no ambiguous column names (e.g. a duplicated column name). For JOIN,
> duplicate names from the inputs are differentiated by table alias.
>
>
> Regarding https://issues.apache.org/jira/browse/CALCITE-4274, this is an
> interesting case that is different from the JOIN case, and I also think
> this is a general case (not limited to TUMBLE).
>
> Consider any query that uses table functions in the pattern of
> CALCITE-4274. The first table function generates column A, and its output
> becomes the input of the second table function, which also wants to append
> a column named "A". How should Calcite handle this case?
>
>
> -Rui
>
>
>
> On Wed, Sep 23, 2020 at 9:13 AM Julian Hyde <jh...@apache.org> wrote:
>
> > I think we should also discuss
> > https://issues.apache.org/jira/browse/CALCITE-4274 here.
> >
> > We've never discussed what should happen if you apply TUMBLE to TUMBLE
> > (or TUMBLE to HOP, etc.). What happens now is that you get duplicate
> > columns.
> >
> > Is it reasonable to apply TUMBLE to TUMBLE? If so, would people
> > generally want two sets of window_start, window_end columns?
> >
> > Julian
> >
> > On Wed, Sep 23, 2020 at 2:41 AM Danny Chan <yuzhao....@gmail.com> wrote:
> > >
> > > Thanks for the feedback. I agree we should keep the verbose part
> > >
> > > **L.window_start = R.window_start AND L.window_end = R.window_end**
> > >
> > > which would make the semantics clearer.
> > >
> > > Best,
> > > Danny Chan
> > > On Sep 23, 2020, at 3:24 PM +0800, Viliam Durina <vil...@hazelcast.com> wrote:
> > > > You can also use
> > > >
> > > > SELECT L.f0, R.f2, L.window_start, L.window_end
> > > > FROM
> > > > Tumble(table T1, descriptor(T1.ts), INTERVAL '5' MINUTE) L
> > > > JOIN
> > > > Tumble(table T2, descriptor(T2.ts), INTERVAL '5' MINUTE) R
> > > > USING (f0, window_start)
> > > >
> > > > Viliam
> > > >
> > > > On Wed, 23 Sep 2020 at 08:02, Rui Wang <amaliu...@apache.org> wrote:
> > > >
> > > > > Regarding **L.window_start = R.window_start AND L.window_end =
> > > > > R.window_end**:
> > > > >
> > > > > In general, the current table function windowing model appends window
> > > > > metadata directly to the table, so window metadata becomes part of the
> > > > > table (or call it data). As part of the table, these two columns should
> > > > > be treated as normal columns, and so they should appear in the join ON
> > > > > condition.
> > > > >
> > > > > If you want to make it optional, that makes the window start/end columns
> > > > > special and binds them semantically to specific table functions (TUMBLE,
> > > > > HOP, SESSION), which is then no longer really a SQL thing. For example,
> > > > > we can allow users to define their own windowing table functions. In
> > > > > that case, how will you utilize the window start/end produced by a
> > > > > customized windowing table function? What if users produce weird windows
> > > > > that have overlapping window starts or window ends?
> > > > >
> > > > > Keeping the window start/end as part of the table, and treating them no
> > > > > differently from other columns, gives consistent behavior for both
> > > > > built-in and user-defined table functions.
> > > > >
> > > > > If you think it is too verbose, there are two ways to shorten it:
> > > > >
> > > > > 1. For TUMBLE/HOP/SESSION, you only need either the window start or the
> > > > > window end to identify a unique window, so you can simplify the
> > > > > condition to, for example, L.window_start = R.window_start only.
> > > > > 2. (Not recommended) You can drop **L.window_start = R.window_start AND
> > > > > L.window_end = R.window_end** and have the execution engine add the
> > > > > window metadata comparison to the join implicitly. E.g. you can make up
> > > > > the join condition in your JoinRel if both inputs are TUMBLE.
> > > > >
> > > > >
> > > > >
> > > > > -Rui
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Sep 22, 2020 at 10:27 PM Danny Chan <yuzhao....@gmail.com> wrote:
> > > > >
> > > > > > Yes, the red part is **L.window_start = R.window_start AND
> > > > > > L.window_end = R.window_end**
> > > > > >
> > > > > > > Is this a limitation for "triggered by the watermark of the stream"?
> > > > > >
> > > > > > No, because in most cases there is no need to output the
> > > > > > intermediate/partial join records and then send retractions.
> > > > > >
> > > > > >
> > > > > > So, what do you think of the condition syntax **L.window_start =
> > > > > > R.window_start AND L.window_end = R.window_end**?
> > > > > >
> > > > > > Best,
> > > > > > Danny Chan
> > > > > > On Sep 23, 2020, at 12:47 PM +0800, dev@calcite.apache.org wrote:
> > > > > > >
> > > > > > > L.window_start = R.window_start AND L.window_end = R.window_end
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Viliam Durina
> > > > Jet Developer
> > > > hazelcast®
> > > >
> >
