I think we're discussing two issues. One is "TUMBLE join TUMBLE" and the
other is "TUMBLE(TUMBLE())". In both cases the `window_start` and
`window_end` columns are duplicated.

In case of JOIN, I don't see any issue. It's equivalent to any join of two
tables, e.g.:

  SELECT *
  FROM t1
  JOIN t2 on t1.id=t2.id

You'll have two ID columns in the result, no issue here.

In case of "TUMBLE(TUMBLE())", I'd point to the feature of polymorphic
table functions in the standard:

  CREATE TABLE input_table(window_start, window_end, timestamp);

  SELECT T.window_start, T.window_end, T.timestamp, F.window_start,
F.window_end
  FROM TABLE(TUMBLE(TABLE(input_table) AS T, DESCRIPTOR(timestamp),
INTERVAL '5' MINUTES)) AS F

As you can see, you can refer to the duplicate columns using a different
alias: `T` alias is for the column in the input_table, `F` alias is for the
columns added by the function.

Viliam

On Thu, 24 Sep 2020 at 00:21, Kenneth Knowles <k...@apache.org> wrote:

> It happens in "raw Beam" somewhat often that there is one windowed
> aggregation followed by another. It is pretty unheard of to "assign"
> windows twice before an aggregation, because Beam cannot do anything useful
> with this: Beam does not have the ability to have two different
> fields/columns that are both associated with a watermark. So TUMBLE
> followed by TUMBLE is more expressive. You get a number of columns all of
> which can be used for windowed aggregation.
>
> One reason this could happen is that you are authoring a table function
> that operates generically. It does not know that its input has already been
> TUMBLED. In Beam you would reify the prior window information, do your own
> TUMBLE. Likely you would do an aggregation and then restore the prior
> window information somehow. A compelling example does not come to mind, but
> I think considering applying TUMBLE to a generic input helps clarify that
> it could happen in code worth allowing.
>
> Kenn
>
> On Wed, Sep 23, 2020 at 11:47 AM Julian Hyde <jh...@apache.org> wrote:
>
> > I don't think it's a problem with table functions in general. And
> > besides, we can't change the semantics of table functions. A table
> > function must not produce duplicate column names.
> >
> > The problem is with the semantics of these particular table functions
> > - HOP, TUMBLE, SESSION - and what semantics are desirable depends on
> > how people will typically use them. Is it common to follow TUMBLE with
> > TUMBLE? What would a user expect to be the output columns?
> >
> > On Wed, Sep 23, 2020 at 11:21 AM Rui Wang <amaliu...@apache.org> wrote:
> > >
> > > >Is it reasonable to apply TUMBLE to TUMBLE? If so, would people
> > > > generally want two sets of window_start, window_end columns?
> > >
> > > I think it is reasonable to apply TUMBLE to TUMBLE or even TUMBLE to
> HOP
> > > join, as long as there is a real requirement there. The window
> > starts/ends
> > > are not duplicates. For example TUMBLE as L JOIN HOP as R, L offers a
> > > window start and a window end, same for R. This is no different from a
> > > normal JOIN case where both JOIN sides have the same column names (but
> > they
> > > are not considered duplicates). The SQL rule is still applying: within
> a
> > > scope there shouldn't be ambiguous column names (e.g. duplicated column
> > > name). For JOIN duplicate names from JOIN inputs are differentiated by
> > > table alias.
> > >
> > >
> > > Regarding https://issues.apache.org/jira/browse/CALCITE-4274, this is
> an
> > > interesting case that is different from the JOIN case, and I also think
> > > this is a general case (not limited to TUMBLE).
> > >
> > > Think about that for any query that uses table function of the pattern
> > > in CALCITE-4274. The first table function generates column A and then
> it
> > > becomes the input for the second table function, which also wants to
> > append
> > > a column named "A". How should Calcite handle this case?
> > >
> > >
> > > -Rui
> > >
> > >
> > >
> > > On Wed, Sep 23, 2020 at 9:13 AM Julian Hyde <jh...@apache.org> wrote:
> > >
> > > > I think we should also discuss
> > > > https://issues.apache.org/jira/browse/CALCITE-4274 here.
> > > >
> > > > We've never discussed what should happen if you apply TUMBLE to
> TUMBLE
> > > > (or TUMBLE to HOP, etc.). What happens now is that you get duplicate
> > > > columns.
> > > >
> > > > Is it reasonable to apply TUMBLE to TUMBLE? If so, would people
> > > > generally want two sets of window_start, window_end columns?
> > > >
> > > > Julian
> > > >
> > > > On Wed, Sep 23, 2020 at 2:41 AM Danny Chan <yuzhao....@gmail.com>
> > wrote:
> > > > >
> > > > > Thanks for the feedback, I agree we should keep the verbose part
> > > > >
> > > > > **L.window_start = R.window_start AND L.window_end =R.window_end**
> > > > >
> > > > > Which would make the semantic more clear ~
> > > > >
> > > > > Best,
> > > > > Danny Chan
> > > > > 在 2020年9月23日 +0800 PM3:24,Viliam Durina <vil...@hazelcast.com>,写道:
> > > > > > You can also use
> > > > > >
> > > > > > SELECT L.f0, R.f2, L.window_start, L.window_end
> > > > > > FROM
> > > > > > Tumble(table T1, descriptor(T1.ts), INTERVAL ‘5’ MINUTE) L
> > > > > > JOIN
> > > > > > Tumble(table T2, descriptor(T2.ts), INTERVAL ‘5’ MINUTE) R
> > > > > > USING (f0, window_start)
> > > > > >
> > > > > > Viliam
> > > > > >
> > > > > > On Wed, 23 Sep 2020 at 08:02, Rui Wang <amaliu...@apache.org>
> > wrote:
> > > > > >
> > > > > > > Regarding to **L.window_start = R.window_start AND
> L.window_end =
> > > > > > > R.window_end**:
> > > > > > >
> > > > > > > In general, the current table function windowing model is to
> > append
> > > > window
> > > > > > > metadata to table directly, thus window metadata becomes a part
> > of
> > > > table
> > > > > > > (or call it data). So as a part of table, these two columns
> > should be
> > > > > > > treated as normal columns thus they should be in the join on
> > > > condition.
> > > > > > >
> > > > > > > If you want to make it optional, it makes window start/end
> > columns
> > > > special
> > > > > > > and has a semantic binding with special table functions
> (TUMBLE,
> > HOP,
> > > > > > > SESSION), which then becomes really not a SQL thing. For
> > example, we
> > > > can
> > > > > > > allow users to define their own windowing table function. In
> that
> > > > case, how
> > > > > > > will you utilize window start/end produced by a customized
> > windowing
> > > > table
> > > > > > > function? What if users produce wired windows that have
> > overlapped
> > > > window
> > > > > > > starts or window ends?
> > > > > > >
> > > > > > > Keeping windows start/end as a part of the table, treating them
> > no
> > > > > > > different from other columns, could give a consistent behavior
> > for
> > > > either
> > > > > > > built-in table function or user-defined table function.
> > > > > > >
> > > > > > > If you think it is too verbose, there are two options to
> > optimize:
> > > > > > >
> > > > > > > 1. for TUMBLE/HOP/SESSION, to identify a unique window, you
> will
> > > > only need
> > > > > > > either window start or end, so you can simplify it, for
> example,
> > to
> > > > > > > L.window_start = R.window_start only.
> > > > > > > 2. (not recommended), you can cut off **L.window_start =
> > > > R.window_start AND
> > > > > > > L.window_end = R.window_end**, but add window metadata
> > comparison to
> > > > join
> > > > > > > implicitly by execution engine. E.g. you can make up the join
> > > > condition in
> > > > > > > your JoinRel if two inputs are TUMBLE.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > -Rui
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Sep 22, 2020 at 10:27 PM Danny Chan <
> > yuzhao....@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > > > Yes, the red part is **L.window_start = R.window_start AND
> > > > L.window_end =
> > > > > > > > R.window_end**
> > > > > > > >
> > > > > > > > > Is this a limitation for "triggered by the watermark of the
> > > > stream”?
> > > > > > > >
> > > > > > > > No, because in most of the cases, there is no need to output
> > the
> > > > > > > > intermediate/partial join records then send retractions.
> > > > > > > >
> > > > > > > >
> > > > > > > > So, how do you think about the condition syntax
> > **L.window_start =
> > > > > > > > R.window_start AND L.window_end = R.window_end** ?
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Danny Chan
> > > > > > > > 在 2020年9月23日 +0800 PM12:47,dev@calcite.apache.org,写道:
> > > > > > > > >
> > > > > > > > > L.window_start = R.window_start AND L.window_end =
> > R.window_end
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Viliam Durina
> > > > > > Jet Developer
> > > > > > hazelcast®
> > > > > >
> > > > > > <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA
> > > > 94402 |
> > > > > > USA
> > > > > > +1 (650) 521-5453 <(650)%20521-5453> <(650)%20521-5453> |
> > hazelcast.com <
> > > > https://www.hazelcast.com>
> > > > > >
> > > > > > --
> > > > > > This message contains confidential information and is intended
> only
> > > > for the
> > > > > > individuals named. If you are not the named addressee you should
> > not
> > > > > > disseminate, distribute or copy this e-mail. Please notify the
> > sender
> > > > > > immediately by e-mail if you have received this e-mail by mistake
> > and
> > > > > > delete this e-mail from your system. E-mail transmission cannot
> be
> > > > > > guaranteed to be secure or error-free as information could be
> > > > intercepted,
> > > > > > corrupted, lost, destroyed, arrive late or incomplete, or contain
> > > > viruses.
> > > > > > The sender therefore does not accept liability for any errors or
> > > > omissions
> > > > > > in the contents of this message, which arise as a result of
> e-mail
> > > > > > transmission. If verification is required, please request a
> > hard-copy
> > > > > > version. -Hazelcast
> > > >
> >
>


-- 
Viliam Durina
Jet Developer
      hazelcast®

  <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA 94402 |
USA
+1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>

-- 
This message contains confidential information and is intended only for the 
individuals named. If you are not the named addressee you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail if you have received this e-mail by mistake and 
delete this e-mail from your system. E-mail transmission cannot be 
guaranteed to be secure or error-free as information could be intercepted, 
corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. 
The sender therefore does not accept liability for any errors or omissions 
in the contents of this message, which arise as a result of e-mail 
transmission. If verification is required, please request a hard-copy 
version. -Hazelcast

Reply via email to