Re: Towards a spec for robust streaming SQL, Part 1

Fabian Hueske Tue, 25 Apr 2017 00:07:22 -0700

Hi Tyler,

thanks for pushing this effort and including the Flink list.
I haven't managed to read the doc yet, but just wanted to thank you for the
write-up and let you know that I'm very interested in this discussion.


We are very close to the feature freeze of Flink 1.3 and I'm quite busy
getting as many contributions merged as possible before the release branch
is forked off. Once that has happened, I'll have more time to read and comment.

Thanks,
Fabian


2017-04-22 0:16 GMT+02:00 Tyler Akidau <[email protected]>:

> Good point: when you start talking about anything less than a full join,
> triggers get involved in describing how one actually achieves the desired
> semantics, and they may end up being tied to just one of the inputs (e.g.,
> you may only care about the watermark for one side of the join). I expect
> us to address these sorts of details more precisely in doc #2.
>
> -Tyler
>
> On Fri, Apr 21, 2017 at 2:26 PM Kenneth Knowles <[email protected]>
> wrote:
>
> > There's something to be said about having different triggering depending
> > on which side of a join data comes from, perhaps?
> >
> > (delightful doc, as usual)
> >
> > Kenn
> >
> > On Fri, Apr 21, 2017 at 1:33 PM, Tyler Akidau <[email protected]>
> > wrote:
> >
> > > Thanks for reading, Luke. The simple answer is that CoGBK is basically
> > > flatten + GBK. Flatten is a non-grouping operation that merges the input
> > > streams into a single output stream. GBK then groups the data within that
> > > single union stream as you might otherwise expect, yielding a single table.
> > > So I don't think it really impacts things much. Grouping, aggregation,
> > > window merging, etc. all just act upon the merged stream and generate what
> > > is effectively a merged table.
> > >
> > > -Tyler
> > >
> > > On Fri, Apr 21, 2017 at 12:36 PM Lukasz Cwik <[email protected]>
> > > wrote:
> > >
> > > > The doc is a good read.
> > > >
> > > > I think you do a great job of explaining table -> stream, stream ->
> > > > stream, and stream -> table when there is only one stream.
> > > > But when there are multiple streams reading from or writing to a table,
> > > > how does that impact what occurs?
> > > > For example, with CoGBK you have multiple streams writing to a table;
> > > > how does that impact window merging?
> > > >
> > > > On Thu, Apr 20, 2017 at 5:57 PM, Tyler Akidau <[email protected]>
> > > > wrote:
> > > >
> > > > > Hello Beam, Calcite, and Flink dev lists!
> > > > >
> > > > > Apologies for the big cross-post, but I thought this might be
> > > > > something all three communities would find relevant.
> > > > >
> > > > > Beam is finally making progress on a SQL DSL utilizing Calcite, thanks
> > > > > to Mingmin Xu. As you can imagine, we need to come to some conclusion
> > > > > about how to elegantly support the full suite of streaming
> > > > > functionality in the Beam model via Calcite SQL. You folks in the Flink
> > > > > community have been pushing on this (e.g., adding windowing constructs,
> > > > > amongst others, thank you! :-), but from my understanding we still
> > > > > don't have a full spec for how to support robust streaming in SQL
> > > > > (including but not limited to, e.g., a triggers analogue such as EMIT).
> > > > >
> > > > > I've been spending a lot of time thinking about this and have already
> > > > > written down some opinions about how I think it should look, so I
> > > > > volunteered to try to drive forward agreement on a general streaming
> > > > > SQL spec between our three communities (well, technically I
> > > > > volunteered to do that w/ Beam and Calcite, but I figured you Flink
> > > > > folks might want to join in since you're going that direction already
> > > > > anyway and will have useful insights :-).
> > > > >
> > > > > My plan was to do this by sharing two docs:
> > > > >
> > > > >    1. The Beam Model : Streams & Tables - This one is for context, and
> > > > >    really only mentions SQL in passing. But it describes the relationship
> > > > >    between the Beam Model and the "streams & tables" way of thinking,
> > > > >    which turns out to be useful in understanding what robust streaming
> > > > >    in SQL might look like. Many of you probably already know some or all
> > > > >    of what's in here, but I felt it was necessary to have it all written
> > > > >    down in order to justify some of the proposals I wanted to make in the
> > > > >    second doc.
> > > > >
> > > > >    2. A streaming SQL spec for Calcite - The goal for this doc is that it
> > > > >    would become a general specification for what robust streaming SQL
> > > > >    in Calcite should look like. It would start out as a basic proposal
> > > > >    of what things *could* look like (combining what things look like now
> > > > >    with a set of proposed changes for the future), and we could all
> > > > >    iterate on it together until we get to something we're happy with.
> > > > >
> > > > > At this point, I have doc #1 ready, and it's a bit of a monster, so I
> > > > > figured I'd share it and let folks hack at it with comments if they
> > > > > have any, while I work on getting the second doc ready. As part of
> > > > > getting doc #2 ready, I'll be starting a separate thread to gather
> > > > > input on what things are already in flight for streaming SQL across
> > > > > the various communities, to make sure the proposal captures everything
> > > > > that's going on as accurately as it can.
> > > > >
> > > > > If you have any questions or comments, I'm interested to hear them.
> > > > > Otherwise, here's doc #1, "The Beam Model : Streams & Tables":
> > > > >
> > > > >   http://s.apache.org/beam-streams-tables
> > > > >
> > > > > -Tyler
> > > > >
> > > >
> > >
> >
>
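
Tyler's reply to Luke describes CoGBK as flatten + GBK. A minimal plain-Python sketch of that decomposition follows, assuming each input is a finite list of (key, value) pairs. It is an illustration only, not Beam's API: the helpers `flatten`, `group_by_key`, and `co_group_by_key` and the `orders`/`profiles` data are hypothetical, and windowing, watermarks, and triggers are left out entirely.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# A keyed record: (key, value). Hypothetical simplified model: no windows,
# timestamps, watermarks, or triggers.
Record = Tuple[Hashable, object]


def flatten(inputs: Dict[str, Iterable[Record]]) -> List[Tuple[Hashable, Tuple[str, object]]]:
    """Non-grouping merge: union every input stream into one stream,
    tagging each value with the name of the input it came from."""
    merged = []
    for tag, stream in inputs.items():
        for key, value in stream:
            merged.append((key, (tag, value)))
    return merged


def group_by_key(stream: Iterable[Tuple[Hashable, object]]) -> Dict[Hashable, List[object]]:
    """Group a single stream by key, yielding a single table (key -> values)."""
    table: Dict[Hashable, List[object]] = defaultdict(list)
    for key, value in stream:
        table[key].append(value)
    return dict(table)


def co_group_by_key(inputs: Dict[str, Iterable[Record]]) -> Dict[Hashable, Dict[str, List[object]]]:
    """CoGBK as flatten + GBK: one grouping over the merged stream, then the
    tagged values in each group are split back out per input."""
    grouped = group_by_key(flatten(inputs))
    result = {}
    for key, tagged_values in grouped.items():
        # Every input gets an entry, even if it contributed nothing for this key.
        per_input: Dict[str, List[object]] = {tag: [] for tag in inputs}
        for tag, value in tagged_values:
            per_input[tag].append(value)
        result[key] = per_input
    return result


# Hypothetical example inputs: two keyed streams that share user IDs as keys.
orders = [("alice", 10), ("bob", 5), ("alice", 7)]
profiles = [("alice", "premium"), ("carol", "basic")]
result = co_group_by_key({"orders": orders, "profiles": profiles})
print(result)
```

Because the grouping step runs once over the merged, tagged stream, anything that operates on groups sees a single stream and produces a single table; per Tyler's reply, that is also where aggregation and window merging would act in the full model. Tagging each value with its source is what lets the per-input view be recovered after grouping.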
