Re: Towards a spec for robust streaming SQL, Part 1

Tyler Akidau Fri, 21 Apr 2017 15:17:41 -0700

Good point. When you start talking about anything less than a full join,
triggers get involved to describe how one actually achieves the desired
semantics, and they may end up being tied to just one of the inputs (e.g.,
you may only care about the watermark for one side of the join). I'm
expecting us to address these sorts of details more precisely in doc #2.


-Tyler

On Fri, Apr 21, 2017 at 2:26 PM Kenneth Knowles <[email protected]>
wrote:

> There's something to be said about having different triggering depending on
> which side of a join data comes from, perhaps?
>
> (delightful doc, as usual)
>
> Kenn
>
> On Fri, Apr 21, 2017 at 1:33 PM, Tyler Akidau <[email protected]>
> wrote:
>
> > Thanks for reading, Luke. The simple answer is that CoGBK is basically
> > flatten + GBK. Flatten is a non-grouping operation that merges the input
> > streams into a single output stream. GBK then groups the data within that
> > single union stream as you might otherwise expect, yielding a single table.
> > So I think it doesn't really impact things much. Grouping, aggregation,
> > window merging etc all just act upon the merged stream and generate what is
> > effectively a merged table.
> >
> > -Tyler
> >
> > On Fri, Apr 21, 2017 at 12:36 PM Lukasz Cwik <[email protected]>
> > wrote:
> >
> > > The doc is a good read.
> > >
> > > I think you do a great job of explaining table -> stream, stream -> stream,
> > > and stream -> table when there is only one stream.
> > > But when there are multiple streams reading/writing to a table, how does
> > > that impact what occurs?
> > > For example, with CoGBK you have multiple streams writing to a table, how
> > > does that impact window merging?
> > >
> > > On Thu, Apr 20, 2017 at 5:57 PM, Tyler Akidau <[email protected]>
> > > wrote:
> > >
> > > > Hello Beam, Calcite, and Flink dev lists!
> > > >
> > > > Apologies for the big cross post, but I thought this might be something
> > > > all three communities would find relevant.
> > > >
> > > > Beam is finally making progress on a SQL DSL utilizing Calcite, thanks to
> > > > Mingmin Xu. As you can imagine, we need to come to some conclusion about
> > > > how to elegantly support the full suite of streaming functionality in the
> > > > Beam model via Calcite SQL. You folks in the Flink community have been
> > > > pushing on this (e.g., adding windowing constructs, amongst others, thank
> > > > you! :-), but from my understanding we still don't have a full spec for
> > > > how to support robust streaming in SQL (including but not limited to,
> > > > e.g., a triggers analogue such as EMIT).
> > > >
> > > > I've been spending a lot of time thinking about this and have some
> > > > opinions about how I think it should look that I've already written
> > > > down, so I volunteered to try to drive forward agreement on a general
> > > > streaming SQL spec between our three communities (well, technically I
> > > > volunteered to do that w/ Beam and Calcite, but I figured you Flink folks
> > > > might want to join in since you're going that direction already anyway
> > > > and will have useful insights :-).
> > > >
> > > > My plan was to do this by sharing two docs:
> > > >
> > > >    1. The Beam Model : Streams & Tables - This one is for context, and
> > > >    really only mentions SQL in passing. But it describes the relationship
> > > >    between the Beam Model and the "streams & tables" way of thinking,
> > > >    which turns out to be useful in understanding what robust streaming in
> > > >    SQL might look like. Many of you probably already know some or all of
> > > >    what's in here, but I felt it was necessary to have it all written
> > > >    down in order to justify some of the proposals I wanted to make in the
> > > >    second doc.
> > > >
> > > >    2. A streaming SQL spec for Calcite - The goal for this doc is that it
> > > >    would become a general specification for what robust streaming SQL in
> > > >    Calcite should look like. It would start out as a basic proposal of
> > > >    what things *could* look like (combining both what things look like
> > > >    now as well as a set of proposed changes for the future), and we could
> > > >    all iterate on it together until we get to something we're happy with.
> > > >
> > > > At this point, I have doc #1 ready, and it's a bit of a monster, so I
> > > > figured I'd share it and let folks hack at it with comments if they have
> > > > any, while I try to get the second doc ready in the meantime. As part of
> > > > getting doc #2 ready, I'll be starting a separate thread to try to gather
> > > > input on what things are already in flight for streaming SQL across the
> > > > various communities, to make sure the proposal captures everything that's
> > > > going on as accurately as it can.
> > > >
> > > > If you have any questions or comments, I'm interested to hear them.
> > > > Otherwise, here's doc #1, "The Beam Model : Streams & Tables":
> > > >
> > > >   http://s.apache.org/beam-streams-tables
> > > >
> > > > -Tyler
> > > >
> > >
> >
>
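
The "CoGBK is basically flatten + GBK" point quoted above can be sketched in
a few lines of plain Python. This is only an illustration of the idea, not
the Beam SDK: the function names (flatten, group_by_key, co_group_by_key) and
the tagging scheme are hypothetical, the inputs are bounded in-memory lists,
and windowing, triggers, and watermarks are ignored entirely.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Tuple

KV = Tuple[Hashable, Any]


def flatten(inputs: Dict[str, List[KV]]) -> List[Tuple[Hashable, Tuple[str, Any]]]:
    """Non-grouping merge: union all inputs into one stream.

    Each value is tagged with the name of the input it came from
    (an assumption of this sketch) so it can be separated again later.
    """
    merged = []
    for tag, stream in inputs.items():
        for key, value in stream:
            merged.append((key, (tag, value)))
    return merged


def group_by_key(stream: List[Tuple[Hashable, Any]]) -> Dict[Hashable, List[Any]]:
    """Grouping step: turn the single merged stream into a single table."""
    table = defaultdict(list)
    for key, value in stream:
        table[key].append(value)
    return dict(table)


def co_group_by_key(inputs: Dict[str, List[KV]]) -> Dict[Hashable, Dict[str, List[Any]]]:
    """CoGBK expressed as flatten + GBK, then a split of values by input tag."""
    grouped = group_by_key(flatten(inputs))
    result = {}
    for key, tagged_values in grouped.items():
        per_tag = {tag: [] for tag in inputs}  # empty list for inputs lacking this key
        for tag, value in tagged_values:
            per_tag[tag].append(value)
        result[key] = per_tag
    return result
```

Because the grouping only ever sees the one merged stream, anything applied
at grouping time in this sketch acts on the union of the inputs, which is the
sense in which the result is "effectively a merged table".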
