Re: Towards a spec for robust streaming SQL, Part 1

Jesse Anderson Mon, 08 May 2017 16:12:06 -0700

-Other dev lists

I'm just coming off speaking about Beam at GOTO Chicago and QCON Sao Paulo.
There was a ton of interest in Beam with SQL as a cross-framework way of
doing SQL.


There's some confusion where people think we're just doing a pass through
to the framework's SQL engine. We'll have to make sure we're clear on how
Beam's SQL works in the docs.

Thanks,

Jesse

On Mon, May 8, 2017 at 3:34 PM Tyler Akidau <taki...@google.com.invalid>
wrote:

> Any thoughts here Fabian? I'm planning to start sending out some more
> emails towards the end of the week.
>
> -Tyler
>
>
> On Wed, Apr 26, 2017 at 8:18 AM Tyler Akidau <taki...@google.com> wrote:
>
> > No worries, thanks for the heads up. Good luck wrapping all that stuff
> up.
> >
> > -Tyler
> >
> > On Tue, Apr 25, 2017 at 12:07 AM Fabian Hueske <fhue...@gmail.com>
> wrote:
> >
> >> Hi Tyler,
> >>
> >> thanks for pushing this effort and including the Flink list.
> >> I haven't managed to read the doc yet, but just wanted to thank you for
> >> the
> >> write-up and let you know that I'm very interested in this discussion.
> >>
> >> We are very close to the feature freeze of Flink 1.3 and I'm quite busy
> >> getting as many contributions merged before the release is forked off.
> >> When that happened, I'll have more time to read and comment.
> >>
> >> Thanks,
> >> Fabian
> >>
> >>
> >> 2017-04-22 0:16 GMT+02:00 Tyler Akidau <taki...@google.com.invalid>:
> >>
> >> > Good point, when you start talking about anything less than a full
> join,
> >> > triggers get involved to describe how one actually achieves the
> desired
> >> > semantics, and they may end up being tied to just one of the inputs
> >> (e.g.,
> >> > you may only care about the watermark for one side of the join). Am
> >> > expecting us to address these sorts of details more precisely in doc
> #2.
> >> >
> >> > -Tyler
> >> >
> >> > On Fri, Apr 21, 2017 at 2:26 PM Kenneth Knowles
> <k...@google.com.invalid
> >> >
> >> > wrote:
> >> >
> >> > > There's something to be said about having different triggering
> >> depending
> >> > on
> >> > > which side of a join data comes from, perhaps?
> >> > >
> >> > > (delightful doc, as usual)
> >> > >
> >> > > Kenn
> >> > >
> >> > > On Fri, Apr 21, 2017 at 1:33 PM, Tyler Akidau
> >> <taki...@google.com.invalid
> >> > >
> >> > > wrote:
> >> > >
> >> > > > Thanks for reading, Luke. The simple answer is that CoGBK is
> >> basically
> >> > > > flatten + GBK. Flatten is a non-grouping operation that merges the
> >> > input
> >> > > > streams into a single output stream. GBK then groups the data
> within
> >> > that
> >> > > > single union stream as you might otherwise expect, yielding a
> single
> >> > > table.
> >> > > > So I think it doesn't really impact things much. Grouping,
> >> aggregation,
> >> > > > window merging etc all just act upon the merged stream and
> generate
> >> > what
> >> > > is
> >> > > > effectively a merged table.
> >> > > >
> >> > > > -Tyler
> >> > > >
> >> > > > On Fri, Apr 21, 2017 at 12:36 PM Lukasz Cwik
> >> <lc...@google.com.invalid
> >> > >
> >> > > > wrote:
> >> > > >
> >> > > > > The doc is a good read.
> >> > > > >
> >> > > > > I think you do a great job of explaining table -> stream, stream
> >> ->
> >> > > > stream,
> >> > > > > and stream -> table when there is only one stream.
> >> > > > > But when there are multiple streams reading/writing to a table,
> >> how
> >> > > does
> >> > > > > that impact what occurs?
> >> > > > > For example, with CoGBK you have multiple streams writing to a
> >> table,
> >> > > how
> >> > > > > does that impact window merging?
> >> > > > >
> >> > > > > On Thu, Apr 20, 2017 at 5:57 PM, Tyler Akidau
> >> > > <taki...@google.com.invalid
> >> > > > >
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Hello Beam, Calcite, and Flink dev lists!
> >> > > > > >
> >> > > > > > Apologies for the big cross post, but I thought this might be
> >> > > something
> >> > > > > all
> >> > > > > > three communities would find relevant.
> >> > > > > >
> >> > > > > > Beam is finally making progress on a SQL DSL utilizing
> Calcite,
> >> > > thanks
> >> > > > to
> >> > > > > > Mingmin Xu. As you can imagine, we need to come to some
> >> conclusion
> >> > > > about
> >> > > > > > how to elegantly support the full suite of streaming
> >> functionality
> >> > in
> >> > > > the
> >> > > > > > Beam model in via Calcite SQL. You folks in the Flink
> community
> >> > have
> >> > > > been
> >> > > > > > pushing on this (e.g., adding windowing constructs, amongst
> >> others,
> >> > > > thank
> >> > > > > > you! :-), but from my understanding we still don't have a full
> >> spec
> >> > > for
> >> > > > > how
> >> > > > > > to support robust streaming in SQL (including but not limited
> >> to,
> >> > > > e.g., a
> >> > > > > > triggers analogue such as EMIT).
> >> > > > > >
> >> > > > > > I've been spending a lot of time thinking about this and have
> >> some
> >> > > > > opinions
> >> > > > > > about how I think it should look that I've already written
> down,
> >> > so I
> >> > > > > > volunteered to try to drive forward agreement on a general
> >> > streaming
> >> > > > SQL
> >> > > > > > spec between our three communities (well, technically I
> >> volunteered
> >> > > to
> >> > > > do
> >> > > > > > that w/ Beam and Calcite, but I figured you Flink folks might
> >> want
> >> > to
> >> > > > > join
> >> > > > > > in since you're going that direction already anyway and will
> >> have
> >> > > > useful
> >> > > > > > insights :-).
> >> > > > > >
> >> > > > > > My plan was to do this by sharing two docs:
> >> > > > > >
> >> > > > > >    1. The Beam Model : Streams & Tables - This one is for
> >> context,
> >> > > and
> >> > > > > >    really only mentions SQL in passing. But it describes the
> >> > > > relationship
> >> > > > > >    between the Beam Model and the "streams & tables" way of
> >> > thinking,
> >> > > > > which
> >> > > > > >    turns out to be useful in understanding what robust
> >> streaming in
> >> > > SQL
> >> > > > > > might
> >> > > > > >    look like. Many of you probably already know some or all of
> >> > what's
> >> > > > in
> >> > > > > > here,
> >> > > > > >    but I felt it was necessary to have it all written down in
> >> order
> >> > > to
> >> > > > > > justify
> >> > > > > >    some of the proposals I wanted to make in the second doc.
> >> > > > > >
> >> > > > > >    2. A streaming SQL spec for Calcite - The goal for this doc
> >> is
> >> > > that
> >> > > > it
> >> > > > > >    would become a general specification for what robust
> >> streaming
> >> > SQL
> >> > > > in
> >> > > > > >    Calcite should look like. It would start out as a basic
> >> proposal
> >> > > of
> >> > > > > what
> >> > > > > >    things *could* look like (combining both what things look
> >> like
> >> > now
> >> > > > as
> >> > > > > > well
> >> > > > > >    as a set of proposed changes for the future), and we could
> >> all
> >> > > > iterate
> >> > > > > > on
> >> > > > > >    it together until we get to something we're happy with.
> >> > > > > >
> >> > > > > > At this point, I have doc #1 ready, and it's a bit of a
> monster,
> >> > so I
> >> > > > > > figured I'd share it and let folks hack at it with comments if
> >> they
> >> > > > have
> >> > > > > > any, while I try to get the second doc ready in the meantime.
> As
> >> > part
> >> > > > of
> >> > > > > > getting doc #2 ready, I'll be starting a separate thread to
> try
> >> to
> >> > > > gather
> >> > > > > > input on what things are already in flight for streaming SQL
> >> across
> >> > > the
> >> > > > > > various communities, to make sure the proposal captures
> >> everything
> >> > > > that's
> >> > > > > > going on as accurately as it can.
> >> > > > > >
> >> > > > > > If you have any questions or comments, I'm interested to hear
> >> them.
> >> > > > > > Otherwise, here's doc #1, "The Beam Model : Streams & Tables":
> >> > > > > >
> >> > > > > >   http://s.apache.org/beam-streams-tables
> >> > > > > >
> >> > > > > > -Tyler
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>
-- 
Thanks,

Jesse

Re: Towards a spec for robust streaming SQL, Part 1

Reply via email to