Re: Towards a spec for robust streaming SQL, Part 1

Mingmin Xu Mon, 08 May 2017 22:13:23 -0700

Hi Jesse,

Glad to see lots of people interested in Beam SQL. I agree we should improve
the documentation (created BEAM-2227
<https://issues.apache.org/jira/browse/BEAM-2227>) so users can understand how
SQL is executed with Beam. IMO Beam SQL aims to fill the gap between standard
SQL queries and Beam pipelines, and to run from a CLI or as a DSL. I don't
think Beam will have its own SQL engine.


Thanks!
Mingmin

On Mon, May 8, 2017 at 4:11 PM, Jesse Anderson <[email protected]>
wrote:

> -Other dev lists
>
> I'm just coming off speaking about Beam at GOTO Chicago and QCON Sao Paulo.
> There was a ton of interest in Beam with SQL as a cross-framework way of
> doing SQL.
>
> There's some confusion where people think we're just doing a pass through
> to the framework's SQL engine. We'll have to make sure we're clear on how
> Beam's SQL works in the docs.
>
> Thanks,
>
> Jesse
>
> On Mon, May 8, 2017 at 3:34 PM Tyler Akidau <[email protected]>
> wrote:
>
> > Any thoughts here Fabian? I'm planning to start sending out some more
> > emails towards the end of the week.
> >
> > -Tyler
> >
> >
> > On Wed, Apr 26, 2017 at 8:18 AM Tyler Akidau <[email protected]> wrote:
> >
> > > No worries, thanks for the heads up. Good luck wrapping all that stuff
> > up.
> > >
> > > -Tyler
> > >
> > > On Tue, Apr 25, 2017 at 12:07 AM Fabian Hueske <[email protected]>
> > wrote:
> > >
> > >> Hi Tyler,
> > >>
> > >> thanks for pushing this effort and including the Flink list.
> > >> I haven't managed to read the doc yet, but just wanted to thank you
> for
> > >> the
> > >> write-up and let you know that I'm very interested in this discussion.
> > >>
> > >> We are very close to the feature freeze of Flink 1.3 and I'm quite
> busy
> > >> getting as many contributions merged before the release is forked off.
> > >> Once that has happened, I'll have more time to read and comment.
> > >>
> > >> Thanks,
> > >> Fabian
> > >>
> > >>
> > >> 2017-04-22 0:16 GMT+02:00 Tyler Akidau <[email protected]>:
> > >>
> > >> > Good point, when you start talking about anything less than a full
> > join,
> > >> > triggers get involved to describe how one actually achieves the
> > desired
> > >> > semantics, and they may end up being tied to just one of the inputs
> > >> (e.g.,
> > >> > you may only care about the watermark for one side of the join). Am
> > >> > expecting us to address these sorts of details more precisely in doc
> > #2.
> > >> >
> > >> > -Tyler
> > >> >
> > >> > On Fri, Apr 21, 2017 at 2:26 PM Kenneth Knowles
> > <[email protected]
> > >> >
> > >> > wrote:
> > >> >
> > >> > > There's something to be said about having different triggering
> > >> depending
> > >> > on
> > >> > > which side of a join data comes from, perhaps?
> > >> > >
> > >> > > (delightful doc, as usual)
> > >> > >
> > >> > > Kenn
> > >> > >
> > >> > > On Fri, Apr 21, 2017 at 1:33 PM, Tyler Akidau
> > >> <[email protected]
> > >> > >
> > >> > > wrote:
> > >> > >
> > >> > > > Thanks for reading, Luke. The simple answer is that CoGBK is
> > >> basically
> > >> > > > flatten + GBK. Flatten is a non-grouping operation that merges
> the
> > >> > input
> > >> > > > streams into a single output stream. GBK then groups the data
> > within
> > >> > that
> > >> > > > single union stream as you might otherwise expect, yielding a
> > single
> > >> > > table.
> > >> > > > So I think it doesn't really impact things much. Grouping,
> > >> aggregation,
> > >> > > > window merging etc all just act upon the merged stream and
> > generate
> > >> > what
> > >> > > is
> > >> > > > effectively a merged table.
> > >> > > >
> > >> > > > -Tyler
> > >> > > >
> > >> > > > On Fri, Apr 21, 2017 at 12:36 PM Lukasz Cwik
> > >> <[email protected]
> > >> > >
> > >> > > > wrote:
> > >> > > >
> > >> > > > > The doc is a good read.
> > >> > > > >
> > >> > > > > I think you do a great job of explaining table -> stream,
> stream
> > >> ->
> > >> > > > stream,
> > >> > > > > and stream -> table when there is only one stream.
> > >> > > > > But when there are multiple streams reading/writing to a
> table,
> > >> how
> > >> > > does
> > >> > > > > that impact what occurs?
> > >> > > > > For example, with CoGBK you have multiple streams writing to a
> > >> table,
> > >> > > how
> > >> > > > > does that impact window merging?
> > >> > > > >
> > >> > > > > On Thu, Apr 20, 2017 at 5:57 PM, Tyler Akidau
> > >> > > <[email protected]
> > >> > > > >
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Hello Beam, Calcite, and Flink dev lists!
> > >> > > > > >
> > >> > > > > > Apologies for the big cross post, but I thought this might
> be
> > >> > > something
> > >> > > > > all
> > >> > > > > > three communities would find relevant.
> > >> > > > > >
> > >> > > > > > Beam is finally making progress on a SQL DSL utilizing
> > Calcite,
> > >> > > thanks
> > >> > > > to
> > >> > > > > > Mingmin Xu. As you can imagine, we need to come to some
> > >> conclusion
> > >> > > > about
> > >> > > > > > how to elegantly support the full suite of streaming
> > >> functionality
> > >> > in
> > >> > > > the
> > >> > > > > > Beam model via Calcite SQL. You folks in the Flink
> > community
> > >> > have
> > >> > > > been
> > >> > > > > > pushing on this (e.g., adding windowing constructs, amongst
> > >> others,
> > >> > > > thank
> > >> > > > > > you! :-), but from my understanding we still don't have a
> full
> > >> spec
> > >> > > for
> > >> > > > > how
> > >> > > > > > to support robust streaming in SQL (including but not
> limited
> > >> to,
> > >> > > > e.g., a
> > >> > > > > > triggers analogue such as EMIT).
> > >> > > > > >
> > >> > > > > > I've been spending a lot of time thinking about this and
> have
> > >> some
> > >> > > > > opinions
> > >> > > > > > about how I think it should look that I've already written
> > down,
> > >> > so I
> > >> > > > > > volunteered to try to drive forward agreement on a general
> > >> > streaming
> > >> > > > SQL
> > >> > > > > > spec between our three communities (well, technically I
> > >> volunteered
> > >> > > to
> > >> > > > do
> > >> > > > > > that w/ Beam and Calcite, but I figured you Flink folks
> might
> > >> want
> > >> > to
> > >> > > > > join
> > >> > > > > > in since you're going that direction already anyway and will
> > >> have
> > >> > > > useful
> > >> > > > > > insights :-).
> > >> > > > > >
> > >> > > > > > My plan was to do this by sharing two docs:
> > >> > > > > >
> > >> > > > > >    1. The Beam Model : Streams & Tables - This one is for
> > >> context,
> > >> > > and
> > >> > > > > >    really only mentions SQL in passing. But it describes the
> > >> > > > relationship
> > >> > > > > >    between the Beam Model and the "streams & tables" way of
> > >> > thinking,
> > >> > > > > which
> > >> > > > > >    turns out to be useful in understanding what robust
> > >> streaming in
> > >> > > SQL
> > >> > > > > > might
> > >> > > > > >    look like. Many of you probably already know some or all
> of
> > >> > what's
> > >> > > > in
> > >> > > > > > here,
> > >> > > > > >    but I felt it was necessary to have it all written down
> in
> > >> order
> > >> > > to
> > >> > > > > > justify
> > >> > > > > >    some of the proposals I wanted to make in the second doc.
> > >> > > > > >
> > >> > > > > >    2. A streaming SQL spec for Calcite - The goal for this
> doc
> > >> is
> > >> > > that
> > >> > > > it
> > >> > > > > >    would become a general specification for what robust
> > >> streaming
> > >> > SQL
> > >> > > > in
> > >> > > > > >    Calcite should look like. It would start out as a basic
> > >> proposal
> > >> > > of
> > >> > > > > what
> > >> > > > > >    things *could* look like (combining both what things look
> > >> like
> > >> > now
> > >> > > > as
> > >> > > > > > well
> > >> > > > > >    as a set of proposed changes for the future), and we
> could
> > >> all
> > >> > > > iterate
> > >> > > > > > on
> > >> > > > > >    it together until we get to something we're happy with.
> > >> > > > > >
> > >> > > > > > At this point, I have doc #1 ready, and it's a bit of a
> > monster,
> > >> > so I
> > >> > > > > > figured I'd share it and let folks hack at it with comments
> if
> > >> they
> > >> > > > have
> > >> > > > > > any, while I try to get the second doc ready in the
> meantime.
> > As
> > >> > part
> > >> > > > of
> > >> > > > > > getting doc #2 ready, I'll be starting a separate thread to
> > try
> > >> to
> > >> > > > gather
> > >> > > > > > input on what things are already in flight for streaming SQL
> > >> across
> > >> > > the
> > >> > > > > > various communities, to make sure the proposal captures
> > >> everything
> > >> > > > that's
> > >> > > > > > going on as accurately as it can.
> > >> > > > > >
> > >> > > > > > If you have any questions or comments, I'm interested to
> hear
> > >> them.
> > >> > > > > > Otherwise, here's doc #1, "The Beam Model : Streams &
> Tables":
> > >> > > > > >
> > >> > > > > >   http://s.apache.org/beam-streams-tables
> > >> > > > > >
> > >> > > > > > -Tyler
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
> --
> Thanks,
>
> Jesse
>



-- 
----
Mingmin
