-Other dev lists

I'm just coming off speaking about Beam at GOTO Chicago and QCON Sao Paulo. There was a ton of interest in Beam with SQL as a cross-framework way of doing SQL.
There's some confusion where people think we're just doing a pass-through to the framework's SQL engine. We'll have to make sure we're clear on how Beam's SQL works in the docs.

Thanks,
Jesse

On Mon, May 8, 2017 at 3:34 PM Tyler Akidau <taki...@google.com.invalid> wrote:

> Any thoughts here Fabian? I'm planning to start sending out some more emails towards the end of the week.
>
> -Tyler
>
> On Wed, Apr 26, 2017 at 8:18 AM Tyler Akidau <taki...@google.com> wrote:
>
> > No worries, thanks for the heads up. Good luck wrapping all that stuff up.
> >
> > -Tyler
> >
> > On Tue, Apr 25, 2017 at 12:07 AM Fabian Hueske <fhue...@gmail.com> wrote:
> >
> > > Hi Tyler,
> > >
> > > thanks for pushing this effort and including the Flink list. I haven't managed to read the doc yet, but just wanted to thank you for the write-up and let you know that I'm very interested in this discussion.
> > >
> > > We are very close to the feature freeze of Flink 1.3 and I'm quite busy getting as many contributions merged before the release is forked off. Once that has happened, I'll have more time to read and comment.
> > >
> > > Thanks,
> > > Fabian
> > >
> > > 2017-04-22 0:16 GMT+02:00 Tyler Akidau <taki...@google.com.invalid>:
> > >
> > > > Good point, when you start talking about anything less than a full join, triggers get involved to describe how one actually achieves the desired semantics, and they may end up being tied to just one of the inputs (e.g., you may only care about the watermark for one side of the join). Am expecting us to address these sorts of details more precisely in doc #2.
> > > >
> > > > -Tyler
> > > >
> > > > On Fri, Apr 21, 2017 at 2:26 PM Kenneth Knowles <k...@google.com.invalid> wrote:
> > > >
> > > > > There's something to be said about having different triggering depending on which side of a join data comes from, perhaps?
> > > > >
> > > > > (delightful doc, as usual)
> > > > >
> > > > > Kenn
> > > > >
> > > > > On Fri, Apr 21, 2017 at 1:33 PM, Tyler Akidau <taki...@google.com.invalid> wrote:
> > > > >
> > > > > > Thanks for reading, Luke. The simple answer is that CoGBK is basically flatten + GBK. Flatten is a non-grouping operation that merges the input streams into a single output stream. GBK then groups the data within that single union stream as you might otherwise expect, yielding a single table. So I think it doesn't really impact things much. Grouping, aggregation, window merging etc all just act upon the merged stream and generate what is effectively a merged table.
> > > > > >
> > > > > > -Tyler
> > > > > >
> > > > > > On Fri, Apr 21, 2017 at 12:36 PM Lukasz Cwik <lc...@google.com.invalid> wrote:
> > > > > >
> > > > > > > The doc is a good read.
> > > > > > >
> > > > > > > I think you do a great job of explaining table -> stream, stream -> stream, and stream -> table when there is only one stream. But when there are multiple streams reading/writing to a table, how does that impact what occurs? For example, with CoGBK you have multiple streams writing to a table, how does that impact window merging?
> > > > > > >
> > > > > > > On Thu, Apr 20, 2017 at 5:57 PM, Tyler Akidau <taki...@google.com.invalid> wrote:
> > > > > > >
> > > > > > > > Hello Beam, Calcite, and Flink dev lists!
> > > > > > > >
> > > > > > > > Apologies for the big cross post, but I thought this might be something all three communities would find relevant.
> > > > > > > >
> > > > > > > > Beam is finally making progress on a SQL DSL utilizing Calcite, thanks to Mingmin Xu. As you can imagine, we need to come to some conclusion about how to elegantly support the full suite of streaming functionality in the Beam model via Calcite SQL. You folks in the Flink community have been pushing on this (e.g., adding windowing constructs, amongst others, thank you! :-), but from my understanding we still don't have a full spec for how to support robust streaming in SQL (including but not limited to, e.g., a triggers analogue such as EMIT).
> > > > > > > >
> > > > > > > > I've been spending a lot of time thinking about this and have some opinions about how I think it should look that I've already written down, so I volunteered to try to drive forward agreement on a general streaming SQL spec between our three communities (well, technically I volunteered to do that w/ Beam and Calcite, but I figured you Flink folks might want to join in since you're going that direction already anyway and will have useful insights :-).
> > > > > > > >
> > > > > > > > My plan was to do this by sharing two docs:
> > > > > > > >
> > > > > > > > 1. The Beam Model : Streams & Tables - This one is for context, and really only mentions SQL in passing. But it describes the relationship between the Beam Model and the "streams & tables" way of thinking, which turns out to be useful in understanding what robust streaming in SQL might look like. Many of you probably already know some or all of what's in here, but I felt it was necessary to have it all written down in order to justify some of the proposals I wanted to make in the second doc.
> > > > > > > >
> > > > > > > > 2. A streaming SQL spec for Calcite - The goal for this doc is that it would become a general specification for what robust streaming SQL in Calcite should look like. It would start out as a basic proposal of what things *could* look like (combining both what things look like now as well as a set of proposed changes for the future), and we could all iterate on it together until we get to something we're happy with.
> > > > > > > >
> > > > > > > > At this point, I have doc #1 ready, and it's a bit of a monster, so I figured I'd share it and let folks hack at it with comments if they have any, while I try to get the second doc ready in the meantime. As part of getting doc #2 ready, I'll be starting a separate thread to try to gather input on what things are already in flight for streaming SQL across the various communities, to make sure the proposal captures everything that's going on as accurately as it can.
> > > > > > > >
> > > > > > > > If you have any questions or comments, I'm interested to hear them. Otherwise, here's doc #1, "The Beam Model : Streams & Tables":
> > > > > > > >
> > > > > > > > http://s.apache.org/beam-streams-tables
> > > > > > > >
> > > > > > > > -Tyler

--
Thanks,
Jesse