Hello Beam, Calcite, and Flink dev lists!

Apologies for the big cross post, but I thought this might be something all
three communities would find relevant.

Beam is finally making progress on a SQL DSL utilizing Calcite, thanks to
Mingmin Xu. As you can imagine, we need to come to some conclusion about
how to elegantly support the full suite of streaming functionality in the
Beam model via Calcite SQL. You folks in the Flink community have been
pushing on this (e.g., adding windowing constructs, amongst others, thank
you! :-), but from my understanding we still don't have a full spec for how
to support robust streaming in SQL (including, but not limited to, a
triggers analogue such as EMIT).

I've been spending a lot of time thinking about this and have already
written down some opinions about how I think it should look, so I
volunteered to try to drive forward agreement on a general streaming SQL
spec between our three communities (well, technically I volunteered to do
that w/ Beam and Calcite, but I figured you Flink folks might want to join
in since you're going that direction already anyway and will have useful
insights :-).

My plan was to do this by sharing two docs:

   1. The Beam Model : Streams & Tables - This one is for context, and
   really only mentions SQL in passing. But it describes the relationship
   between the Beam Model and the "streams & tables" way of thinking, which
   turns out to be useful in understanding what robust streaming in SQL might
   look like. Many of you probably already know some or all of what's in here,
   but I felt it was necessary to have it all written down in order to justify
   some of the proposals I wanted to make in the second doc.

   2. A streaming SQL spec for Calcite - The goal for this doc is that it
   would become a general specification for what robust streaming SQL in
   Calcite should look like. It would start out as a basic proposal of what
   things *could* look like (combining both what things look like now as well
   as a set of proposed changes for the future), and we could all iterate on
   it together until we get to something we're happy with.

At this point, I have doc #1 ready, and it's a bit of a monster, so I
figured I'd share it and let folks hack at it with comments if they have
any, while I try to get the second doc ready in the meantime. As part of
getting doc #2 ready, I'll be starting a separate thread to try to gather
input on what things are already in flight for streaming SQL across the
various communities, to make sure the proposal captures everything that's
going on as accurately as it can.

If you have any questions or comments, I'm interested to hear them.
Otherwise, here's doc #1, "The Beam Model : Streams & Tables":

  http://s.apache.org/beam-streams-tables

-Tyler
