Hi! True, the Flink community is looking into stream SQL, and is currently building on top of Calcite. This is all going well, but we probably need some custom syntax around windowing.
For Stream SQL Windowing, what I have seen so far in Calcite (correct me if I am wrong there), is pretty much a variant of the OLAP sliding window aggregates.

 - Windows there are basically calculated by rounding timestamps down/up, thus bucketizing the events. That works for many cases, but the syntax is quite tricky.

 - Flink supports various notions of time for windowing (processing time, ingestion time, event time), as well as triggers. Being able to extend the window specification with such additional parameters is pretty crucial and would probably go well with a dedicated window clause.

 - Flink also has unaligned windows (sessions, timeouts, ...) which are very hard to map to grouping and window aggregations across ordered groups.

Converging to a core standard around stream SQL is very desirable, I completely agree. For the basic constructs, I think this is quite feasible and Calcite has some good suggestions there.

In the advanced constructs, the systems currently differ quite heavily, so converging there may be harder. Also, we are just learning what semantics people need concerning windowing/event time/etc. It may be a tad too early to try and define a standard there...

Greetings,
Stephan

On Thu, Feb 4, 2016 at 9:35 AM, Julian Hyde wrote:

> I totally agree with you. (Sorry for the delayed response; this week has
> been very busy.)
>
> There is a tendency of vendors (and projects) to think that their
> technology is unique, and superior to everyone else’s, and want to showcase
> it in their dialect of SQL. That is natural, and it’s OK, since it makes
> them strive to make their technology better.
>
> However, they have to remember that the end users don’t want something
> unique, they want something that solves their problem.
> They would like something that is standards compliant so that it is easy
> to learn, easy to hire developers for, and (if the worst comes to the
> worst) easy to migrate to a compatible competing technology.
>
> I know the developers at Storm and Flink (and Samza too) and they
> understand the importance of collaborating on a standard.
>
> I have been trying to play a dual role: supplying the parser and planner
> for streaming SQL, and also facilitating the creation of a standard
> language and semantics of streaming SQL. For the latter, see the Streaming
> page on Calcite’s web site[1]. On that page, I intend to illustrate all of
> the main patterns of streaming queries, give them names (e.g. “Tumbling
> windows”), and show how those translate into streaming SQL.
>
> Also, it would be useful to create a reference implementation of streaming
> SQL in Calcite so that you can validate and run queries. The performance,
> scalability and reliability will not be the same as if you ran Storm, Flink
> or Samza, but at least you can see what the semantics should be.
>
> I believe that most, if not all, of the examples that the projects are
> coming up with can be translated into SQL. It will be challenging, because
> we want to preserve the semantics of SQL, allow streaming SQL to
> interoperate with traditional relations, and also retain the general look
> and feel of SQL. (For example, I fought quite hard[2] recently for the
> principle that GROUP BY defines a partition (in the set-theory sense)[3]
> and therefore could not be used to represent a tumbling window, until I
> remembered that GROUPING SETS already allows each input row to appear in
> more than one output sub-total.)
>
> What can you, the users, do? Get involved in the discussion about what you
> want in the language. Encourage the projects to bring their proposed SQL
> features into this forum for discussion, and add to the list of patterns
> and examples on the Streaming page.
> As in any standards process, the users help to keep the vendors focused.
>
> I’ll be talking about streaming SQL, planning, and standardization at the
> Samza meetup in 2 weeks[4], so if any of you are in the Bay Area, please
> stop by.
>
> Julian
>
> [1] http://calcite.apache.org/docs/stream.html
>
> [2]
> http://mail-archives.apache.org/mod_mbox/calcite-dev/201506.mbox/%3CCAPSgeETbowxM2TRX0RFxQ_tEAPk2uM=he0arywinbtovgwb...@mail.gmail.com%3E
>
> [3] https://en.wikipedia.org/wiki/Partition_of_a_set
>
> [4] http://www.meetup.com/Bay-Area-Samza-Meetup/events/228430492/
>
>
> On Jan 29, 2016, at 10:29 PM, Wanglan (Lan) wrote:
> >
> > Hi to all,
> >
> > I am from Huawei and am focusing on data stream processing.
> > Recently I noticed that both in the Storm community and the Flink
> > community there are endeavors to use Calcite as the SQL parser to enable
> > Storm/Flink to support SQL. They both want to supplement or clarify the
> > Streaming SQL of Calcite, especially the definition of windows.
> > I am thinking that if both communities work on designing Stream SQL
> > syntax separately, two different syntaxes would come out which represent
> > the same use case.
> > Therefore, I am wondering if it is possible to unify such work, i.e.
> > design and complement the Calcite Streaming SQL to enrich the window
> > definition so that both Storm and Flink can reuse Calcite (Streaming SQL)
> > as their SQL parser for streaming cases with little change.
> > What do you think about this idea?
> >
> >
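
P.S. To make the aligned vs. unaligned window distinction from the top of this thread a bit more concrete, here is a minimal Python sketch. It is not Flink or Calcite code; the function names `tumbling_windows` and `session_windows`, the `(timestamp, value)` event shape, and the sample data are all made up for illustration. The first function buckets events by rounding timestamps down, which is roughly what a GROUP BY on a rounded timestamp expresses. The second derives window bounds from the gaps in the data itself, which is the part that seems hard to express as a grouping key.

```python
from collections import defaultdict

def tumbling_windows(events, size):
    """Aligned windows: round each timestamp down to a multiple of `size`."""
    buckets = defaultdict(list)
    for ts, value in events:
        window_start = ts - (ts % size)  # bucket boundary is fixed up front
        buckets[window_start].append(value)
    return dict(buckets)

def session_windows(events, gap):
    """Unaligned windows: start a new session when the distance to the
    previous event exceeds `gap`. Bounds depend on the data, not a grid."""
    sessions = []
    for ts, value in sorted(events):
        if sessions and ts - sessions[-1]["end"] <= gap:
            # Close enough to the previous event: extend the open session.
            sessions[-1]["end"] = ts
            sessions[-1]["values"].append(value)
        else:
            sessions.append({"start": ts, "end": ts, "values": [value]})
    return sessions

# Hypothetical events as (timestamp, value) pairs.
events = [(1, "a"), (8, "b"), (11, "c"), (30, "d")]

tumbling = tumbling_windows(events, 10)
sessions = session_windows(events, 5)
print(tumbling)
print(sessions)
```

With this sample data, the events at timestamps 8 and 11 fall into different tumbling buckets but into the same session, because a session boundary cannot be computed from a single row's timestamp alone.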
