Thanks Kenneth. I will start a vote for Beam ZetaSQL contribution.
-Rui

On Mon, Aug 12, 2019 at 4:11 PM Kenneth Knowles <k...@apache.org> wrote:

> Nice explanations of the reasoning. I think two things will stay
> approximately the same even as the ecosystem develops: (1) ZetaSQL has
> pretty clear semantics so we will have a compliant parser, whether it is
> the official one or another like Calcite Babel, and (2) we will need a way
> to implement all the standard ZetaSQL functions and this will be the same
> no matter the frontend.
>
> For a contribution this large where i.p. clearance is necessary, a vote is
> appropriate. It can happen at the same time or even after i.p. clearance.
>
> Kenn
>
> On Wed, Aug 7, 2019 at 1:08 PM Mingmin Xu <mingm...@gmail.com> wrote:
>
>> Thanks for highlighting the types/operators/functions/... parts; that does
>> make things more complicated. +1 that as a short/middle term solution, the
>> proposal is reasonable. We could follow up in future to handle it in
>> Calcite Babel if possible.
>>
>> Mingmin
>>
>> On Tue, Aug 6, 2019 at 3:57 PM Rui Wang <ruw...@google.com> wrote:
>>
>>> Hi Mingmin,
>>>
>>> Honestly I don't have an answer to it: a SQL dialect is complicated and
>>> I don't have enough understanding of Calcite (Calcite has a big repo).
>>> Based on my reading of CALCITE-2280
>>> <https://issues.apache.org/jira/browse/CALCITE-2280>, the closer a
>>> dialect is to standard SQL, the fewer blockers we will have in
>>> supporting that dialect in the Calcite Babel parser.
>>>
>>> However, this is a good question, and it raises an aspect that I find
>>> people usually ignore: supporting a SQL dialect means more than
>>> supporting its syntax. It also includes data types, built-in SQL
>>> functions, operators, and many other things.
>>>
>>> In particular, I found the following incompatibilities between Calcite
>>> and ZetaSQL during the development:
>>> 1. Calcite does not support the Struct/Row type well because Calcite
>>> flattens Rows when reading from tables by adding an extra Projection on
>>> top of tables.
>>> 2. I had trouble supporting the DATETIME (or timestamp without time
>>> zone) type.
>>> 3. Large incompatibilities in SQL functions, e.g. the return type of
>>> AVG(long) is different, and there are many more.
>>> 4. I am not sure if Calcite has the same set of type casting rules as
>>> BigQuery (my impression is there are differences).
>>>
>>> I would say in the short/mid term, it's much easier to use the logical
>>> plan as the IR to implement another SQL dialect for BeamSQL (LinkedIn
>>> has a similar practice, see their blog post
>>> <https://engineering.linkedin.com/blog/2019/01/bridging-offline-and-nearline-computations-with-apache-calcite>).
>>>
>>> For the longer term, it would be interesting to see how we can add
>>> BigQuery syntax (plus its data types and SQL functions) to the Calcite
>>> Babel parser.
>>>
>>> -Rui
>>>
>>> On Tue, Aug 6, 2019 at 2:49 PM Mingmin Xu <mingm...@gmail.com> wrote:
>>>
>>>> I just took a look at
>>>> https://issues.apache.org/jira/browse/CALCITE-2280, which introduced
>>>> the Babel parser in Calcite to support varied dialects; this may be an
>>>> easier way to support BigQuery syntax. @Rui, did you notice any big
>>>> differences between the Calcite engine and ZetaSQL, such as in parsing
>>>> or optimization? If that's the case, it makes sense to build the
>>>> alternative switch on the Beam side.
>>>>
>>>> On Sun, Aug 4, 2019 at 4:47 PM Rui Wang <ruw...@google.com> wrote:
>>>>
>>>>> Mingmin - it sounds like an awesome idea to translate from SparkSQL.
>>>>> It would be even more exciting if we could translate Spark
>>>>> Structured Streaming code in a similar way, which would let existing
>>>>> Spark SQL/Structured Streaming pipelines run on Beam.
>>>>>
>>>>> Reuven - Thanks for bringing it up. I searched dev@calcite and only
>>>>> found [1]. From that thread, I see that adding ZetaSQL to Calcite
>>>>> itself is still under discussion. I would also like to hear from
>>>>> anyone who knows of more progress on this work than the thread shows.
>>>>>
>>>>> [1]:
>>>>> http://mail-archives.apache.org/mod_mbox/calcite-dev/201905.mbox/%3CCAMj=j=-sPWgxzAgusnx8OYvYDYDcDY=dupe6poytrxhjri9...@mail.gmail.com%3E
>>>>>
>>>>> -Rui
>>>>>
>>>>> On Sun, Aug 4, 2019 at 3:54 PM Reuven Lax <re...@google.com> wrote:
>>>>>
>>>>>> I hear rumours that the Calcite project is planning on adding a
>>>>>> ZetaSQL-compatible parser to Calcite itself, in which case there
>>>>>> will be a Java parser we can use as well. Does anyone know if this
>>>>>> work is still going on?
>>>>>>
>>>>>> On Sat, Aug 3, 2019 at 8:41 PM Manu Zhang <owenzhang1...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>> A question to the community, does the size of the change require
>>>>>>>> any process besides the usual PR reviews?
>>>>>>>
>>>>>>> I think so. This is a big change and has come as kind of a surprise
>>>>>>> (sorry if I've missed previous discussions).
>>>>>>>
>>>>>>> Rui, could you explain more about how things will play out between
>>>>>>> BeamSQL and ZetaSQL? (A design doc including the pluggable interface
>>>>>>> would be perfect.) From GitHub, ZetaSQL is mainly in C++, so is what
>>>>>>> you are doing a port or a connector to ZetaSQL? Do we need to depend
>>>>>>> on https://github.com/google/zetasql? ZetaSQL looks interesting but
>>>>>>> I could barely find any doc for end users.
>>>>>>>
>>>>>>> Also, I'd prefer the PR to be split into two, one for the pluggable
>>>>>>> interface and one for ZetaSQL.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Manu
>>>>>>>
>>>>>>> On Sat, Aug 3, 2019 at 10:06 AM Ahmet Altay <al...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thank you Rui for the heads up.
>>>>>>>>
>>>>>>>> A question to the community, does the size of the change require
>>>>>>>> any process besides the usual PR reviews?
>>>>>>>>
>>>>>>>> On Fri, Aug 2, 2019 at 10:23 AM Rui Wang <ruw...@google.com> wrote:
>>>>>>>>
>>>>>>>>> Hi community,
>>>>>>>>>
>>>>>>>>> I have been working on supporting ZetaSQL[1] as a SQL dialect in
>>>>>>>>> BeamSQL. ZetaSQL is a SQL analyzer open sourced by Google. Here is
>>>>>>>>> ZetaSQL's documentation[2].
>>>>>>>>>
>>>>>>>>> Briefly, the design for integrating ZetaSQL with BeamSQL is this:
>>>>>>>>> I made a pluggable query planner interface in BeamSQL, so we can
>>>>>>>>> easily plug in a new planner[3] (in my case, the ZetaSQL planner).
>>>>>>>>> In fact, anyone can add new planners this way (e.g. for a
>>>>>>>>> PostgreSQL dialect).
>>>>>>>>>
>>>>>>>>> I want to contribute the ZetaSQL planner and its related code
>>>>>>>>> (~10k) to the Beam repo (#9210
>>>>>>>>> <https://github.com/apache/beam/pull/9210>). This contribution
>>>>>>>>> barely touches existing Beam code (because the idea is a pluggable
>>>>>>>>> planner).
>>>>>>>>>
>>>>>>>>> *Acknowledgement*
>>>>>>>>> Thanks to all the people who provided help during Beam ZetaSQL
>>>>>>>>> development: Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth
>>>>>>>>> Knowles, Anton Kedin and Mikhail Gryzykhin. This list is not
>>>>>>>>> exhaustive; thanks also to contributors who are not listed.
>>>>>>>>>
>>>>>>>>> [1]: https://github.com/google/zetasql
>>>>>>>>> [2]: https://github.com/google/zetasql/tree/master/docs
>>>>>>>>> [3]:
>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java
>>>>>>>>>
>>>>>>>>> -Rui
>>>>>>>>>
>>>>
>>>> --
>>>> ----
>>>> Mingmin
>>>>
>>
>> --
>> ----
>> Mingmin
>>
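
To make the pluggable planner idea from the original message a bit more concrete, here is a rough sketch in Python. It is only an illustration of the pattern and not Beam's actual QueryPlanner interface (which is Java, see [3] in the original message). Every name below (QueryPlanner, CalcitePlanner, ZetaSqlPlanner, PLANNERS, plan_query) is hypothetical, and the "plans" are just strings standing in for a real logical plan.

```python
from typing import Callable, Dict, Protocol


class QueryPlanner(Protocol):
    """Hypothetical planner interface: turns a SQL string into a plan."""

    def plan(self, sql: str) -> str:
        ...


class CalcitePlanner:
    """Stand-in for the default planner; returns a placeholder plan string."""

    def plan(self, sql: str) -> str:
        return f"calcite-plan({sql})"


class ZetaSqlPlanner:
    """Stand-in for a plugged-in dialect planner."""

    def plan(self, sql: str) -> str:
        return f"zetasql-plan({sql})"


# Registry of planner factories keyed by dialect name (an assumption made
# for this sketch). Adding a dialect means adding an entry here; the code
# that calls plan_query does not change.
PLANNERS: Dict[str, Callable[[], QueryPlanner]] = {
    "calcite": CalcitePlanner,
    "zetasql": ZetaSqlPlanner,
}


def plan_query(sql: str, dialect: str = "calcite") -> str:
    """Look up the planner for the requested dialect and delegate to it."""
    try:
        planner = PLANNERS[dialect]()
    except KeyError:
        raise ValueError(f"no planner registered for dialect {dialect!r}") from None
    return planner.plan(sql)
```

The point of the sketch is that callers only depend on the small planner interface, so a new dialect can be contributed as a separate planner without touching existing query-handling code.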