This is a very informative thread. I would love for a lot of this information and reasoning to end up in the documentation.
_/
_/ Alex Van Boxel

On Wed, Aug 21, 2019 at 9:17 PM Rui Wang <ruw...@google.com> wrote:

> Thanks everyone! Beam ZetaSQL is now merged into the Beam repo!
>
> -Rui
>
> On Mon, Aug 19, 2019 at 8:36 AM Ahmet Altay <al...@google.com> wrote:
>
>> Thank you both!
>>
>> On Mon, Aug 19, 2019 at 8:01 AM Kenneth Knowles <k...@apache.org> wrote:
>>
>>> The i.p. clearance is complete:
>>> https://lists.apache.org/thread.html/239be048e7748f079dc34b06020e0c8f094859cb4a558b361f6b8eb5@<general.incubator.apache.org>
>>>
>>> Kenn
>>>
>>> On Mon, Aug 12, 2019 at 4:25 PM Rui Wang <ruw...@google.com> wrote:
>>>
>>>> Thanks Kenneth.
>>>>
>>>> I will start a vote for the Beam ZetaSQL contribution.
>>>>
>>>> -Rui
>>>>
>>>> On Mon, Aug 12, 2019 at 4:11 PM Kenneth Knowles <k...@apache.org> wrote:
>>>>
>>>>> Nice explanations of the reasoning. I think two things will stay approximately the same even as the ecosystem develops: (1) ZetaSQL has pretty clear semantics, so we will have a compliant parser, whether it is the official one or another like Calcite Babel, and (2) we will need a way to implement all the standard ZetaSQL functions, and this will be the same no matter the frontend.
>>>>>
>>>>> For a contribution this large, where i.p. clearance is necessary, a vote is appropriate. It can happen at the same time as, or even after, i.p. clearance.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Wed, Aug 7, 2019 at 1:08 PM Mingmin Xu <mingm...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for highlighting the parts about types/operators/functions/...; that does make things more complicated. +1 that, as a short/mid-term solution, the proposal is reasonable. We could follow up in the future to handle it in Calcite Babel if possible.
>>>>>>
>>>>>> Mingmin
>>>>>>
>>>>>> On Tue, Aug 6, 2019 at 3:57 PM Rui Wang <ruw...@google.com> wrote:
>>>>>>
>>>>>>> Hi Mingmin,
>>>>>>>
>>>>>>> Honestly, I don't have an answer to that: a SQL dialect is complicated, and I don't have enough understanding of Calcite (Calcite has a big repo). Based on my reading of CALCITE-2280 <https://issues.apache.org/jira/browse/CALCITE-2280>, the closer a dialect is to standard SQL, the fewer blockers we will have in supporting it in the Calcite Babel parser.
>>>>>>>
>>>>>>> However, this is a good question, and it raises an aspect that I have found people usually ignore: supporting a SQL dialect is not only about supporting a type of syntax. It also includes data types, built-in SQL functions, operators, and much more.
>>>>>>>
>>>>>>> In particular, I found the following incompatibilities between Calcite and ZetaSQL during development:
>>>>>>> 1. Calcite does not support the Struct/Row type well, because Calcite flattens Rows when reading from tables by adding an extra Projection on top of the tables.
>>>>>>> 2. I had trouble supporting the DATETIME (or timestamp without time zone) type.
>>>>>>> 3. There are huge incompatibilities in SQL functions, e.g. the return type of AVG(long) is different, and many, many more.
>>>>>>> 4. I am not sure whether Calcite has the same set of type-casting rules as BigQuery (my impression is that there are differences).
>>>>>>>
>>>>>>> I would say that in the short/mid term, it's much easier to use the logical plan as an IR to implement another SQL dialect for BeamSQL (LinkedIn has a similar practice; see their blog post <https://engineering.linkedin.com/blog/2019/01/bridging-offline-and-nearline-computations-with-apache-calcite>).
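Point 3 in the list above (function semantics) is easy to underestimate, because a query can port cleanly at the syntax level and still return different answers. The sketch below is illustrative only and assumes that Calcite's default AVG over a BIGINT column keeps the integer input type (so the average is truncated), while ZetaSQL's AVG over INT64 returns FLOAT64. The class and method names are hypothetical and not part of Beam, Calcite or ZetaSQL.

```java
import java.util.List;

public class AvgSemanticsSketch {

    // Assumed Calcite-style behaviour: the result keeps the BIGINT input type,
    // so AVG is effectively SUM / COUNT with integer division.
    // Assumes a non-empty input; SQL AVG returns NULL for an empty group.
    public static long avgKeepingInputType(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // ZetaSQL-style behaviour: AVG over INT64 produces a FLOAT64 (double) result.
    // Assumes a non-empty input.
    public static double avgAsFloat64(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return (double) sum / values.size();
    }

    public static void main(String[] args) {
        List<Long> column = List.of(1L, 2L);
        System.out.println("AVG keeping BIGINT: " + avgKeepingInputType(column));
        System.out.println("AVG as FLOAT64:     " + avgAsFloat64(column));
    }
}
```

For the two-row column `{1, 2}` the first method prints 1 and the second prints 1.5, which is the kind of silent divergence a second dialect has to account for beyond parsing.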
>>>>>>>
>>>>>>> For the longer term, it would be interesting to see how we can add BigQuery syntax (plus its data types and SQL functions) to the Calcite Babel parser.
>>>>>>>
>>>>>>> -Rui
>>>>>>>
>>>>>>> On Tue, Aug 6, 2019 at 2:49 PM Mingmin Xu <mingm...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Just took a look at https://issues.apache.org/jira/browse/CALCITE-2280, which introduced the Babel parser in Calcite to support various dialects; this may be an easier way to support BigQuery syntax. @Rui, do you notice any big differences between the Calcite engine and ZetaSQL, e.g. in parsing or optimization? If that's the case, it makes sense to build the alternative switch on the Beam side.
>>>>>>>>
>>>>>>>> On Sun, Aug 4, 2019 at 4:47 PM Rui Wang <ruw...@google.com> wrote:
>>>>>>>>
>>>>>>>>> Mingmin - translating from SparkSQL sounds like an awesome idea. It would be even more exciting if we could translate Spark Structured Streaming code in a similar way, which would let existing Spark SQL/Structured Streaming pipelines run on Beam.
>>>>>>>>>
>>>>>>>>> Reuven - thanks for bringing it up. I searched dev@calcite and only found [1]. From that thread, I see that adding ZetaSQL to Calcite itself is still under discussion. I am also looking to see whether anyone knows of more progress on this work than what is in that thread.
>>>>>>>>>
>>>>>>>>> [1]: http://mail-archives.apache.org/mod_mbox/calcite-dev/201905.mbox/%3CCAMj=j=-sPWgxzAgusnx8OYvYDYDcDY=dupe6poytrxhjri9...@mail.gmail.com%3E
>>>>>>>>>
>>>>>>>>> -Rui
>>>>>>>>>
>>>>>>>>> On Sun, Aug 4, 2019 at 3:54 PM Reuven Lax <re...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> I hear rumours that the Calcite project is planning to add a ZetaSQL-compatible parser to Calcite itself, in which case there will be a Java parser we can use as well. Does anyone know if this work is still going on?
>>>>>>>>>>
>>>>>>>>>> On Sat, Aug 3, 2019 at 8:41 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>> A question to the community: does the size of the change require any process besides the usual PR reviews?
>>>>>>>>>>>
>>>>>>>>>>> I think so. This is a big change and has come as something of a surprise (sorry if I've missed previous discussions).
>>>>>>>>>>>
>>>>>>>>>>> Rui, could you explain more about how things will play out between BeamSQL and ZetaSQL? (A design doc including the pluggable interface would be perfect.) From GitHub, ZetaSQL is mainly in C++, so is what you are doing a port of, or a connector to, ZetaSQL? Do we need to depend on https://github.com/google/zetasql? ZetaSQL looks interesting, but I could barely find any docs for end users.
>>>>>>>>>>>
>>>>>>>>>>> Also, I'd prefer the PR to be split into two: one for the pluggable interface and one for ZetaSQL.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Manu
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Aug 3, 2019 at 10:06 AM Ahmet Altay <al...@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank you Rui for the heads up.
>>>>>>>>>>>>
>>>>>>>>>>>> A question to the community: does the size of the change require any process besides the usual PR reviews?
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Aug 2, 2019 at 10:23 AM Rui Wang <ruw...@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi community,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have been working on supporting ZetaSQL [1] as a SQL dialect in BeamSQL. ZetaSQL is a SQL analyzer open-sourced by Google. Here is ZetaSQL's documentation [2].
>>>>>>>>>>>>>
>>>>>>>>>>>>> Briefly, the design for integrating ZetaSQL with BeamSQL is this: I made a pluggable query planner interface in BeamSQL, so we can easily plug in a new planner [3] (in my case, the ZetaSQL planner). In fact, anyone can add new planners this way (e.g. for a PostgreSQL dialect).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I want to contribute the ZetaSQL planner and its related code (~10k) to the Beam repo (#9210 <https://github.com/apache/beam/pull/9210>). This contribution barely touches existing Beam code (because the idea is a pluggable planner).
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Acknowledgement*
>>>>>>>>>>>>> Thanks to all the people who provided help during Beam ZetaSQL development: Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth Knowles, Anton Kedin and Mikhail Gryzykhin. This list is not exhaustive; thanks also to those whose contributions are not listed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]: https://github.com/google/zetasql
>>>>>>>>>>>>> [2]: https://github.com/google/zetasql/tree/master/docs
>>>>>>>>>>>>> [3]: https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Rui
>>>>>>>>
>>>>>>>> --
>>>>>>>> ----
>>>>>>>> Mingmin
>>>>>>
>>>>>> --
>>>>>> ----
>>>>>> Mingmin
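The pluggable query planner design from the original proposal can be illustrated with a small, self-contained Java sketch. Every name below (`SqlPlanner`, `LogicalPlan`, `planQuery`, the `"calcite"`/`"zetasql"` keys) is hypothetical; Beam's actual interface is `QueryPlanner` in the file linked as [3], and its real method signatures are not reproduced here. The sketch only shows the shape of the idea discussed in the thread: the SQL entry point depends on an interface, each dialect's planner produces the same logical-plan IR, and adding a dialect means adding an implementation rather than changing existing code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class PluggablePlannerSketch {

    /** Hypothetical stand-in for the shared IR (a logical plan) produced by every planner. */
    public static final class LogicalPlan {
        public final String dialect;
        public final String sql;

        LogicalPlan(String dialect, String sql) {
            this.dialect = dialect;
            this.sql = sql;
        }

        @Override
        public String toString() {
            return dialect + " plan for: " + sql;
        }
    }

    /** Hypothetical planner interface: turn a query in one dialect into the common IR. */
    public interface SqlPlanner {
        LogicalPlan plan(String sql);
    }

    static final class CalcitePlanner implements SqlPlanner {
        @Override
        public LogicalPlan plan(String sql) {
            // A real planner would parse, validate and optimize the query here.
            return new LogicalPlan("calcite", sql);
        }
    }

    static final class ZetaSqlPlanner implements SqlPlanner {
        @Override
        public LogicalPlan plan(String sql) {
            // Different parser and function semantics, same output IR.
            return new LogicalPlan("zetasql", sql);
        }
    }

    // Planners are looked up by name, so a new dialect (e.g. PostgreSQL) only
    // needs one more registration; callers of planQuery do not change.
    private static final Map<String, Supplier<SqlPlanner>> REGISTRY = new HashMap<>();

    static {
        REGISTRY.put("calcite", CalcitePlanner::new);
        REGISTRY.put("zetasql", ZetaSqlPlanner::new);
    }

    public static LogicalPlan planQuery(String plannerName, String sql) {
        Supplier<SqlPlanner> factory = REGISTRY.get(plannerName);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown planner: " + plannerName);
        }
        return factory.get().plan(sql);
    }

    public static void main(String[] args) {
        String sql = "SELECT COUNT(*) FROM PCOLLECTION";
        System.out.println(planQuery("calcite", sql));
        System.out.println(planQuery("zetasql", sql));
    }
}
```

Because everything downstream consumes only the shared IR, this is also why the contribution could "barely touch existing Beam code": the new dialect lives behind the interface.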