Thank you both!

On Mon, Aug 19, 2019 at 8:01 AM Kenneth Knowles wrote:
> The i.p. clearance is complete:
> https://lists.apache.org/thread.html/239be048e7748f079dc34b06020e0c8f094859cb4a558b361f6b8eb5@<general.incubator.apache.org>
>
> Kenn
>
> On Mon, Aug 12, 2019 at 4:25 PM Rui Wang wrote:
>
>> Thanks Kenneth.
>>
>> I will start a vote for Beam ZetaSQL contribution.
>>
>> -Rui
>>
>> On Mon, Aug 12, 2019 at 4:11 PM Kenneth Knowles wrote:
>>
>>> Nice explanations of the reasoning. I think two things will stay
>>> approximately the same even as the ecosystem develops: (1) ZetaSQL has
>>> pretty clear semantics so we will have a compliant parser, whether it is
>>> the official one or another like Calcite Babel, and (2) we will need a way
>>> to implement all the standard ZetaSQL functions and this will be the same
>>> no matter the frontend.
>>>
>>> For a contribution this large where i.p. clearance is necessary, a vote
>>> is appropriate. It can happen at the same time or even after i.p. clearance.
>>>
>>> Kenn
>>>
>>> On Wed, Aug 7, 2019 at 1:08 PM Mingmin Xu wrote:
>>>
>>>> Thanks for highlighting the parts of types/operators/functions/..., that
>>>> does make things more complicated. +1 that as a short/middle term solution,
>>>> the proposal is reasonable. We could follow up in future to handle it in
>>>> Calcite Babel if possible.
>>>>
>>>> Mingmin
>>>>
>>>> On Tue, Aug 6, 2019 at 3:57 PM Rui Wang wrote:
>>>>
>>>>> Hi Mingmin,
>>>>>
>>>>> Honestly I don't have an answer to it: a SQL dialect is complicated
>>>>> and I don't have enough understanding of Calcite (Calcite has a big repo).
>>>>> Based on my reading of CALCITE-2280
>>>>> <https://issues.apache.org/jira/browse/CALCITE-2280>, the closer to
>>>>> standard SQL that a dialect is, the fewer blockers we will have to
>>>>> support this dialect in Calcite babel parser.
>>>>>
>>>>> However, this is a good question, which raises a good aspect that I
>>>>> found people usually ignore: supporting a SQL dialect is not only
>>>>> supporting a type of syntax. It also includes data types, built-in SQL
>>>>> functions, operators and many other things.
>>>>>
>>>>> I especially found the following incompatibilities between Calcite and
>>>>> ZetaSQL during the development:
>>>>> 1. Calcite does not support Struct/Row type well because Calcite
>>>>> flattens Rows when reading from tables by adding an extra Projection on
>>>>> top of tables.
>>>>> 2. I had trouble supporting the DATETIME (or timestamp without
>>>>> time zone) type.
>>>>> 3. Huge incompatibilities in SQL functions. E.g. the return type is
>>>>> different for AVG(long), and many, many more.
>>>>> 4. I am not sure if Calcite has the same set of type casting rules as
>>>>> BigQuery (my impression is there are differences).
>>>>>
>>>>>
>>>>> I would say in the short/mid term, it's much easier to use logical
>>>>> plan as IR to implement another SQL dialect for BeamSQL (LinkedIn has
>>>>> a similar practice, see their blog post
>>>>> <https://engineering.linkedin.com/blog/2019/01/bridging-offline-and-nearline-computations-with-apache-calcite>
>>>>> ).
>>>>>
>>>>> For the longer term, it would be interesting to see how we can add
>>>>> BigQuery syntax (plus its data types and SQL functions) to Calcite babel
>>>>> parser.
>>>>>
>>>>>
>>>>>
>>>>> -Rui
>>>>>
>>>>>
>>>>> On Tue, Aug 6, 2019 at 2:49 PM Mingmin Xu wrote:
>>>>>
>>>>>> Just take a look at
>>>>>> https://issues.apache.org/jira/browse/CALCITE-2280 which introduced
>>>>>> Babel parser in Calcite to support varied dialects, this may be an easier
>>>>>> way to support BigQuery syntax. @Rui do you notice any big difference
>>>>>> between Calcite engine and ZetaSQL, like parsing, optimization? If that's
>>>>>> the case, it makes sense to build the alternative switch in Beam side.
>>>>>>
>>>>>> On Sun, Aug 4, 2019 at 4:47 PM Rui Wang wrote:
>>>>>>
>>>>>>> Mingmin - it sounds like an awesome idea to translate from SparkSQL.
>>>>>>> It's even more exciting to know if we could translate Spark
>>>>>>> Structured Streaming code in a similar way, which would enable existing
>>>>>>> Spark SQL/Structured Streaming pipelines to run on Beam.
>>>>>>>
>>>>>>> Reuven - Thanks for bringing it up. I tried to search dev@calcite
>>>>>>> and only found [1]. From that thread, I see that adding ZetaSQL to
>>>>>>> Calcite itself is still under discussion. I am also looking to see if
>>>>>>> anyone knows of more progress on this work than the thread.
>>>>>>>
>>>>>>>
>>>>>>> [1]:
>>>>>>> http://mail-archives.apache.org/mod_mbox/calcite-dev/201905.mbox/%3CCAMj=j=-sPWgxzAgusnx8OYvYDYDcDY=dupe6poytrxhjri9...@mail.gmail.com%3E
>>>>>>>
>>>>>>> -Rui
>>>>>>>
>>>>>>> On Sun, Aug 4, 2019 at 3:54 PM Reuven Lax wrote:
>>>>>>>
>>>>>>>> I hear rumours that the Calcite project is planning on adding a
>>>>>>>> zeta-SQL compatible parser to Calcite itself, in which case there will
>>>>>>>> be a Java parser we can use as well. Does anyone know if this work is
>>>>>>>> still going on?
>>>>>>>>
>>>>>>>> On Sat, Aug 3, 2019 at 8:41 PM Manu Zhang wrote:
>>>>>>>>
>>>>>>>>>> A question to the community, does the size of the change require
>>>>>>>>>> any process besides the usual PR reviews?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think so. This is a big change and has come as kind of a
>>>>>>>>> surprise (sorry if I've missed previous discussions).
>>>>>>>>>
>>>>>>>>> Rui, could you explain more on how things will play out between
>>>>>>>>> BeamSQL and ZetaSQL (a design doc including the pluggable interface
>>>>>>>>> would be perfect)? From GitHub, ZetaSQL is mainly in C++, so is what
>>>>>>>>> you are doing a port or a connector to ZetaSQL? Do we need to depend on
>>>>>>>>> https://github.com/google/zetasql ?
>>>>>>>>> ZetaSQL looks interesting but
>>>>>>>>> I could barely find any doc for end users.
>>>>>>>>>
>>>>>>>>> Also, I'd prefer the PR to be split into two, one for the
>>>>>>>>> pluggable interface and one for the ZetaSQL.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Manu
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Aug 3, 2019 at 10:06 AM Ahmet Altay wrote:
>>>>>>>>>
>>>>>>>>>> Thank you Rui for the heads up.
>>>>>>>>>>
>>>>>>>>>> A question to the community, does the size of the change require
>>>>>>>>>> any process besides the usual PR reviews?
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 2, 2019 at 10:23 AM Rui Wang wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi community,
>>>>>>>>>>>
>>>>>>>>>>> I have been working on supporting ZetaSQL[1] as a SQL dialect in
>>>>>>>>>>> BeamSQL. ZetaSQL is a SQL analyzer open sourced by Google. Here is
>>>>>>>>>>> ZetaSQL's documentation[2].
>>>>>>>>>>>
>>>>>>>>>>> Briefly, the design of integrating ZetaSQL with BeamSQL is: I
>>>>>>>>>>> made a pluggable query planner interface in BeamSQL, and we can
>>>>>>>>>>> easily plug in a new planner[3] (in my case, ZetaSQL planner).
>>>>>>>>>>> Actually anyone can add new planners this way (e.g. PostgreSQL
>>>>>>>>>>> dialect).
>>>>>>>>>>>
>>>>>>>>>>> I want to contribute ZetaSQL planner and its related code (~10k)
>>>>>>>>>>> to Beam repo (#9210 <https://github.com/apache/beam/pull/9210>).
>>>>>>>>>>> This contribution barely touches existing Beam code (because the
>>>>>>>>>>> idea is a pluggable planner).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *Acknowledgement*
>>>>>>>>>>> Thanks to all the people who provided help during Beam ZetaSQL
>>>>>>>>>>> development: Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth
>>>>>>>>>>> Knowles, Anton Kedin and Mikhail Gryzykhin. This list is not
>>>>>>>>>>> exhaustive and also thanks to contributions which are not listed.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1]: https://github.com/google/zetasql
>>>>>>>>>>> [2]: https://github.com/google/zetasql/tree/master/docs
>>>>>>>>>>> [3]:
>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Rui
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> ----
>>>>>> Mingmin
>>>>>>
>>>>>
>>>>
>>>> --
>>>> ----
>>>> Mingmin
>>>>
>>>
