Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Rui Wang Mon, 12 Aug 2019 16:25:31 -0700

Thanks Kenneth.

I will start a vote on the Beam ZetaSQL contribution.


-Rui

On Mon, Aug 12, 2019 at 4:11 PM Kenneth Knowles <[email protected]> wrote:

> Nice explanations of the reasoning. I think two things will stay
> approximately the same even as the ecosystem develops: (1) ZetaSQL has
> pretty clear semantics so we will have a compliant parser, whether it is
> the official one or another like Calcite Babel, and (2) we will need a way
> to implement all the standard ZetaSQL functions and this will be the same
> no matter the frontend.
>
> For a contribution this large where i.p. clearance is necessary, a vote is
> appropriate. It can happen at the same time or even after i.p. clearance.
>
> Kenn
>
> On Wed, Aug 7, 2019 at 1:08 PM Mingmin Xu <[email protected]> wrote:
>
>> Thanks for highlighting the types/operators/functions/... aspects; they do
>> make things more complicated. +1 that as a short/mid-term solution, the
>> proposal is reasonable. We could follow up in the future to handle it in
>> Calcite Babel if possible.
>>
>> Mingmin
>>
>> On Tue, Aug 6, 2019 at 3:57 PM Rui Wang <[email protected]> wrote:
>>
>>> Hi Mingmin,
>>>
>>> Honestly, I don't have an answer to it: a SQL dialect is complicated and
>>> I don't have enough understanding of Calcite (Calcite has a big repo).
>>> Based on my reading of CALCITE-2280
>>> <https://issues.apache.org/jira/browse/CALCITE-2280>, the closer a
>>> dialect is to standard SQL, the fewer blockers we will have in
>>> supporting it in the Calcite Babel parser.
>>>
>>> However, this is a good question, and it raises an aspect that I
>>> found people usually ignore: supporting a SQL dialect is not only about
>>> supporting its syntax. It also includes data types, built-in SQL
>>> functions, operators, and many other things.
>>>
>>> In particular, I found the following incompatibilities between Calcite and
>>> ZetaSQL during development:
>>> 1. Calcite does not support the Struct/Row type well because Calcite
>>> flattens Rows when reading from tables by adding an extra Projection on top
>>> of tables.
>>> 2. I had trouble supporting the DATETIME (or timestamp without time zone)
>>> type.
>>> 3. Huge incompatibilities in SQL functions. E.g., the return type is
>>> different for AVG(long), and there are many more.
>>> 4. I am not sure whether Calcite has the same set of type casting rules as
>>> BigQuery (my impression is that there are differences).
>>>
>>>
>>> I would say that in the short/mid term, it's much easier to use the logical
>>> plan as an IR to implement another SQL dialect for BeamSQL (LinkedIn has a
>>> similar practice, see their blog post
>>> <https://engineering.linkedin.com/blog/2019/01/bridging-offline-and-nearline-computations-with-apache-calcite>
>>> ).
>>>
>>> For the longer term, it would be interesting to see how we can add
>>> BigQuery syntax (plus its data types and SQL functions) to the Calcite
>>> Babel parser.
>>>
>>>
>>>
>>> -Rui
>>>
>>>
>>> On Tue, Aug 6, 2019 at 2:49 PM Mingmin Xu <[email protected]> wrote:
>>>
>>>> I just took a look at https://issues.apache.org/jira/browse/CALCITE-2280 ,
>>>> which introduced the Babel parser in Calcite to support varied dialects;
>>>> this may be an easier way to support BigQuery syntax. @Rui, did you notice
>>>> any big differences between the Calcite engine and ZetaSQL, e.g. in parsing
>>>> or optimization? If that's the case, it makes sense to build the
>>>> alternative switch on the Beam side.
>>>>
>>>> On Sun, Aug 4, 2019 at 4:47 PM Rui Wang <[email protected]> wrote:
>>>>
>>>>> Mingmin - it sounds like an awesome idea to translate from SparkSQL.
>>>>> It would be even more exciting if we could translate Spark
>>>>> Structured Streaming code in a similar way, which would enable existing
>>>>> Spark SQL/Structured Streaming pipelines to run on Beam.
>>>>>
>>>>> Reuven - Thanks for bringing it up. I tried searching dev@calcite and
>>>>> only found [1]. From that thread, I see that adding ZetaSQL to Calcite
>>>>> itself is still under discussion. I would also like to hear from anyone
>>>>> who knows of more progress on this work than the thread shows.
>>>>>
>>>>>
>>>>> [1]:
>>>>> http://mail-archives.apache.org/mod_mbox/calcite-dev/201905.mbox/%3CCAMj=j=-sPWgxzAgusnx8OYvYDYDcDY=dupe6poytrxhjri9...@mail.gmail.com%3E
>>>>>
>>>>> -Rui
>>>>>
>>>>> On Sun, Aug 4, 2019 at 3:54 PM Reuven Lax <[email protected]> wrote:
>>>>>
>>>>>> I hear rumours that the Calcite project is planning on adding a
>>>>>> ZetaSQL-compatible parser to Calcite itself, in which case there will
>>>>>> be a Java parser we can use as well. Does anyone know if this work is
>>>>>> still going on?
>>>>>>
>>>>>> On Sat, Aug 3, 2019 at 8:41 PM Manu Zhang <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>>> A question to the community, does the size of the change require any
>>>>>>>> process besides the usual PR reviews?
>>>>>>>>
>>>>>>>
>>>>>>> I think so. This is a big change and has come as kind of a surprise
>>>>>>> (sorry if I've missed previous discussions).
>>>>>>>
>>>>>>> Rui, could you explain more about how things will play out between
>>>>>>> BeamSQL and ZetaSQL? (A design doc including the pluggable interface
>>>>>>> would be perfect.) From GitHub, ZetaSQL is mainly in C++, so is what you
>>>>>>> are doing a port of, or a connector to, ZetaSQL? Do we need to depend on
>>>>>>> https://github.com/google/zetasql ? ZetaSQL looks interesting but I
>>>>>>> could barely find any docs for end users.
>>>>>>>
>>>>>>> Also, I'd prefer the PR to be split into two: one for the pluggable
>>>>>>> interface and one for ZetaSQL.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Manu
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Aug 3, 2019 at 10:06 AM Ahmet Altay <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thank you Rui for the heads up.
>>>>>>>>
>>>>>>>> A question to the community, does the size of the change require
>>>>>>>> any process besides the usual PR reviews?
>>>>>>>>
>>>>>>>> On Fri, Aug 2, 2019 at 10:23 AM Rui Wang <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi community,
>>>>>>>>>
>>>>>>>>> I have been working on supporting ZetaSQL[1] as a SQL dialect in
>>>>>>>>> BeamSQL. ZetaSQL is a SQL analyzer open sourced by Google. Here is
>>>>>>>>> ZetaSQL's documentation[2].
>>>>>>>>>
>>>>>>>>> Briefly, the design for integrating ZetaSQL with BeamSQL is this: I made
>>>>>>>>> a pluggable query planner interface in BeamSQL, so we can easily plug
>>>>>>>>> in a new planner[3] (in my case, the ZetaSQL planner). In fact, anyone
>>>>>>>>> can add new planners this way (e.g. for a PostgreSQL dialect).
>>>>>>>>>
>>>>>>>>> I want to contribute the ZetaSQL planner and its related code (~10k) to
>>>>>>>>> the Beam repo (#9210 <https://github.com/apache/beam/pull/9210>). This
>>>>>>>>> contribution barely touches existing Beam code (because the idea is a
>>>>>>>>> pluggable planner).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Acknowledgement*
>>>>>>>>> Thanks to all the people who provided help during Beam ZetaSQL
>>>>>>>>> development: Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth
>>>>>>>>> Knowles, Anton Kedin and Mikhail Gryzykhin. This list is not exhaustive;
>>>>>>>>> thanks also to those whose contributions are not listed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1]: https://github.com/google/zetasql
>>>>>>>>> [2]: https://github.com/google/zetasql/tree/master/docs
>>>>>>>>> [3]:
>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Rui
>>>>>>>>>
>>>>>>>>
>>>>
>>>> --
>>>> ----
>>>> Mingmin
>>>>
>>>
>>
>> --
>> ----
>> Mingmin
>>
>
