Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Alex Van Boxel Thu, 22 Aug 2019 07:39:49 -0700

This is a very informative thread. I would love to see a lot of this
information and reasoning end up in the documentation.


 _/
_/ Alex Van Boxel


On Wed, Aug 21, 2019 at 9:17 PM Rui Wang <[email protected]> wrote:

> Thanks everyone! Beam ZetaSQL is now merged into the Beam repo!
>
>
> -Rui
>
> On Mon, Aug 19, 2019 at 8:36 AM Ahmet Altay <[email protected]> wrote:
>
>> Thank you both!
>>
>> On Mon, Aug 19, 2019 at 8:01 AM Kenneth Knowles <[email protected]> wrote:
>>
>>> The i.p. clearance is complete:
>>> https://lists.apache.org/thread.html/239be048e7748f079dc34b06020e0c8f094859cb4a558b361f6b8eb5@<general.incubator.apache.org>
>>>
>>> Kenn
>>>
>>> On Mon, Aug 12, 2019 at 4:25 PM Rui Wang <[email protected]> wrote:
>>>
>>>> Thanks Kenneth.
>>>>
>>>> I will start a vote for Beam ZetaSQL contribution.
>>>>
>>>> -Rui
>>>>
>>>> On Mon, Aug 12, 2019 at 4:11 PM Kenneth Knowles <[email protected]>
>>>> wrote:
>>>>
>>>>> Nice explanations of the reasoning. I think two things will stay
>>>>> approximately the same even as the ecosystem develops: (1) ZetaSQL has
>>>>> pretty clear semantics so we will have a compliant parser, whether it is
>>>>> the official one or another like Calcite Babel, and (2) we will need a way
>>>>> to implement all the standard ZetaSQL functions and this will be the same
>>>>> no matter the frontend.
>>>>>
>>>>> For a contribution this large where i.p. clearance is necessary, a
>>>>> vote is appropriate. It can happen at the same time or even after i.p.
>>>>> clearance.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Wed, Aug 7, 2019 at 1:08 PM Mingmin Xu <[email protected]> wrote:
>>>>>
>>>>>> Thanks for highlighting the types/operators/functions/... parts; they do
>>>>>> make things more complicated. +1 that, as a short/mid-term solution, the
>>>>>> proposal is reasonable. We could follow up in the future to handle it in
>>>>>> Calcite Babel if possible.
>>>>>>
>>>>>> Mingmin
>>>>>>
>>>>>> On Tue, Aug 6, 2019 at 3:57 PM Rui Wang <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Mingmin,
>>>>>>>
>>>>>>> Honestly, I don't have an answer: a SQL dialect is complicated and I
>>>>>>> don't understand Calcite well enough (Calcite has a big repo).
>>>>>>> Based on my reading of CALCITE-2280
>>>>>>> <https://issues.apache.org/jira/browse/CALCITE-2280>, the closer a
>>>>>>> dialect is to standard SQL, the fewer blockers we will have in
>>>>>>> supporting it in the Calcite Babel parser.
>>>>>>>
>>>>>>> However, this is a good question, and it raises a point that I find
>>>>>>> people usually overlook: supporting a SQL dialect is not only a matter
>>>>>>> of supporting its syntax. It also includes data types, built-in SQL
>>>>>>> functions, operators and many other things.
>>>>>>>
>>>>>>> In particular, I found the following incompatibilities between Calcite
>>>>>>> and ZetaSQL during development:
>>>>>>> 1. Calcite does not support the Struct/Row type well, because Calcite
>>>>>>> flattens Rows when reading from tables by adding an extra Projection
>>>>>>> on top of tables.
>>>>>>> 2. I had trouble supporting the DATETIME (or timestamp without
>>>>>>> time zone) type.
>>>>>>> 3. Huge incompatibilities in SQL functions. E.g. the return type is
>>>>>>> different for AVG(long), and there are many more.
>>>>>>> 4. I am not sure whether Calcite has the same set of type casting rules
>>>>>>> as BigQuery (my impression is that there are differences).
>>>>>>>
>>>>>>>
>>>>>>> I would say that in the short/mid term, it's much easier to use the
>>>>>>> logical plan as an IR to implement another SQL dialect for BeamSQL
>>>>>>> (LinkedIn has a similar practice, see their blog post
>>>>>>> <https://engineering.linkedin.com/blog/2019/01/bridging-offline-and-nearline-computations-with-apache-calcite>
>>>>>>> ).
>>>>>>>
>>>>>>> For the longer term, it would be interesting to see how we can add
>>>>>>> BigQuery syntax (plus its data types and SQL functions) to the Calcite
>>>>>>> Babel parser.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -Rui
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 6, 2019 at 2:49 PM Mingmin Xu <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Just take a look at
>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-2280, which
>>>>>>>> introduced the Babel parser in Calcite to support varied dialects; this
>>>>>>>> may be an easier way to support BigQuery syntax. @Rui, do you notice
>>>>>>>> any big differences between the Calcite engine and ZetaSQL, such as in
>>>>>>>> parsing or optimization? If that's the case, it makes sense to build
>>>>>>>> the alternative switch on the Beam side.
>>>>>>>>
>>>>>>>> On Sun, Aug 4, 2019 at 4:47 PM Rui Wang <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Mingmin - it sounds like an awesome idea to translate from
>>>>>>>>> SparkSQL. It would be even more exciting if we could translate Spark
>>>>>>>>> Structured Streaming code in a similar way, which would enable
>>>>>>>>> existing Spark SQL/Structured Streaming pipelines to run on Beam.
>>>>>>>>>
>>>>>>>>> Reuven - Thanks for bringing it up. I tried searching dev@calcite
>>>>>>>>> and only found [1]. From that thread, I see that adding ZetaSQL to
>>>>>>>>> Calcite itself is still under discussion. I would also like to hear
>>>>>>>>> from anyone who knows of more progress on this work than the thread
>>>>>>>>> shows.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1]:
>>>>>>>>> http://mail-archives.apache.org/mod_mbox/calcite-dev/201905.mbox/%3CCAMj=j=-sPWgxzAgusnx8OYvYDYDcDY=dupe6poytrxhjri9...@mail.gmail.com%3E
>>>>>>>>>
>>>>>>>>> -Rui
>>>>>>>>>
>>>>>>>>> On Sun, Aug 4, 2019 at 3:54 PM Reuven Lax <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I hear rumours that the Calcite project is planning on adding a
>>>>>>>>>> ZetaSQL-compatible parser to Calcite itself, in which case there
>>>>>>>>>> will be a Java parser we can use as well. Does anyone know if this
>>>>>>>>>> work is still going on?
>>>>>>>>>>
>>>>>>>>>> On Sat, Aug 3, 2019 at 8:41 PM Manu Zhang <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>>> A question to the community, does the size of the change require
>>>>>>>>>>>> any process besides the usual PR reviews?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I think so. This is a big change and has come as kind of a
>>>>>>>>>>> surprise (sorry if I've missed previous discussions).
>>>>>>>>>>>
>>>>>>>>>>> Rui, could you explain more about how things will play out between
>>>>>>>>>>> BeamSQL and ZetaSQL? (A design doc including the pluggable interface
>>>>>>>>>>> would be perfect.) From GitHub, ZetaSQL is mainly in C++, so is what
>>>>>>>>>>> you are doing a port of, or a connector to, ZetaSQL? Do we need to
>>>>>>>>>>> depend on https://github.com/google/zetasql ? ZetaSQL looks
>>>>>>>>>>> interesting, but I could barely find any docs for end users.
>>>>>>>>>>>
>>>>>>>>>>> Also, I'd prefer the PR to be split into two, one for the
>>>>>>>>>>> pluggable interface and one for ZetaSQL.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Manu
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Aug 3, 2019 at 10:06 AM Ahmet Altay <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank you Rui for the heads up.
>>>>>>>>>>>>
>>>>>>>>>>>> A question to the community, does the size of the change
>>>>>>>>>>>> require any process besides the usual PR reviews?
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Aug 2, 2019 at 10:23 AM Rui Wang <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi community,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have been working on supporting ZetaSQL[1] as a SQL dialect
>>>>>>>>>>>>> in BeamSQL. ZetaSQL is a SQL analyzer open sourced by Google.
>>>>>>>>>>>>> Here is ZetaSQL's documentation[2].
>>>>>>>>>>>>>
>>>>>>>>>>>>> Briefly, the design for integrating ZetaSQL with BeamSQL is this: I
>>>>>>>>>>>>> made a pluggable query planner interface in BeamSQL, so we can
>>>>>>>>>>>>> easily plug in a new planner[3] (in my case, the ZetaSQL planner).
>>>>>>>>>>>>> In fact, anyone can add new planners this way (e.g. for a
>>>>>>>>>>>>> PostgreSQL dialect).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I want to contribute the ZetaSQL planner and its related
>>>>>>>>>>>>> code (~10k) to the Beam repo (#9210
>>>>>>>>>>>>> <https://github.com/apache/beam/pull/9210>). This
>>>>>>>>>>>>> contribution barely touches existing Beam code (because the idea
>>>>>>>>>>>>> is a pluggable planner).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Acknowledgement*
>>>>>>>>>>>>> Thanks to all the people who provided help during Beam ZetaSQL
>>>>>>>>>>>>> development: Matthew Brown, Brian Hulette, Andrew Pilloud,
>>>>>>>>>>>>> Kenneth Knowles, Anton Kedin and Mikhail Gryzykhin. This list is
>>>>>>>>>>>>> not exhaustive; thanks also to contributors who are not listed.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]: https://github.com/google/zetasql
>>>>>>>>>>>>> [2]: https://github.com/google/zetasql/tree/master/docs
>>>>>>>>>>>>> [3]:
>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Rui
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> ----
>>>>>>>> Mingmin
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> ----
>>>>>> Mingmin
>>>>>>
>>>>>
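
To make the AVG(long) point from the thread concrete: the thread only says that the return type differs between Calcite and ZetaSQL, not which engine returns what. The plain-Java sketch below is therefore only an illustration of why such a difference matters, assuming one side keeps the integer input type and the other returns a floating-point value. The class and method names are made up for this example and are not part of Beam, Calcite or ZetaSQL.

```java
// Illustration only: hypothetical names, not Beam/Calcite/ZetaSQL APIs.
final class AvgReturnType {

  /** Average that keeps the integer input type; the fraction is truncated. */
  static long avgAsLong(long[] values) {
    long sum = 0;
    for (long v : values) {
      sum += v;
    }
    return sum / values.length;
  }

  /** Average that returns a floating-point result. */
  static double avgAsDouble(long[] values) {
    long sum = 0;
    for (long v : values) {
      sum += v;
    }
    return (double) sum / values.length;
  }

  public static void main(String[] args) {
    long[] values = {1, 2};
    System.out.println(avgAsLong(values));   // prints 1
    System.out.println(avgAsDouble(values)); // prints 1.5
  }
}
```

The same query over the same rows gives 1 under one typing rule and 1.5 under the other, so a dialect port has to reproduce function return types and not just syntax.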

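The pluggable planner idea from the original proposal can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Beam's actual QueryPlanner interface: Planner, PlannerRegistry and the plan strings are all hypothetical names invented for illustration, and a real planner would return a relational plan rather than a String.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these types are Beam's real API.
final class PlannerSketch {

  /** Hypothetical planner contract: SQL text in, some plan representation out. */
  interface Planner {
    String plan(String sql);
  }

  /** Picks a planner by dialect name, so new dialects plug in without touching callers. */
  static final class PlannerRegistry {
    private final Map<String, Planner> planners = new LinkedHashMap<>();

    void register(String dialect, Planner planner) {
      planners.put(dialect, planner);
    }

    String plan(String dialect, String sql) {
      Planner planner = planners.get(dialect);
      if (planner == null) {
        throw new IllegalArgumentException("No planner registered for dialect: " + dialect);
      }
      return planner.plan(sql);
    }
  }

  public static void main(String[] args) {
    PlannerRegistry registry = new PlannerRegistry();
    // Each lambda stands in for a full planner implementation.
    registry.register("calcite", sql -> "CalcitePlan[" + sql + "]");
    registry.register("zetasql", sql -> "ZetaSqlPlan[" + sql + "]");
    System.out.println(registry.plan("zetasql", "SELECT 1")); // prints ZetaSqlPlan[SELECT 1]
  }
}
```

The point of the design is that adding a dialect means registering one more implementation; code that only calls the registry stays unchanged, which matches the claim that the contribution barely touches existing code.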