Thanks everyone! Beam ZetaSQL is now merged into the Beam repo!

-Rui

On Mon, Aug 19, 2019 at 8:36 AM Ahmet Altay <al...@google.com> wrote:

> Thank you both!
>
> On Mon, Aug 19, 2019 at 8:01 AM Kenneth Knowles <k...@apache.org> wrote:
>
>> The i.p. clearance is complete:
>> https://lists.apache.org/thread.html/239be048e7748f079dc34b06020e0c8f094859cb4a558b361f6b8eb5@<general.incubator.apache.org>
>>
>> Kenn
>>
>> On Mon, Aug 12, 2019 at 4:25 PM Rui Wang <ruw...@google.com> wrote:
>>
>>> Thanks Kenneth.
>>>
>>> I will start a vote for Beam ZetaSQL contribution.
>>>
>>> -Rui
>>>
>>> On Mon, Aug 12, 2019 at 4:11 PM Kenneth Knowles <k...@apache.org> wrote:
>>>
>>>> Nice explanations of the reasoning. I think two things will stay
>>>> approximately the same even as the ecosystem develops: (1) ZetaSQL has
>>>> pretty clear semantics so we will have a compliant parser, whether it is
>>>> the official one or another like Calcite Babel, and (2) we will need a way
>>>> to implement all the standard ZetaSQL functions and this will be the same
>>>> no matter the frontend.
>>>>
>>>> For a contribution this large where i.p. clearance is necessary, a vote
>>>> is appropriate. It can happen at the same time or even after i.p. 
>>>> clearance.
>>>>
>>>> Kenn
>>>>
>>>> On Wed, Aug 7, 2019 at 1:08 PM Mingmin Xu <mingm...@gmail.com> wrote:
>>>>
>>>>> Thanks for highlighting the types/operators/functions/... parts; they
>>>>> do make things more complicated. +1 that the proposal is reasonable as a
>>>>> short/mid-term solution. We could follow up in the future to handle it in
>>>>> Calcite Babel if possible.
>>>>>
>>>>> Mingmin
>>>>>
>>>>> On Tue, Aug 6, 2019 at 3:57 PM Rui Wang <ruw...@google.com> wrote:
>>>>>
>>>>>> Hi Mingmin,
>>>>>>
>>>>>> Honestly, I don't have an answer: a SQL dialect is complicated, and I
>>>>>> don't understand Calcite well enough (Calcite has a big repo).
>>>>>> Based on my reading of CALCITE-2280
>>>>>> <https://issues.apache.org/jira/browse/CALCITE-2280>, the closer a
>>>>>> dialect is to standard SQL, the fewer blockers we will have in supporting
>>>>>> it in the Calcite Babel parser.
>>>>>>
>>>>>> However, this is a good question, and it raises a point that I find
>>>>>> people usually overlook: supporting a SQL dialect is not only about
>>>>>> supporting its syntax. It also includes data types, built-in SQL
>>>>>> functions, operators, and much more.
>>>>>>
>>>>>> In particular, I found the following incompatibilities between Calcite
>>>>>> and ZetaSQL during development:
>>>>>> 1. Calcite does not support the Struct/Row type well, because Calcite
>>>>>> flattens Rows when reading from tables by adding an extra Projection on
>>>>>> top of the tables.
>>>>>> 2. I had trouble supporting the DATETIME (or timestamp without time
>>>>>> zone) type.
>>>>>> 3. There are huge incompatibilities in SQL functions, e.g. the return
>>>>>> type of AVG(long) is different, and many, many more.
>>>>>> 4. I am not sure Calcite has the same set of type casting rules as
>>>>>> BigQuery (my impression is that there are differences).
>>>>>>
>>>>>>
>>>>>> I would say that in the short/mid term, it's much easier to use the
>>>>>> logical plan as the IR to implement another SQL dialect for BeamSQL
>>>>>> (LinkedIn has a similar practice; see their blog post
>>>>>> <https://engineering.linkedin.com/blog/2019/01/bridging-offline-and-nearline-computations-with-apache-calcite>).
>>>>>>
>>>>>> In the longer term, it would be interesting to see how we can add
>>>>>> BigQuery syntax (plus its data types and SQL functions) to the Calcite
>>>>>> Babel parser.
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Rui
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 6, 2019 at 2:49 PM Mingmin Xu <mingm...@gmail.com> wrote:
>>>>>>
>>>>>>> I just took a look at
>>>>>>> https://issues.apache.org/jira/browse/CALCITE-2280, which introduced
>>>>>>> the Babel parser in Calcite to support various dialects; this may be an
>>>>>>> easier way to support BigQuery syntax. @Rui, do you notice any big
>>>>>>> differences between the Calcite engine and ZetaSQL, e.g. in parsing or
>>>>>>> optimization? If so, it makes sense to build the alternative switch on
>>>>>>> the Beam side.
>>>>>>>
>>>>>>> On Sun, Aug 4, 2019 at 4:47 PM Rui Wang <ruw...@google.com> wrote:
>>>>>>>
>>>>>>>> Mingmin - translating from SparkSQL sounds like an awesome idea. It
>>>>>>>> would be even more exciting if we could translate Spark Structured
>>>>>>>> Streaming code in a similar way, which would enable existing Spark
>>>>>>>> SQL/Structured Streaming pipelines to run on Beam.
>>>>>>>>
>>>>>>>> Reuven - Thanks for bringing it up. I searched dev@calcite and only
>>>>>>>> found [1]. From that thread, I see that adding ZetaSQL to Calcite
>>>>>>>> itself is still under discussion. I would also like to know if anyone
>>>>>>>> has more information on the progress of this work than that thread.
>>>>>>>>
>>>>>>>>
>>>>>>>> [1]:
>>>>>>>> http://mail-archives.apache.org/mod_mbox/calcite-dev/201905.mbox/%3CCAMj=j=-sPWgxzAgusnx8OYvYDYDcDY=dupe6poytrxhjri9...@mail.gmail.com%3E
>>>>>>>>
>>>>>>>> -Rui
>>>>>>>>
>>>>>>>> On Sun, Aug 4, 2019 at 3:54 PM Reuven Lax <re...@google.com> wrote:
>>>>>>>>
>>>>>>>>> I hear rumours that the Calcite project is planning to add a
>>>>>>>>> ZetaSQL-compatible parser to Calcite itself, in which case there will
>>>>>>>>> also be a Java parser we can use. Does anyone know if this work is
>>>>>>>>> still going on?
>>>>>>>>>
>>>>>>>>> On Sat, Aug 3, 2019 at 8:41 PM Manu Zhang <owenzhang1...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> A question to the community, does the size of the change require
>>>>>>>>>>> any process besides the usual PR reviews?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think so. This is a big change and has come as kind of a
>>>>>>>>>> surprise (sorry if I've missed previous discussions).
>>>>>>>>>>
>>>>>>>>>> Rui, could you explain more about how things will play out between
>>>>>>>>>> BeamSQL and ZetaSQL? (A design doc including the pluggable interface
>>>>>>>>>> would be perfect.) From GitHub, ZetaSQL is mainly in C++, so is what
>>>>>>>>>> you are doing a port of ZetaSQL or a connector to it? Do we need to
>>>>>>>>>> depend on https://github.com/google/zetasql? ZetaSQL looks
>>>>>>>>>> interesting, but I could barely find any docs for end users.
>>>>>>>>>>
>>>>>>>>>> Also, I'd prefer the PR to be split into two: one for the
>>>>>>>>>> pluggable interface and one for ZetaSQL.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Manu
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, Aug 3, 2019 at 10:06 AM Ahmet Altay <al...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thank you Rui for the heads up.
>>>>>>>>>>>
>>>>>>>>>>> A question to the community, does the size of the change require
>>>>>>>>>>> any process besides the usual PR reviews?
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 2, 2019 at 10:23 AM Rui Wang <ruw...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi community,
>>>>>>>>>>>>
>>>>>>>>>>>> I have been working on supporting ZetaSQL [1] as a SQL dialect
>>>>>>>>>>>> in BeamSQL. ZetaSQL is a SQL analyzer open-sourced by Google.
>>>>>>>>>>>> Here is ZetaSQL's documentation [2].
>>>>>>>>>>>>
>>>>>>>>>>>> Briefly, the design for integrating ZetaSQL with BeamSQL is as
>>>>>>>>>>>> follows: I made a pluggable query planner interface in BeamSQL, so
>>>>>>>>>>>> we can easily plug in a new planner [3] (in my case, the ZetaSQL
>>>>>>>>>>>> planner). In fact, anyone can add new planners this way (e.g. for a
>>>>>>>>>>>> PostgreSQL dialect).
>>>>>>>>>>>>
>>>>>>>>>>>> I want to contribute the ZetaSQL planner and its related code (~10k)
>>>>>>>>>>>> to the Beam repo (#9210 <https://github.com/apache/beam/pull/9210>).
>>>>>>>>>>>> This contribution barely touches existing Beam code (because the
>>>>>>>>>>>> idea is a pluggable planner).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *Acknowledgement*
>>>>>>>>>>>> Thanks to all the people who helped during Beam ZetaSQL
>>>>>>>>>>>> development: Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth
>>>>>>>>>>>> Knowles, Anton Kedin and Mikhail Gryzykhin. This list is not
>>>>>>>>>>>> exhaustive; thanks also to those whose contributions are not listed.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [1]: https://github.com/google/zetasql
>>>>>>>>>>>> [2]: https://github.com/google/zetasql/tree/master/docs
>>>>>>>>>>>> [3]:
>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Rui
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ----
>>>>>>> Mingmin
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> ----
>>>>> Mingmin
>>>>>
>>>>
