Thanks Manu for you feedback! Some comments inlined:
On Sat, Aug 3, 2019 at 8:41 PM Manu Zhang <[email protected]> wrote: > A question to the community, does the size of the change require any >> process besides the usual PR reviews? >> > > I think so. This is a big change and has come as kind of a surprise (sorry > if I've missed previous discussions). > > Rui, could you explain more on how things will play out between BeamSQL > and ZetaSQL (A design doc including the pluggable interface would be > perfect). > I see. I will have a document about some basic idea on Beam ZetaSQL (this is my way to call "ZetaSQL as a SQL dialect in BeamSQL", and I usually use Beam CalciteSQL to refer to Calcite's SQL dialect.). At least from users perspective, it's simple to use: setup planner name in BeamSqlPipelineOptions <https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlPipelineOptions.java#L29> and BeamSQL will initialize different planners: either Calcite or ZetaSQL is supported now. > From GitHub, ZetaSQL is mainly in C++ so what you are doing is a port or a > connector to ZetaSQL? Do we need to depend on > https://github.com/google/zetasql ? ZetaSQL looks interesting but I could > barely find any doc for end users. > ZetaSQL provides a Java interface which calls c++ binary through JNI. For using ZetaSQL in BeamSQL, we only need to depend on ZetaSQL jars in maven central (https://mvnrepository.com/search?q=zetasql). These jars contains all we need to call ZetaSQL analyzer by Java. > Also, I'd prefer the PR to be split into two, one for the pluggable > interface and one for the ZetaSQL. > > Pluggable planner is already a separate PR merged before: https://github.com/apache/beam/pull/7745 -Rui > Thanks, > Manu > > > > On Sat, Aug 3, 2019 at 10:06 AM Ahmet Altay <[email protected]> wrote: > >> Thank you Rui for the heads up. >> >> A question to the community, does the size of the change require any >> process besides the usual PR reviews? >> >> On Fri, Aug 2, 2019 at 10:23 AM Rui Wang <[email protected]> wrote: >> >>> Hi community, >>> >>> I have been working on supporting ZetaSQL[1] as a SQL dialect in >>> BeamSQL. ZetaSQL is a SQL analyzer open sourced by Google. Here is >>> ZetaSQL's documentation[2]. >>> >>> Birfely, the design of integrating ZetaSQL with BeamSQL is, I made a >>> plugable query planner interface in BeamSQL, and we can easily plug in a >>> new planner[3] (in my case, ZetaSQL planner). Actually anyone can add new >>> planners by this way (e.g. PostgreSQL dialect). >>> >>> I want to contribute ZetaSQL planner and its related code(~10k) to Beam >>> repo(#9210 <https://github.com/apache/beam/pull/9210>). This >>> contribution barely touch existing Beam code (because the idea is plugable >>> planner). >>> >>> >>> *Acknowledgement* >>> Thanks to all the people who provided help during Beam ZetaSQL >>> development: Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth Knowles, >>> Anton Kedin and Mikhail Gryzykhin. This list is not exhausted and also >>> thanks to contributions which are not listed. >>> >>> >>> [1]: https://github.com/google/zetasql >>> [2]: https://github.com/google/zetasql/tree/master/docs >>> [3]: >>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java >>> >>> >>> -Rui >>> >>
