By the way, we should also figure out how this fits with the project to create a lenient parser that can handle any dialect of SQL. I am calling that parser “Babel” [1]. It will be able to handle the BigQuery dialect, among others.
Here’s my current thinking. I think that Babel should be a new module (a sibling to calcite-server, calcite-druid etc.) and its parser will extend the core parser. That means that calcite-babel will not inherit from the DDL parser in the calcite-server module, nor vice versa. We will probably end up with two parsers that are capable of handling DDL, and two sets of AST classes. But I think that is OK, or at least better than the chaos of trying to reuse too much. The parsers will still share 99% of their DNA with the core parser, and we can easily share tests.

Julian

[1] https://issues.apache.org/jira/browse/CALCITE-2280

> On May 1, 2018, at 11:16 PM, Shuyi Chen wrote:
>
> Hi Anton, thanks a lot for the great questions.
>
> Yes, SqlDataTypeSpec currently only supports creating simple SQL types;
> row, array and map types are not supported.
>
> CALCITE-2045 adds support for defining custom types, either simple or row
> types, through the type DDL, and you should be able to use the UDT in your
> Table DDL for complex row types. I think this should be close to what you
> want.
>
> You can extend the type DDL in its current form in the Beam parser and add
> support for map and array types, or modify the grammar to tailor it to
> your needs and make it BigQuery compatible. All the changes required for
> supporting UDTs in calcite-core should already be done by CALCITE-2045.
>
> As for the BigQuery syntax, I am not sure it's a good idea to adopt it in
> the core parser unless there is no SQL equivalent, but if you implement it
> in your extended Beam parser, it's up to you, and that's by design of
> Calcite DDL.
>
> Let me know if it helps.
>
> Thanks
> Shuyi
>
> On Tue, May 1, 2018 at 3:21 PM, Anton Kedin wrote:
>
>> Hi,
>>
>> We want to add support for non-primitive types (ROW, ARRAY, MAP) to
>> Apache Beam SQL DDL (based on Calcite DDL extensions). What would be the
>> best way to approach this?
>>
>> *Our Use Case:*
>> We want to be able to use DDL to define data sources and sinks for Beam
>> pipelines, so that users don't have to wrap SQL into custom code which
>> configures sources/sinks.
>>
>> *What we have already:*
>> We have a customized CREATE TABLE statement which allows users to specify
>> the type of the data source, its schema, and data location. The
>> implementation is based on Calcite DDL extensions.
>>
>> *What we're missing:*
>> We need to be able to define schemas with non-primitive types, e.g.
>> arrays or rows, so that we can correctly describe data sources and sinks
>> which support such types. For example, if we want to manipulate data in a
>> stream of JSON objects, we want to be able to describe the JSON contents
>> somehow, including arrays or nested objects. Or we would need similar
>> types to interact with BigQuery, which supports arrays and nested struct
>> types.
>>
>> *Problem:*
>> I tried to check if it is possible to extend the parser using the
>> config.fmpp approach, so that we can hook into the Parser.TypeName()
>> <https://github.com/apache/calcite/blob/a5d520df76602d25ed66627f08f5e0db4d048a77/core/src/main/codegen/templates/Parser.jj#L4439>
>> method and parse the complex types ourselves. But Parser.DataType()
>> <https://github.com/apache/calcite/blob/a5d520df76602d25ed66627f08f5e0db4d048a77/core/src/main/codegen/templates/Parser.jj#L4377>
>> creates SqlDataTypeSpec only in two specific ways, without the ability to
>> extend it, so even if we parse the type name ourselves, we would not be
>> able to construct the SqlDataTypeSpec in a way that supports arrays/rows.
>> But even if we could, looking at SqlDataTypeSpec
>> <https://github.com/apache/calcite/blob/09be7e74a6a4d1b1c4f640c8e69b5ebdd467d811/core/src/main/java/org/apache/calcite/sql/SqlDataTypeSpec.java#L327>
>> it seems that it does not support creating arrays or rows either: it
>> calls typeFactory.createSqlType(typename)
>> <https://github.com/apache/calcite/blob/09be7e74a6a4d1b1c4f640c8e69b5ebdd467d811/core/src/main/java/org/apache/calcite/sql/SqlDataTypeSpec.java#L350>
>> which only
>> <https://github.com/apache/calcite/blob/f47465236b7650f2280092b708fa39062fe79ffd/core/src/main/java/org/apache/calcite/sql/type/SqlTypeFactoryImpl.java#L49>
>> creates basic types in this call.
>>
>> *Path forward:*
>> If the above is correct, then it appears that we would need to patch
>> Calcite in a couple of places to support arrays, rows, and maps in DDL:
>> - update Parser.jj to support parsing the type definitions for the
>> required types and constructing SqlDataTypeSpec correctly for those
>> cases;
>> - update SqlDataTypeSpec.java to handle complex types and invoke the
>> correct typeFactory interfaces.
>>
>> *Questions:*
>> - does the above sound sane/correct?
>> - is there similar work already tracked in Calcite somewhere? I saw
>> something mentioned in CALCITE-2045
>> <https://issues.apache.org/jira/browse/CALCITE-2045?focusedCommentId=16351203&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16351203>,
>> but didn't see any tracking Jiras specifically for this work yet;
>> - is there a known/recommended/working syntax for such DDL? If there is
>> none, then would it make sense to adopt something similar to the BigQuery
>> STRUCT/ARRAY definition
>> <https://cloud.google.com/bigquery/docs/data-definition-language>?
>>
>> Thank you,
>> Anton
>>
>
> --
> "So you have to trust that the dots will somehow connect in your future."
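
The "path forward" in Anton's message (teach the type grammar about ARRAY, MAP and ROW, then build a nested type description instead of a single simple type) can be illustrated outside Calcite with a small recursive-descent parser. This is only a sketch under stated assumptions: TypeSpecSketch and TypeNode are hypothetical names, the angle-bracket syntax is an assumption loosely modeled on BigQuery's ARRAY<...>/STRUCT<...> notation, and none of this is Calcite's Parser.jj grammar or the SqlDataTypeSpec API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of nested type parsing; not a Calcite class. */
final class TypeSpecSketch {
  /** A parsed type: a name plus, for ARRAY/MAP/ROW, its component types. */
  static final class TypeNode {
    final String name;              // e.g. INTEGER, ARRAY, MAP, ROW
    final String fieldName;         // set only for fields of a ROW, else null
    final List<TypeNode> children;  // empty for simple types

    TypeNode(String name, String fieldName, List<TypeNode> children) {
      this.name = name;
      this.fieldName = fieldName;
      this.children = children;
    }

    @Override public String toString() {
      StringBuilder sb = new StringBuilder();
      if (fieldName != null) sb.append(fieldName).append(' ');
      sb.append(name);
      if (!children.isEmpty()) {
        sb.append('<');
        for (int i = 0; i < children.size(); i++) {
          if (i > 0) sb.append(", ");
          sb.append(children.get(i));
        }
        sb.append('>');
      }
      return sb.toString();
    }
  }

  private final String src;
  private int pos;

  private TypeSpecSketch(String src) { this.src = src; }

  /** Parses one type expression and rejects trailing input. */
  static TypeNode parse(String text) {
    TypeSpecSketch p = new TypeSpecSketch(text);
    TypeNode t = p.type(null);
    p.skipSpaces();
    if (p.pos != p.src.length()) {
      throw new IllegalArgumentException("Trailing input at " + p.pos);
    }
    return t;
  }

  // type := NAME | ARRAY '<' type '>' | MAP '<' type ',' type '>'
  //       | ROW '<' field type (',' field type)* '>'
  private TypeNode type(String fieldName) {
    String name = word().toUpperCase();
    List<TypeNode> children = new ArrayList<>();
    boolean row = name.equals("ROW");
    if (row || name.equals("ARRAY") || name.equals("MAP")) {
      expect('<');
      do {
        // ROW components carry a field name before their type.
        children.add(type(row ? word() : null));
      } while (accept(','));
      expect('>');
      int n = children.size();
      if ((name.equals("ARRAY") && n != 1) || (name.equals("MAP") && n != 2)) {
        throw new IllegalArgumentException(name + " has wrong number of type arguments: " + n);
      }
    }
    return new TypeNode(name, fieldName, children);
  }

  private String word() {
    skipSpaces();
    int start = pos;
    while (pos < src.length()
        && (Character.isLetterOrDigit(src.charAt(pos)) || src.charAt(pos) == '_')) {
      pos++;
    }
    if (start == pos) throw new IllegalArgumentException("Expected identifier at " + pos);
    return src.substring(start, pos);
  }

  private boolean accept(char c) {
    skipSpaces();
    if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
    return false;
  }

  private void expect(char c) {
    if (!accept(c)) throw new IllegalArgumentException("Expected '" + c + "' at " + pos);
  }

  private void skipSpaces() {
    while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
  }

  public static void main(String[] args) {
    System.out.println(parse("ROW<id INTEGER, tags ARRAY<VARCHAR>>"));
  }
}
```

In a real patch the same recursion would presumably live in the grammar, with each TypeNode mapped to whatever array, map or struct factory call the type system offers; the sketch only shows why a single simple-type call cannot express the nested cases.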
