paul-rogers opened a new pull request #2068: URL: https://github.com/apache/drill/pull/2068
# [DRILL-7717](https://issues.apache.org/jira/browse/DRILL-7717): Support Mongo extended types in V2 JSON loader

## Description

As it turns out, Drill's existing "V1" JSON reader provides support for [V1 of Mongo's extended JSON types](https://docs.mongodb.com/manual/reference/mongodb-extended-json-v1/). This PR adds that support, along with the updated [V2](https://docs.mongodb.com/manual/reference/mongodb-extended-json/) support, to the V2 JSON reader. Drill provides a number of its own extensions to the Mongo extended types; those are also included in the V2 reader.

The V2 reader design is based on making the per-record read path as efficient as possible. Generally, we try to be two "virtual functions" away from converting a JSON token to a value vector entry. The first call goes to a "value listener" that knows how to convert a JSON token to the desired data type; the second writes that value into a vector using a column writer.

In this design, most of the complexity (of which there is an abundant amount) is pushed into "factory" classes and methods which build up a parser/listener pair appropriate for each bit of JSON structure.

The prior structure was a bit rigid and assumed a direct mapping from JSON structure to Drill's data types (a JSON object structure always represented a Drill `MAP`, and so on). This PR revises that structure to be more flexible in order to handle the Mongo extended types, which treat a JSON object as a holder for a type declaration and a value. You'll see lots of code changed, but much was just moving things around. A new package is added to support the Mongo types (and Drill's extensions to the Mongo extended types).

Since this support required handling a wider array of data types in JSON, the code also enables the use of a provided schema to do the same conversion with "non-extended" JSON. The implementation should allow adding other forms of extended JSON if there is a need.
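To make the two-hop read path concrete, here is a minimal sketch, not the code in this PR. All names (`ValueListener`, `ColumnWriter`, `ListWriter`, `listenerFor`) are hypothetical stand-ins, and the writer appends to a list rather than a value vector. The `$numberLong` and `$numberDouble` keys are taken from Mongo's extended JSON V2, where the value is carried as a string. The factory method does the type-dependent work once, so the per-value path is just a listener call followed by a writer call.

```java
import java.util.ArrayList;
import java.util.List;

final class ListenerSketch {

  /** Hypothetical stand-in for a column writer (the second hop). */
  interface ColumnWriter {
    void setLong(long value);
    void setDouble(double value);
  }

  /** Hypothetical stand-in for a value listener (the first hop). */
  interface ValueListener {
    void onString(String token);
  }

  /** Toy writer that collects values in a list instead of a value vector. */
  static class ListWriter implements ColumnWriter {
    final List<Object> values = new ArrayList<>();
    @Override public void setLong(long value) { values.add(value); }
    @Override public void setDouble(double value) { values.add(value); }
  }

  /**
   * Factory step: chooses the conversion from the extended-type key.
   * This runs once per column, not once per value.
   */
  static ValueListener listenerFor(String typeKey, ColumnWriter writer) {
    switch (typeKey) {
      case "$numberLong":
        return token -> writer.setLong(Long.parseLong(token));
      case "$numberDouble":
        // Double.parseDouble also accepts "NaN", "Infinity" and "-Infinity".
        return token -> writer.setDouble(Double.parseDouble(token));
      default:
        throw new IllegalArgumentException("Unsupported type key: " + typeKey);
    }
  }

  public static void main(String[] args) {
    ListWriter writer = new ListWriter();
    // Simulates reading {"$numberLong": "10"} and {"$numberLong": "20"}.
    ValueListener listener = listenerFor("$numberLong", writer);
    listener.onString("10");
    listener.onString("20");
    System.out.println(writer.values); // prints [10, 20]
  }
}
```

In this sketch the `switch` on the type key is evaluated only when the listener is built; each later token costs one `onString` call and one `setLong` or `setDouble` call.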
## Documentation

Drill's [JSON documentation](http://drill.apache.org/docs/json-data-model/) does not discuss the extended types. A full description, suitable for converting to user documentation, can be found in a [package-info.java](https://github.com/paul-rogers/drill/blob/f32057234935c632e740173d197b5104a08fa46d/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/extended/package-info.java) file.

Once this code replaces the JSON reader (see below), users can provide a schema to specify the column type:

* `INT`, `BIGINT` -- Conversion from JSON integers and strings.
* `FLOAT4`, `FLOAT8` -- Conversion from JSON integers, floats and strings. `NaN`, `-Infinity` and `Infinity` are supported as literals if enabled. They are always supported as strings.
* `VARDECIMAL` -- Conversion from JSON strings, integers and floats. Conversion is done from the value string to avoid rounding errors and range limits.
* `VARCHAR` -- Conversion is allowed from any scalar type. The literal JSON text value is used.
* `VARBINARY` -- Conversion from a JSON string encoded using Base64 as per the Mongo spec.
* `DATE` -- Allows a date, expressed as a string, in the ISO `YYYY-MM-DD` format.
* `TIME` -- Allows a time, expressed as a string, in the ISO `HH:MM:SS.SSS` format.
* `TIMESTAMP` -- Allows a date/time, expressed as an ISO string in the `YYYY-MM-DDTHH:MM:SS.SSS` format. The time is converted from the timezone expressed in the string into the local timezone (as required for a `TIMESTAMP`).
* `INTERVAL` -- String encoded in the ISO period format: `PxYxMxDTxHxMxS`.
* `INTERVALYEAR` -- String encoded in the ISO period format: `PxYxM-`.
* `INTERVALDAY` -- String encoded in the ISO period format: `PxYxxDTxHxMxS`.

## Testing

Added a category for JSON-related tests.

Added unit tests for the new functionality. The prior version had separate tests for the JSON "parser" and "loader". After these changes, maintaining that separation became harder and less useful.
The prior "parser" tests are removed, with their functionality moving into the "loader" tests.

No code other than the HTTP storage plugin uses this code at present, so this change will not affect other unit or functional tests.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org