paul-rogers opened a new pull request #2068:
URL: https://github.com/apache/drill/pull/2068


   # [DRILL-7717](https://issues.apache.org/jira/browse/DRILL-7717): Support 
Mongo extended types in V2 JSON loader
   
   ## Description
   
   As it turns out, Drill's existing "V1" JSON reader provides support for [V1 
of Mongo's extended JSON 
types](https://docs.mongodb.com/manual/reference/mongodb-extended-json-v1/). 
This PR adds that support, along with the updated 
[V2](https://docs.mongodb.com/manual/reference/mongodb-extended-json/) support, 
to the V2 JSON reader. Drill provides a number of Drill extensions to the Mongo 
extended types; those are also included in the V2 reader.
   
   The V2 reader design is based on making the per-record read path as 
efficient as possible. Generally, we try to be two "virtual functions" away 
from converting a JSON token to a value vector entry. The first call goes to 
a "value listener" that knows how to convert a JSON token to the desired data 
type; the second writes that value into a vector using a column writer.
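
A rough sketch of that two-hop path, for illustration only. All type and method names below (`ValueListener`, `ColumnWriter`, `BigIntListener`, `ListBackedWriter`) are hypothetical stand-ins and are not the interfaces in this PR; a plain list stands in for the value vector.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: every type here is a hypothetical stand-in, not Drill's API.
final class ListenerSketch {

  /** Second hop: writes an already-converted value into a column. */
  interface ColumnWriter {
    void setLong(long value);
  }

  /** First hop: receives a JSON token and converts it to the column's type. */
  interface ValueListener {
    void onInt(long token);
    void onString(String token);
  }

  /** Stand-in for a value vector: collects the written values in a list. */
  static final class ListBackedWriter implements ColumnWriter {
    final List<Long> values = new ArrayList<>();

    @Override
    public void setLong(long value) {
      values.add(value);
    }
  }

  /** Listener bound to one BIGINT column; built once, then reused per record. */
  static final class BigIntListener implements ValueListener {
    private final ColumnWriter writer;

    BigIntListener(ColumnWriter writer) {
      this.writer = writer;
    }

    @Override
    public void onInt(long token) {
      writer.setLong(token);
    }

    @Override
    public void onString(String token) {
      // Strings are accepted too and parsed as integers.
      writer.setLong(Long.parseLong(token.trim()));
    }
  }

  public static void main(String[] args) {
    ListBackedWriter writer = new ListBackedWriter();
    ValueListener listener = new BigIntListener(writer);
    listener.onInt(10);       // JSON integer token
    listener.onString("20");  // JSON string token holding an integer
    System.out.println(writer.values); // prints [10, 20]
  }
}
```

The point of the sketch is that the type decision is made once, when the listener is built, so the per-record path is just the two interface calls.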
   
   In this design, most of the complexity (of which there is an abundant 
amount) is pushed into "factory" classes and methods which build up a 
parser/listener pair appropriate for each bit of JSON structure. The prior 
structure was a bit rigid and assumed a direct mapping from JSON structure to 
Drill's data types (a JSON object structure always represented a Drill `MAP` 
and so on.) This PR revises that structure to be more flexible in order to 
handle the Mongo extended types which treat a JSON object as a holder for a 
type declaration and a value. You'll see lots of code changed, but much was 
just moving things around.
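
To illustrate why the object-to-`MAP` assumption breaks down, here is a minimal sketch of treating a single-field JSON object as a type declaration plus a value. It assumes the Mongo extended JSON V2 canonical wrappers `$numberInt`, `$numberLong`, `$numberDouble` and `$numberDecimal`, and uses an already-parsed `Map` for brevity; the class and method names are hypothetical, and the PR itself does this in the streaming parser rather than on a materialized map.

```java
import java.math.BigDecimal;
import java.util.Map;

// Illustrative only: a hypothetical helper, not code from this PR.
final class ExtendedTypeSketch {

  /**
   * Returns the typed scalar if {@code obj} is a recognized single-field
   * type wrapper such as {"$numberLong": "123"}; otherwise returns
   * {@code obj} unchanged, meaning "treat this as an ordinary object".
   */
  static Object unwrap(Map<String, Object> obj) {
    if (obj.size() != 1) {
      return obj;
    }
    Map.Entry<String, Object> field = obj.entrySet().iterator().next();
    if (!(field.getValue() instanceof String)) {
      return obj;
    }
    String text = (String) field.getValue();
    switch (field.getKey()) {
      case "$numberInt":
        return Integer.parseInt(text);
      case "$numberLong":
        return Long.parseLong(text);
      case "$numberDouble":
        return Double.parseDouble(text);
      case "$numberDecimal":
        return new BigDecimal(text);
      default:
        return obj; // Unknown key: an ordinary one-field object.
    }
  }
}
```

With this helper, `unwrap(Map.of("$numberLong", "123"))` yields a `Long`, while `unwrap(Map.of("a", "b"))` returns the map itself.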
   
   A new package is added to support the Mongo types (and Drill's extensions to 
the Mongo extended types.) Since this support required supporting a wider array 
of data types in JSON, the code also enables the use of a provided schema to do 
the same conversion with "non-extended" JSON.
   
   The implementation should allow adding other forms of extended JSON if there 
is a need.
   
   ## Documentation
   
   Drill's [JSON documentation](http://drill.apache.org/docs/json-data-model/) 
does not discuss the extended types. A full description, suitable for 
converting to user documentation, can be found in a 
[package-info.java](https://github.com/paul-rogers/drill/blob/f32057234935c632e740173d197b5104a08fa46d/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/extended/package-info.java)
 file.
   
   Once this code replaces the JSON reader (see below), users can provide a 
schema to specify the column type:
   
   * `INT`, `BIGINT` -- Conversion from JSON integers and strings.
   * `FLOAT4`, `FLOAT8` -- Conversion from JSON integers, floats and strings. 
`NaN`, `-Infinity` and `Infinity` are supported as literals if enabled. They 
are always supported as strings.
   * `VARDECIMAL` -- Conversion from JSON strings, integers and floats. 
Conversion is done from the value string to avoid rounding errors and range 
limits.
   * `VARCHAR` -- Conversion is allowed from any scalar type. The literal JSON 
text value is used.
   * `VARBINARY` -- Conversion from a JSON string encoded using Base64 as per 
the Mongo spec.
   * `DATE` -- Allows a date, expressed as a string, in the ISO `YYYY-MM-DD` 
format.
   * `TIME` -- Allows a time, expressed as a string, in the ISO `HH:MM:SS.SSS` 
format.
   * `TIMESTAMP` -- Allows a date/time, expressed as an ISO string in the 
`YYYY-MM-DDTHH:MM:SS.SSS` format. The time is converted from the timezone 
expressed in the string into the local timezone (as required for a `TIMESTAMP`.)
   * `INTERVAL` -- String encoded in the ISO period format: `PxYxMxDTxHxMxS`.
   * `INTERVALYEAR` -- String encoded in the ISO period format: `PxYxM-`.
   * `INTERVALDAY` -- String encoded in the ISO period format: `PxYxxDTxHxMxS`.
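
As a rough illustration of the string conversions listed above, the sketch below uses only JDK classes (`java.math`, `java.time`, `java.util.Base64`). It is not the conversion code in this PR: the class and method names are hypothetical, and the JDK's `Period` and `Duration` only approximate the interval types (neither parses a combined date-and-time period on its own).

```java
import java.math.BigDecimal;
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.OffsetDateTime;
import java.time.Period;
import java.time.ZoneId;
import java.util.Base64;

// Illustrative only: hypothetical helpers built on the JDK, not Drill's code.
final class SchemaConversionSketch {

  /** VARDECIMAL: build from the token text, never via double. */
  static BigDecimal toDecimal(String jsonText) {
    return new BigDecimal(jsonText);
  }

  /** FLOAT8 from a string; accepts "NaN", "Infinity" and "-Infinity". */
  static double toDouble(String text) {
    return Double.parseDouble(text);
  }

  /** VARBINARY: Base64-encoded string to bytes. */
  static byte[] toBinary(String base64) {
    return Base64.getDecoder().decode(base64);
  }

  /** DATE: ISO YYYY-MM-DD. */
  static LocalDate toDate(String text) {
    return LocalDate.parse(text);
  }

  /** TIME: ISO HH:MM:SS.SSS. */
  static LocalTime toTime(String text) {
    return LocalTime.parse(text);
  }

  /** TIMESTAMP: ISO date/time with an offset, shifted into the local zone. */
  static LocalDateTime toTimestamp(String text, ZoneId localZone) {
    return OffsetDateTime.parse(text)
        .atZoneSameInstant(localZone)
        .toLocalDateTime();
  }

  /** INTERVALYEAR-like value: years and months, for example "P1Y2M". */
  static Period toIntervalYear(String text) {
    return Period.parse(text);
  }

  /** INTERVALDAY-like value: days through seconds, for example "P3DT4H5M6S". */
  static Duration toIntervalDay(String text) {
    return Duration.parse(text);
  }
}
```

For example, `toTimestamp("2020-04-28T12:00:00.000Z", ZoneOffset.ofHours(-7))` gives the local date/time `2020-04-28T05:00`, and `toDecimal("1.10")` keeps the trailing zero that a round trip through `double` would lose.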
   
   ## Testing
   
   Added a category for JSON-related tests. Added unit tests for the new 
functionality.
   
   The prior version had separate tests for the JSON "parser" and "loader". 
After these changes, maintaining that separation became harder, and less 
useful. The prior "parser" tests are removed, with their functionality moving 
into the "loader" tests.
   
   No code other than the HTTP storage plugin uses this code at present, so 
this change will not affect other unit or functional tests.
   



