[
https://issues.apache.org/jira/browse/DRILL-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17092935#comment-17092935
]
ASF GitHub Bot commented on DRILL-7717:
---------------------------------------
paul-rogers opened a new pull request #2068:
URL: https://github.com/apache/drill/pull/2068
# [DRILL-7717](https://issues.apache.org/jira/browse/DRILL-7717): Support
Mongo extended types in V2 JSON loader
## Description
As it turns out, Drill's existing "V1" JSON reader provides support for [V1
of Mongo's extended JSON
types](https://docs.mongodb.com/manual/reference/mongodb-extended-json-v1/).
This PR adds that support, along with the updated
[V2](https://docs.mongodb.com/manual/reference/mongodb-extended-json/) support,
to the V2 JSON reader. Drill provides a number of its own extensions to the
Mongo extended types; those are also included in the V2 reader.
The V2 reader design is based on making the per-record read path as
efficient as possible. Generally, we try to be two "virtual functions" away
from converting a JSON token to a value vector entry. The first call is to
a "value listener" that knows how to convert a JSON token to the desired data
type; the second writes that value into a vector using a column writer.
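To make the two-call read path concrete, here is a minimal sketch. The interface and class names (`ValueListener`, `ColumnWriter`, `BigIntListener`, `ListColumnWriter`) are hypothetical stand-ins chosen for illustration, not Drill's actual classes, and a plain list stands in for a value vector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a column writer; not Drill's real interface.
interface ColumnWriter {
  void setLong(long value);
}

// Hypothetical stand-in for a value listener: one method per JSON token type.
interface ValueListener {
  void onInt(long token);
  void onString(String token);
}

/** Collects written values in a list, standing in for a value vector. */
final class ListColumnWriter implements ColumnWriter {
  final List<Long> values = new ArrayList<>();

  @Override
  public void setLong(long value) {
    values.add(value);
  }
}

/** Converts JSON integer or string tokens into a BIGINT-style column value. */
final class BigIntListener implements ValueListener {
  private final ColumnWriter writer;

  BigIntListener(ColumnWriter writer) {
    this.writer = writer;
  }

  // Virtual call 1 is the parser invoking the listener;
  // virtual call 2 is the listener invoking the writer.
  @Override
  public void onInt(long token) {
    writer.setLong(token);
  }

  @Override
  public void onString(String token) {
    writer.setLong(Long.parseLong(token.trim()));
  }
}
```

Because the listener is chosen once, when the column is set up, the per-token path has no type checks or lookups: the parser calls the listener and the listener calls the writer.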
In this design, most of the complexity (of which there is an abundant
amount) is pushed into "factory" classes and methods that build up a
parser/listener pair appropriate for each bit of JSON structure. The prior
structure was a bit rigid and assumed a direct mapping from JSON structure to
Drill's data types (a JSON object always represented a Drill `MAP`, and so
on). This PR revises that structure to be more flexible in order to handle the
Mongo extended types, which treat a JSON object as a holder for a type
declaration and a value. You'll see lots of code changed, but much of it was
just moving things around.
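As an illustration of "a JSON object as a holder for a type declaration and a value", the sketch below resolves objects such as `{"$numberLong": "42"}` into a typed value and leaves every other object alone (what would remain a `MAP`). It assumes the JSON object has already been parsed into a `java.util.Map`; the class name `ExtendedTypeSketch` and its registry are hypothetical and are not the PR's code. The four type keys are taken from Mongo's extended JSON V2 canonical format.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class ExtendedTypeSketch {

  // Hypothetical registry: extended-type key -> converter for its string payload.
  private static final Map<String, Function<String, Object>> CONVERTERS = new HashMap<>();

  static {
    CONVERTERS.put("$numberInt", s -> Integer.valueOf(s));
    CONVERTERS.put("$numberLong", s -> Long.valueOf(s));
    CONVERTERS.put("$numberDouble", s -> Double.valueOf(s));
    CONVERTERS.put("$numberDecimal", s -> new BigDecimal(s));
  }

  /**
   * Returns the typed value when the object is a single-field type holder
   * with a known key and a string payload; otherwise returns the object
   * itself, which the caller would treat as an ordinary map.
   */
  static Object resolve(Map<String, Object> jsonObject) {
    if (jsonObject.size() == 1) {
      Map.Entry<String, Object> field = jsonObject.entrySet().iterator().next();
      Function<String, Object> converter = CONVERTERS.get(field.getKey());
      if (converter != null && field.getValue() instanceof String) {
        return converter.apply((String) field.getValue());
      }
    }
    return jsonObject;
  }
}
```

This sketch decides per object at read time for brevity. The design described above instead makes that decision in the factory layer, so the per-record path stays short.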
A new package is added to support the Mongo types (and Drill's extensions to
the Mongo extended types). Since this required supporting a wider array of
data types in JSON, the code also enables the use of a provided schema to do
the same conversion with "non-extended" JSON.
The implementation should allow adding other forms of extended JSON if there
is a need.
## Documentation
Drill's [JSON documentation](http://drill.apache.org/docs/json-data-model/)
does not discuss the extended types. A full description, suitable for
converting to user documentation, can be found in a
[package-info.java](https://github.com/paul-rogers/drill/blob/f32057234935c632e740173d197b5104a08fa46d/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/extended/package-info.java)
file.
Once this code replaces the JSON reader (see below), users can provide a
schema to specify the column type:
* `INT`, `BIGINT` -- Conversion from JSON integers and strings.
* `FLOAT4`, `FLOAT8` -- Conversion from JSON integers, floats and strings.
`NaN`, `-Infinity` and `Infinity` are supported as literals if enabled. They
are always supported as strings.
* `VARDECIMAL` -- Conversion from JSON strings, integers and floats.
Conversion is done from the value string to avoid rounding errors and range
limits.
* `VARCHAR` -- Conversion is allowed from any scalar type. The literal JSON
text value is used.
* `VARBINARY` -- Conversion from a JSON string encoded using Base64 as per
the Mongo spec.
* `DATE` -- Allows a date, expressed as a string, in the ISO `YYYY-MM-DD`
format.
* `TIME` -- Allows a time, expressed as a string, in the ISO `HH:MM:SS.SSS`
format.
* `TIMESTAMP` -- Allows a date/time, expressed as an ISO string in the
`YYYY-MM-DDTHH:MM:SS.SSS` format. The time is converted from the timezone
expressed in the string into the local timezone (as required for a `TIMESTAMP`).
* `INTERVAL` -- String encoded in the ISO period format: `PxYxMxDTxHxMxS`.
* `INTERVALYEAR` -- String encoded in the ISO period format: `PxYxM`.
* `INTERVALDAY` -- String encoded in the ISO period format: `PxYxxDTxHxMxS`.
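The string conversions in the list above can be approximated with standard Java library calls. This is a rough sketch of the kind of parsing involved, not the PR's implementation: the class and method names are hypothetical, the timestamp helper assumes the string carries an explicit offset such as `Z`, and only the year/month interval form is covered.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.OffsetDateTime;
import java.time.Period;
import java.time.ZoneId;
import java.util.Base64;

final class SchemaConversionSketch {

  // FLOAT8: Double.parseDouble also accepts "NaN", "Infinity" and "-Infinity".
  static double toFloat8(String text) {
    return Double.parseDouble(text);
  }

  // VARDECIMAL: build from the text itself, avoiding a lossy trip through double.
  static BigDecimal toDecimal(String text) {
    return new BigDecimal(text);
  }

  // VARBINARY: the JSON string holds Base64-encoded bytes.
  static byte[] toVarBinary(String text) {
    return Base64.getDecoder().decode(text);
  }

  // DATE: ISO YYYY-MM-DD.
  static LocalDate toDate(String text) {
    return LocalDate.parse(text);
  }

  // TIME: ISO HH:MM:SS.SSS.
  static LocalTime toTime(String text) {
    return LocalTime.parse(text);
  }

  // TIMESTAMP: shift from the offset in the string to the given zone,
  // then drop the zone to get a local date/time.
  static LocalDateTime toTimestamp(String text, ZoneId localZone) {
    return OffsetDateTime.parse(text).atZoneSameInstant(localZone).toLocalDateTime();
  }

  // INTERVALYEAR: ISO period with year and month parts.
  static Period toIntervalYear(String text) {
    return Period.parse(text);
  }
}
```

For example, `SchemaConversionSketch.toTimestamp("2020-04-27T10:15:30.123Z", ZoneId.systemDefault())` yields the wall-clock time in the zone of the machine running the code, which matches the local-timezone behavior described for `TIMESTAMP`.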
## Testing
Added a category for JSON-related tests. Added unit tests for the new
functionality.
The prior version had separate tests for the JSON "parser" and "loader".
After these changes, maintaining that separation became harder and less
useful. The prior "parser" tests are removed, with their functionality moving
into the "loader" tests.
No code other than the HTTP storage plugin uses this code at present, so
this change will not affect other unit or functional tests.
> Support Mongo extended types in V2 JSON loader
> ----------------------------------------------
>
> Key: DRILL-7717
> URL: https://issues.apache.org/jira/browse/DRILL-7717
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.18.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.18.0
>
>
> Drill supports Mongo's extended types in the V1 JSON reader. Add similar
> support to the V2 version.