Paul Rogers created DRILL-7597:
----------------------------------
Summary: Read selected JSON columns as JSON text
Key: DRILL-7597
URL: https://issues.apache.org/jira/browse/DRILL-7597
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.17.0
Reporter: Paul Rogers
Assignee: Paul Rogers
Fix For: 1.18.0
See . The use case is to read selected JSON columns as JSON text rather
than parsing the JSON into a relational structure, as the JSON reader does
today.
The JSON reader supports "all text mode", but, despite the name, this mode only
works for scalars (primitives) such as numbers. It does not work for structured
types such as objects or arrays: such types are always parsed into Drill
structures (which causes the conflict described in __).
Instead, we need a feature to read an entire JSON value, including structure,
as a JSON string.
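To make the limitation concrete, here is a small Python model of the behavior
described above. It is only a sketch of the semantics, not Drill's code: the
function name `all_text_mode` is hypothetical, and leaving nulls as nulls is an
assumption. Scalars become text, but objects and arrays keep their structure.

```python
import json


def all_text_mode(value):
    """Toy model of "all text mode" as described in this ticket.

    Scalars are converted to text; objects and arrays are still
    turned into structures. Not Drill's implementation.
    """
    if isinstance(value, dict):
        # Objects stay structured; only their leaf values change.
        return {key: all_text_mode(item) for key, item in value.items()}
    if isinstance(value, list):
        # Arrays stay structured as well.
        return [all_text_mode(item) for item in value]
    if value is None:
        # Assumption: nulls are left as nulls.
        return None
    if isinstance(value, str):
        return value
    # Numbers and booleans become their JSON text form.
    return json.dumps(value)


record = json.loads('{"a": 10, "b": {"x": 1.5, "y": [true, null]}}')
result = all_text_mode(record)
print(result)
```

Column "b" still comes back as a nested structure, which is the case the
requested feature needs to handle differently.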
This feature would work best if the user can parse some parts of a JSON input
file into a relational structure and read others as JSON text. (This is the use
case that the user on the user mailing list faced.) So, we need a way to do that.
Drill has a "provided schema" feature which, at present, is used only for text
files (and, more recently, with limited support for Avro). We are working on a
project to add such support for JSON.
Perhaps we can leverage this feature to allow the JSON reader to read chunks of
JSON as text which can be manipulated by those future JSON functions. In the
example, column "c" would be read as JSON text; Drill would not attempt to
parse it into a relational structure.
As it turns out, the "new" JSON reader we're working on originally had a
feature to do just that, but we took it out because we were not sure it was
needed. Sounds like we should restore it as part of our "provided schema"
support. It could work this way: if you CREATE SCHEMA with column "c" as
VARCHAR (maybe with a hint to read as JSON), the JSON parser would read the
entire nested structure as JSON without trying to parse it into a relational
structure.
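As a rough illustration of the idea, the Python sketch below treats a set of
column names as the "read as JSON text" hint. The names `read_record` and
`json_text_columns` are hypothetical, and the actual `CREATE SCHEMA` option is
still to be designed. Note one simplification: the sketch parses the value and
then re-serializes it, so the original whitespace is not preserved, whereas a
real reader would more likely capture the tokens without building a structure.

```python
import json


def read_record(line, json_text_columns):
    """Parse one JSON record, keeping selected columns as JSON text.

    json_text_columns stands in for the schema hint; it is a
    hypothetical parameter, not a Drill option.
    """
    record = json.loads(line)
    return {
        # Hinted columns become a JSON string; all others stay parsed.
        name: json.dumps(value) if name in json_text_columns else value
        for name, value in record.items()
    }


line = '{"a": 1, "b": {"x": 2}, "c": {"y": [1, 2, {"z": null}]}}'
row = read_record(line, {"c"})
print(row["b"])  # parsed structure
print(row["c"])  # JSON text
```

Here "a" and "b" are parsed as usual, while "c" arrives as a single string
that later JSON functions could process.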
This ticket asks to build the concept:
* Allow a `CREATE SCHEMA` option (to be designed) to designate a JSON field to
be read as JSON.
* Implement the "read column as JSON" feature in the new EVF-based JSON reader.