Paul Rogers created DRILL-7597:
----------------------------------

             Summary: Read selected JSON columns as JSON text
                 Key: DRILL-7597
                 URL: https://issues.apache.org/jira/browse/DRILL-7597
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.17.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers
             Fix For: 1.18.0


See . The use case wishes to read selected JSON columns as JSON text rather 
than parsing the JSON into a relational structure as is done today in the JSON 
reader.

The JSON reader supports "all text mode", but, despite the name, this mode only 
works for scalars (primitives) such as numbers. It does not work for structured 
types such as objects or arrays: such types are always parsed into Drill 
structures (which causes the conflict described in __).
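
As a rough illustration of the behavior described above (not Drill's actual 
code), the following Python sketch mimics "all text mode": scalars are turned 
into text while objects and arrays keep their structure. The function name 
`all_text_mode` and the sample record are hypothetical, and leaving nulls as 
nulls is an assumption.

```python
import json


def all_text_mode(value):
    """Mimic the described behavior: scalars become text, structure is kept.

    Hypothetical helper for illustration only; not part of Drill.
    """
    if isinstance(value, dict):
        # Objects are still parsed into a nested structure.
        return {k: all_text_mode(v) for k, v in value.items()}
    if isinstance(value, list):
        # Arrays are still parsed into a nested structure.
        return [all_text_mode(v) for v in value]
    if value is None:
        # Assumption: nulls stay null rather than becoming the text "null".
        return None
    # Scalars (numbers, booleans, strings) are rendered as text.
    return value if isinstance(value, str) else json.dumps(value)


record = json.loads('{"a": 10, "c": {"x": 1, "y": [true, 2.5]}}')
converted = all_text_mode(record)
print(converted)
```

Note that column "c" is still a nested object after conversion; only its leaf 
values became text.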

Instead, we need a feature to read an entire JSON value, including structure, 
as a JSON string.

This feature would work best if the user can parse some parts of a JSON input 
file into a relational structure and read other parts as JSON text. (This is 
the use case that the user on the user list faced.) So, we need a way to do that.

Drill has a "provided schema" feature, which, at present, is used only for text 
files (and recently with limited support in Avro.) We are working on a project 
to add such support for JSON.

Perhaps we can leverage this feature to allow the JSON reader to read chunks of 
JSON as text which can be manipulated by those future JSON functions. In the 
example, column "c" would be read as JSON text; Drill would not attempt to 
parse it into a relational structure.

As it turns out, the "new" JSON reader we're working on originally had a 
feature to do just that, but we took it out because we were not sure it was 
needed. Sounds like we should restore it as part of our "provided schema" 
support. It could work this way: if you CREATE SCHEMA with column "c" as 
VARCHAR (maybe with a hint to read as JSON), the JSON parser would read the 
entire nested structure as JSON without trying to parse it into a relational 
structure.
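
A minimal Python sketch of the intended result, under stated assumptions: the 
function `read_record`, the set `json_text_columns` (standing in for the 
columns a provided schema would mark as JSON text), and the sample record are 
all hypothetical. For simplicity the sketch parses the value and re-serializes 
it; a real reader would presumably copy the JSON tokens through without 
building a structure first.

```python
import json


def read_record(line, json_text_columns):
    """Parse one JSON record, keeping selected columns as JSON text.

    `json_text_columns` is a hypothetical stand-in for schema columns
    declared as VARCHAR with a "read as JSON" hint.
    """
    parsed = json.loads(line)
    row = {}
    for name, value in parsed.items():
        if name in json_text_columns:
            # Keep the whole nested value as a compact JSON string.
            row[name] = json.dumps(value, separators=(",", ":"))
        else:
            # Other columns are parsed as usual.
            row[name] = value
    return row


row = read_record('{"a": 10, "b": "foo", "c": {"x": [1, 2], "y": null}}', {"c"})
print(row)
```

Here columns "a" and "b" are parsed normally, while column "c" comes back as a 
single string holding the JSON text of the nested object.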

This ticket asks to build the concept:

* Allow a `CREATE SCHEMA` option (to be designed) to designate a JSON field to 
be read as JSON.
* Implement the "read column as JSON" feature in the new EVF-based JSON reader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
