[https://issues.apache.org/jira/browse/DRILL-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258550#comment-16258550]
Aman Sinha commented on DRILL-5974:
-----------------------------------
For the specific example of the 2-D array, Drill can currently query it using
the array[i][j] syntax, as shown below (I took this example from the Drill
docs page [1]):
{noformat}
SELECT features[0].geometry.coordinates[0][1] from
dfs.`/Users/asinha/data/json/polygon`;
+-----------------------------------------------+
| EXPR$0 |
+-----------------------------------------------+
| [-122.42207601332528,37.808835019815085,0.0] |
+-----------------------------------------------+
{noformat}
If we read this field in text mode, my understanding is that the above query
will no longer work.
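To illustrate the concern, here is a minimal Python sketch of what a client sees when a nested array is delivered as JSON text. It assumes the `points` value from the 2-D example quoted below arrives as a plain string; the variable names are hypothetical, and the sketch does not model how Drill itself would evaluate `[i][j]` on a text column.

```python
import json

# Hypothetical text-mode value: the 2-D array arrives as a JSON string,
# not as nested lists, so indexing the string yields single characters.
points_text = "[[0, 0], [0, 5], [5, 0], [5, 5]]"

# The client has to parse the text before it can index it like array[i][j].
points = json.loads(points_text)

print(points_text[1])  # a character of the string, not the second point
print(points[1][1])    # second point, second coordinate
```

In other words, indexing moves from the query to the client once the field is read as text.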
Independent of this example, I do agree that there are other types of JSON
structures that Drill cannot process without setting the all_text_mode option,
so those could benefit. For instance: lists with heterogeneous types, or lists
with NULL values.
[1] https://drill.apache.org/docs/json-data-model/
> Read JSON non-relational fields using text mode
> -----------------------------------------------
>
> Key: DRILL-5974
> URL: https://issues.apache.org/jira/browse/DRILL-5974
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.13.0
>
>
> Proposed is a minor enhancement to the JSON reader to better handle
> non-relational JSON structures.
> As background, Drill handles simple tuples:
> {code}
> {a: 10, b: "fred"}
> {code}
> Drill also handles arrays:
> {code}
> {name: "fred", hobbies: ["bowling", "golf"]}
> {code}
> Drill even handles arrays of tuples:
> {code}
> {name: "fred", orders: [
> {id: 1001, amount: 12.34},
> {id: 1002, amount: 56.78}]}
> {code}
> The above structures are termed "relational" because they map
> straightforwardly to and from tables.
> Things get interesting with non-relational types, such as 2-D arrays:
> {code}
> {id: 4, shape: "square", points: [[0, 0], [0, 5], [5, 0], [5, 5]]}
> {code}
> Drill has two solutions:
> * Turn on the experimental list and union support.
> * Enable all-text mode to read all fields as JSON text.
> Proposed is a middle ground:
> * Read fields with relational types into vectors.
> * Read non-relational fields using text mode.
> Thus, the JSON data in the first three examples would all be parsed into
> Drill vectors. The fourth, non-relational example, however, would produce a
> row that looks like this:
> {noformat}
> id, shape, points
> 4, "square", "[[0, 0], [0, 5], [5, 0], [5, 5]]"
> {noformat}
> Although Drill can’t parse the 2-D array, it will pass the array along to
> the client, which can use its favorite JSON parser to parse the array and do
> something useful (such as drawing the square, in this case).
> Specifically, the proposal is to:
> * Apply this change only to the revised “batch size aware” JSON reader.
> * Use the above parsing model by default.
> * Use the experimental list-and-union support if the existing
> {{exec.enable_union_type}} system/session option is set.
> Existing queries should “just work.” In fact, JSON with non-relational
> types will now work “out of the box,” without all-text mode or the
> experimental types.
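The proposed reading model can be sketched in Python. This is a minimal illustration, not Drill's JSON reader: the classification rule (scalars, maps, arrays of same-typed scalars, and arrays of maps are relational; nested arrays, heterogeneous lists, and lists containing nulls are not) is an assumption drawn from the examples in this thread, and the functions `is_relational` and `to_column_value` are hypothetical. Python values stand in for Drill vectors, and the `enable_union_type` parameter only mimics the effect of the `exec.enable_union_type` option.

```python
import json
from typing import Any, Optional


def scalar_kind(value: Any) -> Optional[str]:
    """Return a type tag for a JSON scalar, or None if it is not a scalar."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return None


def is_relational(value: Any) -> bool:
    """Assumed rule for values that map cleanly onto table columns."""
    if value is None or scalar_kind(value) is not None:
        return True
    if isinstance(value, dict):
        # A map (tuple) is relational if all of its members are.
        return all(is_relational(v) for v in value.values())
    if isinstance(value, list):
        if any(v is None for v in value):
            return False  # list with NULL values
        if all(isinstance(v, dict) for v in value):
            return all(is_relational(v) for v in value)  # array of tuples
        # Array of scalars: all elements must share one scalar type.
        # Nested arrays produce a None tag and are rejected here.
        kinds = {scalar_kind(v) for v in value}
        return len(kinds) <= 1 and None not in kinds
    return False


def to_column_value(value: Any, enable_union_type: bool = False) -> Any:
    """Keep relational values as structured data; serialize the rest as JSON text."""
    if enable_union_type or is_relational(value):
        # Stand-in for Drill vectors (or list/union vectors when the flag is set).
        return value
    if isinstance(value, dict):
        # Only the non-relational members of a map fall back to text.
        return {k: to_column_value(v) for k, v in value.items()}
    return json.dumps(value)


record = {"id": 4, "shape": "square", "points": [[0, 0], [0, 5], [5, 0], [5, 5]]}
row = {name: to_column_value(v) for name, v in record.items()}
print(row)
```

The `points` column comes back as the string `"[[0, 0], [0, 5], [5, 0], [5, 5]]"`, and a client can recover the 2-D array with `json.loads(row["points"])`, which is the client-side parsing step the proposal describes. Falling back to text per field, rather than per record, is what lets the relational `id` and `shape` columns stay typed.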