[ https://issues.apache.org/jira/browse/DRILL-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258550#comment-16258550 ]

Aman Sinha commented on DRILL-5974:
-----------------------------------

For the specific example of the 2-D array, Drill can currently query it using
the array[i][j] syntax, as shown below (I took this example from the Drill
docs page [1]):
{noformat}
SELECT features[0].geometry.coordinates[0][1] from 
dfs.`/Users/asinha/data/json/polygon`;
+-----------------------------------------------+
|                    EXPR$0                     |
+-----------------------------------------------+
| [-122.42207601332528,37.808835019815085,0.0]  |
+-----------------------------------------------+
{noformat}

If we read this field in text mode, my understanding is that the above query
would no longer work.
Independent of this example, I do agree that there are other types of JSON
structures that Drill cannot process without setting the all_text_mode option,
and those could benefit. Examples include a list with heterogeneous types and a
list with NULL values.
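To make the distinction above concrete, here is a minimal Java sketch of a check that flags the structures mentioned (heterogeneous lists and lists containing NULL) while accepting uniform nested arrays like the 2-D coordinates queried above. It is only an illustration, not Drill code: it assumes the JSON has already been parsed into plain Java `Map`/`List`/`String`/`Number`/`Boolean` objects, and the names `TextModeCheck`, `kindOf` and `needsTextMode` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class TextModeCheck {

    // Coarse kind of a parsed JSON value. All numbers count as one kind here,
    // a simplification: Drill itself distinguishes integers from floats.
    static String kindOf(Object value) {
        if (value == null) return "null";
        if (value instanceof Number) return "number";
        if (value instanceof String) return "string";
        if (value instanceof Boolean) return "boolean";
        if (value instanceof Map) return "map";
        if (value instanceof List) return "list";
        throw new IllegalArgumentException("unsupported value: " + value);
    }

    // True if the array contains a null or mixes element kinds, checking
    // nested arrays recursively. Uniform nested arrays (such as 2-D
    // coordinates) are accepted. Arrays inside maps are not inspected.
    static boolean needsTextMode(List<?> array) {
        String expected = null;
        for (Object element : array) {
            String kind = kindOf(element);
            if (kind.equals("null")) return true;
            if (expected == null) {
                expected = kind;
            } else if (!expected.equals(kind)) {
                return true;
            }
            if (kind.equals("list") && needsTextMode((List<?>) element)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<?> coordinates = Arrays.asList(Arrays.asList(-122.42, 37.81, 0.0));
        List<?> mixed = Arrays.asList(1, "two", 3.0);
        List<?> withNull = Arrays.asList("a", null, "c");
        System.out.println(needsTextMode(coordinates)); // false
        System.out.println(needsTextMode(mixed));       // true
        System.out.println(needsTextMode(withNull));    // true
    }
}
```

Under this sketch only the last two lists would need a text-mode fallback; the 2-D coordinate array stays queryable with index syntax.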

[1] https://drill.apache.org/docs/json-data-model/

> Read JSON non-relational fields using text mode
> -----------------------------------------------
>
>                 Key: DRILL-5974
>                 URL: https://issues.apache.org/jira/browse/DRILL-5974
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.13.0
>
>
> Proposed is a minor enhancement to the JSON reader to better handle 
> non-relational JSON structures.
> As background, Drill handles simple tuples:
> {code}
> {a: 10, b: "fred"}
> {code}
> Drill also handles arrays:
> {code}
> {name: "fred", hobbies: ["bowling", "golf"]}
> {code}
> Drill even handles arrays of tuples:
> {code}
> {name: “fred”, orders: [
>   {id: 1001, amount: 12.34},
>   {id: 1002, amount: 56.78}]}
> {code}
> The above are termed "relational" because there is a straightforward mapping 
> to/from tables into the above JSON structures.
> Things get interesting with non-relational types, such as 2-D arrays:
> {code}
> {id: 4, shape: "square", points: [[0, 0], [0, 5], [5, 0], [5, 5]]}
> {code}
> Drill has two solutions:
> * Turn on the experimental list and union support.
> * Enable all-text mode to read all fields as JSON text.
> Proposed is a middle ground:
> * Read fields with relational types into vectors.
> * Read non-relational fields using text mode.
> Thus, the first three examples would all have their JSON data parsed into
> Drill vectors. The fourth, non-relational example, however, would produce a
> row that looks like this:
> {noformat}
> id, shape, points
> 4, "square", "[[0, 0], [0, 5], [5, 0], [5, 5]]"
> {noformat}
> Although Drill can't parse the 2-D array, it will pass the array along to
> the client, which can use its favorite JSON parser to parse the array and do
> something useful (in this case, draw the square).
> Specifically, the proposal is to:
> * Apply this change only to the revised "batch-size-aware" JSON reader.
> * Use the above parsing model by default.
> * Use the experimental list-and-union support if the existing 
> {{exec.enable_union_type}} system/session option is set.
> Existing queries should "just work." In fact, JSON with non-relational
> types will now work "out of the box" without all-text mode or the
> experimental types.
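The middle ground described in the quoted proposal can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the batch-size-aware reader itself: it assumes each record has already been parsed into Java `Map`/`List`/`String`/`Number` objects, whereas a real reader works on a streaming parser and writes into value vectors. The names `RelationalOrText`, `isRelational`, `toJson` and `readRecord` are hypothetical, and non-relational values are re-serialized here rather than copied verbatim from the input.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RelationalOrText {

    // Relational in the proposal's sense: scalars, maps, arrays of scalars of
    // one type, and arrays of maps. Arrays holding nulls, mixed types, or
    // nested arrays are treated as non-relational. Only the top level of each
    // field is inspected in this sketch.
    static boolean isRelational(Object value) {
        if (!(value instanceof List)) return true; // scalar, null, or map
        Class<?> elementType = null;
        for (Object element : (List<?>) value) {
            if (element == null || element instanceof List) return false;
            Class<?> type = (element instanceof Map) ? Map.class : element.getClass();
            if (elementType == null) {
                elementType = type;
            } else if (elementType != type) {
                return false;
            }
        }
        return true;
    }

    // Re-serializes a parsed value as JSON text, with minimal string escaping.
    static String toJson(Object value) {
        if (value == null) return "null";
        if (value instanceof String) {
            String s = (String) value;
            return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            List<?> list = (List<?>) value;
            for (int i = 0; i < list.size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append(toJson(list.get(i)));
            }
            return sb.append("]").toString();
        }
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            String sep = "";
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                sb.append(sep).append(toJson(String.valueOf(e.getKey())))
                  .append(": ").append(toJson(e.getValue()));
                sep = ", ";
            }
            return sb.append("}").toString();
        }
        return String.valueOf(value); // numbers and booleans
    }

    // Keeps relational fields as parsed values and replaces non-relational
    // fields with their JSON text.
    static Map<String, Object> readRecord(Map<String, Object> record) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : record.entrySet()) {
            Object value = field.getValue();
            row.put(field.getKey(), isRelational(value) ? value : toJson(value));
        }
        return row;
    }

    public static void main(String[] args) {
        // {id: 4, shape: "square", points: [[0, 0], [0, 5], [5, 0], [5, 5]]}
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 4);
        record.put("shape", "square");
        record.put("points", Arrays.asList(
                Arrays.asList(0, 0), Arrays.asList(0, 5),
                Arrays.asList(5, 0), Arrays.asList(5, 5)));
        for (Map.Entry<String, Object> f : readRecord(record).entrySet()) {
            System.out.println(f.getKey() + " -> " + f.getValue()
                    + " (" + f.getValue().getClass().getSimpleName() + ")");
        }
    }
}
```

Running `main` prints `id -> 4 (Integer)`, `shape -> square (String)` and `points -> [[0, 0], [0, 5], [5, 0], [5, 5]] (String)`: the relational fields keep their parsed types, while the 2-D array arrives as JSON text for the client to parse. Because the sketch compares Java classes, an array mixing integers and doubles also counts as non-relational, which is stricter than strictly necessary.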



