[ 
https://issues.apache.org/jira/browse/DRILL-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293316#comment-16293316
 ] 

Paul Rogers edited comment on DRILL-6035 at 12/16/17 4:08 AM:
--------------------------------------------------------------

h4. JSON Arrays

Drill supports simple arrays in JSON using the following rules:

* Arrays must contain homogeneous elements: every element must be of the same 
type, either one of the scalars described above or a JSON object.

(See a later comment for nested arrays.)

For example, the following are scalar arrays:
{code}
[10, 20]
[10.30, 10.45]
["foo", "bar"]
[true, false]
{code}

h4. Schema Change in Arrays

The following will trigger errors:

{code}
{a: [10, "foo"]}  // Mixed types
{a: [10]} {a: ["foo"]} // Schema change
{a: [10, 12.5]} // Conflicting types: integer and float
{code}
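
The checks above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this comment, not Drill's actual reader code: the function names and the type labels ("integer", "float", and so on) are hypothetical, and records are assumed to be already parsed into Python dicts.

```python
def element_kind(value):
    """Return a coarse type label for one JSON array element (labels are illustrative)."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    # Nulls and nested arrays are outside the simple-array rules sketched here.
    raise ValueError(f"unsupported array element: {value!r}")


def array_kind(values):
    """Return the single element kind of an array, or None if the array is empty."""
    kinds = {element_kind(v) for v in values}
    if len(kinds) > 1:
        raise ValueError(f"mixed types in array: {sorted(kinds)}")
    return kinds.pop() if kinds else None


def column_kind(records, name):
    """Check that every array stored under `name` uses the same element kind."""
    seen = None
    for record in records:
        # Assumption: a missing or null field is handled like an empty array.
        kind = array_kind(record.get(name) or [])
        if kind is None:
            continue
        if seen is not None and kind != seen:
            raise ValueError(f"schema change in {name!r}: {seen} then {kind}")
        seen = kind
    return seen
```

With this sketch, {{column_kind([{"a": [10]}, {"a": ["foo"]}], "a")}} raises the schema-change error, and {{array_kind([10, 12.5])}} raises the mixed-type error because integer and float are treated as distinct kinds.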

h4. Nulls with Arrays

Rules for nulls are:

* Arrays may not contain nulls. (Drill does not support nulls as array 
elements.)
* A null (or missing) array field is treated the same as an empty array.

The following is invalid:
{code}
[10, null, 20]
{code}

The following are all valid:

{code}
{id: 1}
{id: 2, a: null}
{id: 3, a: []}
{id: 4, a: [10, 20, 30]}
{code}

As described, Drill defers picking an array type when it sees null values. In 
the above example, for id=2, Drill sees column {{a}} but does not pick a type. 
For id=3, Drill identifies that {{a}} is an array, but does not know the 
element type. Finally, for id=4, Drill identifies the array as 
{{Repeated BIGINT}}. (This is the behavior for Drill 1.13; earlier versions may 
differ and require investigation.)
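
The progression described above can be pictured as a small state machine. The following Python sketch is an assumption-laden illustration, not Drill code: the state names (other than {{Repeated BIGINT}}) and the {{observe}} and {{trace}} helpers are invented here, and only integer arrays are covered.

```python
UNSEEN = "unseen"
DEFERRED = "seen, type deferred"
ARRAY_DEFERRED = "array, element type deferred"
REPEATED_BIGINT = "Repeated BIGINT"


def observe(state, record, name="a"):
    """Advance the inferred type of column `name` after reading one record."""
    if state == REPEATED_BIGINT or name not in record:
        # Type already chosen, or the field is missing: nothing new is learned.
        return state
    value = record[name]
    if value is None:
        # A null reveals that the column exists, but not its type.
        return DEFERRED if state == UNSEEN else state
    if not isinstance(value, list):
        raise NotImplementedError("this sketch only covers array columns")
    if not value:
        # An empty array reveals the column is an array, but not the element type.
        return ARRAY_DEFERRED
    if all(isinstance(v, int) and not isinstance(v, bool) for v in value):
        return REPEATED_BIGINT
    raise NotImplementedError("this sketch only covers integer arrays")


def trace(records, name="a"):
    """Return the state after each record, starting from UNSEEN."""
    state, states = UNSEEN, []
    for record in records:
        state = observe(state, record, name)
        states.append(state)
    return states
```

Running {{trace}} over the four records in the example yields one state per record, ending in {{Repeated BIGINT}} once id=4 supplies actual integer values.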

As usual, if the first file or batch contains only nulls, Drill will guess 
{{Nullable VARCHAR}}, which will cause a schema change error if later records 
reveal the type to be an array (of any type).

If the first batch contains only nulls and/or empty arrays, Drill guesses that 
the type is {{Repeated VARCHAR}}. (Again, this is specific to Drill 1.13.) For 
example:

{code}
{id: 1}
{id: 2, a: null}
{id: 3, a: []}
{code}
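
The two end-of-batch guesses can be summarized in a short Python sketch. It assumes one particular reading of the rules above: a batch with at least one empty array (and no real values) yields {{Repeated VARCHAR}}, while a batch with only nulls or missing fields yields {{Nullable VARCHAR}}. The function name is hypothetical and this is not Drill's implementation.

```python
def guess_at_batch_end(records, name="a"):
    """Guess a type for column `name` when a batch held no real values."""
    values = [record.get(name) for record in records]
    if any(v is not None and v != [] for v in values):
        raise ValueError("column has real values; no guess is needed")
    if any(v == [] for v in values):
        # At least one empty array was seen, so the column is known to be an array.
        return "Repeated VARCHAR"
    # Only nulls or missing fields were seen.
    return "Nullable VARCHAR"
```

Under that reading, the three records above produce {{Repeated VARCHAR}} because id=3 carries an empty array; dropping that record would leave only nulls and produce {{Nullable VARCHAR}} instead.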



> Specify Drill's JSON behavior
> -----------------------------
>
>                 Key: DRILL-6035
>                 URL: https://issues.apache.org/jira/browse/DRILL-6035
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Assignee: Pritesh Maker
>
> Drill supports JSON as its native data format. However, experience suggests 
> that Drill may have limitations in the JSON that Drill supports. This ticket 
> asks to clarify Drill's expected behavior on various kinds of JSON.
> Topics to be addressed:
> * Relational vs. non-relational structures
> * JSON structures used in practice and how they map to Drill
> * Support for varying data types
> * Support for missing values, especially across files
> These topics are complex, hence the request to provide a detailed 
> specification that clarifies what Drill does and does not support (or what 
> it should and should not support).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
