[ https://issues.apache.org/jira/browse/DRILL-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293316#comment-16293316 ]
Paul Rogers edited comment on DRILL-6035 at 12/16/17 4:08 AM:
--------------------------------------------------------------

h4. JSON Arrays

Drill supports simple arrays in JSON using the following rules:

* Arrays must contain homogeneous elements: every element is the same scalar type (any of the scalars described above), or every element is a JSON object. (See a later comment for nested arrays.)

For example, the following are scalar arrays:

{code}
[10, 20]
[10.30, 10.45]
["foo", "bar"]
[true, false]
{code}

h4. Schema Change in Arrays

The following will trigger errors:

{code}
{a: [10, "foo"]} // Mixed types
{a: [10]} {a: ["foo"]} // Schema change
{a: [10, 12.5]} // Conflicting types: integer and float
{code}

h4. Nulls with Arrays

Rules for nulls are:

* Arrays may not contain nulls. (Drill does not support nulls as array elements.)
* A null (or missing) array field is treated the same as an empty array.

The following is invalid:

{code}
[10, null, 20]
{code}

The following are all valid:

{code}
{id: 1}
{id: 2, a: null}
{id: 3, a: []}
{id: 4, a: [10, 20, 30]}
{code}

As described, Drill will defer picking an array type if it sees null values. In the above example, for id=2, Drill sees column {{a}} but does not pick a type. For id=3, Drill identifies that {{a}} is an array, but does not know the element type. Finally, for id=4, Drill identifies the array as {{Repeated BIGINT}}. (This is the behavior for Drill 1.13; earlier versions may differ and require investigation.)

As usual, if the first file or batch contains only nulls, Drill will guess {{Nullable VARCHAR}}, which will cause a schema change error if later records reveal the type to be an array (of any type).

If the first batch contains only nulls and/or empty arrays, Drill guesses that the type is {{Repeated VARCHAR}}. (Again, this is specific to Drill 1.13.) For example:

{code}
{id: 1}
{id: 2, a: null}
{id: 3, a: []}
{code}

> Specify Drill's JSON behavior
> -----------------------------
>
>                 Key: DRILL-6035
>                 URL: https://issues.apache.org/jira/browse/DRILL-6035
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Assignee: Pritesh Maker
>
> Drill supports JSON as its native data format.
> However, experience suggests that Drill may have limitations in the JSON
> that Drill supports. This ticket asks to clarify Drill's expected behavior
> on various kinds of JSON.
> Topics to be addressed:
> * Relational vs. non-relational structures
> * JSON structures used in practice and how they map to Drill
> * Support for varying data types
> * Support for missing values, especially across files
> These topics are complex, hence the request to provide a detailed
> specification that clarifies what Drill does and does not support (or what
> it should and should not support).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
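
The array rules in the edited comment above (homogeneous elements, no null elements, a null or missing field treated like an empty array, and a deferred type choice) can be illustrated with a small Python sketch. This is not Drill code. The function names are hypothetical, and only {{BIGINT}} and {{VARCHAR}} come from the comment; the other type names ({{BIT}}, {{FLOAT8}}, {{MAP}}) are assumptions made for illustration. The sketch also does not model the end-of-batch fallback to {{Repeated VARCHAR}}.

```python
def element_type(value):
    """Map one array element to an illustrative type name.

    Only BIGINT and VARCHAR are named in the comment; BIT, FLOAT8 and MAP
    are assumed names used here for illustration.
    """
    if value is None:
        raise ValueError("null array elements are not supported")
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "BIT"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "FLOAT8"
    if isinstance(value, str):
        return "VARCHAR"
    if isinstance(value, dict):
        return "MAP"
    raise ValueError("nested arrays are outside the scope of this sketch")


def infer_array_type(records, column="a"):
    """Return the element type seen for `column`, or None if still undecided."""
    chosen = None
    for record in records:
        values = record.get(column)
        if values is None:
            # Null or missing field: same as an empty array, no type chosen.
            continue
        if not isinstance(values, list):
            raise ValueError(f"column {column!r} is not an array")
        for value in values:
            found = element_type(value)
            if chosen is None:
                chosen = found  # first non-null element fixes the type
            elif found != chosen:
                raise ValueError(f"schema change: {chosen} then {found}")
    return chosen


records = [
    {"id": 1},
    {"id": 2, "a": None},
    {"id": 3, "a": []},
    {"id": 4, "a": [10, 20, 30]},
]
print(infer_array_type(records))  # BIGINT
```

Under these assumptions, the three "error" cases from the comment ({{[10, "foo"]}}, {{[10]}} followed by {{["foo"]}}, and {{[10, 12.5]}}) all raise {{ValueError}}, and a batch containing only nulls and empty arrays returns {{None}}, which is the point at which Drill 1.13 is described as guessing {{Repeated VARCHAR}}.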