[ https://issues.apache.org/jira/browse/DRILL-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293265#comment-16293265 ]
Paul Rogers edited comment on DRILL-6035 at 12/15/17 9:59 PM: -------------------------------------------------------------- h4. JSON Scalars Drill uses simple rules to infer scalar data types from JSON input: * If the value is quoted, it is a Nullable {{VARCHAR}}. Example: {{a: "varchar"}} * If the value is numeric, and has no decimal point, it is a Nullable {{BIGINT}}. Example: {{a: 10}} * If the value is numeric, and has a decimal point, it is a Nullable {{FLOAT8}}. Eample: {{a: 10.5}} * If the value is {{true}} or {{false}}, it is a Nullable {{BIT}}, stored as Nullable {{UINT1}}, with {{true}} stored as 1, {{false}} as 0. Example: {{a: true}} A special case occurs if the user enables {{store.json.all_text_mode}}. In this case, all cases above give rise to a nullable {{VARCHAR}} column type. h4. Nulls with Scalar Values The rules for {{null}} are a bit more complex. The rules below apply to the Drill 1.13 JSON reader, earlier versions require investigation. * If the {{null}} occurs after seeing one of the above types, then the value is a {{NULL}} of that type (such as a {{NULL}} {{VARCHAR}}.) * If the {{null}} occurs at the start of the file, Drill defers identifying the data type until it sees the first non-null value. * If the entire file (or first batch of a file: 4K records before Drill 1.13, variable number in Drill 1.13 or later) contains only {{null}} values, the column is defined as nullable {{VARCHAR}}. (Drill 1.12 and earlier likely has different behavior.) For example, in the following example, Drill 1.13 identifies the type as nullable {{BIGINT}}: {code} {a: null} {a: null} {a: null} {a: 10} {code} In the following example, Drill must guess that the data type is nullable {{VARCHAR}}: {code} {a: null} {a: null} {a: null} {code} was (Author: paul.rogers): Drill uses simple rules to infer scalar data types from JSON input: * If the value is quoted, it is a Nullable {{VARCHAR}}. Example: {{a: "varchar"}} * If the value is numeric, and has no decimal point, it is a Nullable {{BIGINT}}. Example: {{a: 10}} * If the value is numeric, and has a decimal point, it is a Nullable {{FLOAT8}}. Eample: {{a: 10.5}} * If the value is {{true}} or {{false}}, it is a Nullable {{BIT}}, stored as Nullable {{UINT1}}, with {{true}} stored as 1, {{false}} as 0. Example: {{a: true}} A special case occurs if the user enables {{store.json.all_text_mode}}. In this case, all cases above give rise to a nullable {{VARCHAR}} column type. h4. Nulls with Scalar Values The rules for {{null}} are a bit more complex. The rules below apply to the Drill 1.13 JSON reader, earlier versions require investigation. * If the {{null}} occurs after seeing one of the above types, then the value is a {{NULL}} of that type (such as a {{NULL}} {{VARCHAR}}.) * If the {{null}} occurs at the start of the file, Drill defers identifying the data type until it sees the first non-null value. * If the entire file (or first batch of a file: 4K records before Drill 1.13, variable number in Drill 1.13 or later) contains only {{null}} values, the column is defined as nullable {{VARCHAR}}. (Drill 1.12 and earlier likely has different behavior.) For example, in the following example, Drill 1.13 identifies the type as nullable {{BIGINT}}: {code} {a: null} {a: null} {a: null} {a: 10} {code} In the following example, Drill must guess that the data type is nullable {{VARCHAR}}: {code} {a: null} {a: null} {a: null} {code} > Specify Drill's JSON behavior > ----------------------------- > > Key: DRILL-6035 > URL: https://issues.apache.org/jira/browse/DRILL-6035 > Project: Apache Drill > Issue Type: Improvement > Affects Versions: 1.13.0 > Reporter: Paul Rogers > Assignee: Pritesh Maker > > Drill supports JSON as its native data format. However, experience suggests > that Drill may have limitations in the JSON that Drill supports. This ticket > asks to clarify Drill's expected behavior on various kinds of JSON. > Topics to be addressed: > * Relational vs. non-relational structures > * JSON structures used in practice and how they map to Drill > * Support for varying data types > * Support for missing values, especially across files > These topics are complex, hence the request to provide a detailed > specifications that clarifies what Drill does and does not support (or what > is should and should not support.) -- This message was sent by Atlassian JIRA (v6.4.14#64029)