[ https://issues.apache.org/jira/browse/DRILL-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293557#comment-16293557 ]
Paul Rogers edited comment on DRILL-6035 at 12/16/17 4:07 AM:
--------------------------------------------------------------

h4. All-Text Mode

Drill provides the ability to read scalar values as text:

{code}
ALTER SESSION SET `store.json.all_text_mode` = true
{code}

In this mode, JSON scalars are read as follows:

|| JSON Type || As Member Value || As Array Value ||
| Missing | NULL (VARCHAR) | N/A |
| null | NULL (VARCHAR) | String value "null" |
| true/false | "true"/"false" | Same |
| Number | Number text | Same |
| String | The string value (without quotes) | Same |

All-text mode can overcome some schema change exceptions, such as:

* A long run of missing or null values before the first non-null value.
* Different scalar types in different records.
* Heterogeneous arrays.
* Arrays that contain nulls. (The null values are stored as empty strings.)

In Drill 1.13, in all-text mode, missing columns are presumed to be Nullable VARCHAR. (Prior versions may have assumed Nullable INT.) As a result, if file1.json has column `x`, but file2.json does not, then no schema change will occur when combining the results, since both files will assume that `x` is a Nullable VARCHAR. (Note that this works only if the query explicitly projects column `x`. It won't necessarily work for queries with the wildcard.)

Note that all-text mode cannot overcome schema changes due to mixes of scalar and structured (object or list) types.

was (Author: paul.rogers):

h4. All-Text Mode

Drill provides the ability to read scalar values as text:

{code}
ALTER SESSION SET `store.json.all_text_mode` = true
{code}

In this mode, JSON scalars are read as follows:

|| JSON Type || As Text ||
| Missing | NULL (VARCHAR) |
| null | NULL (VARCHAR) |
| true/false | "true"/"false" |
| Number | Number text |
| String | The string value (without quotes) |

All-text mode can overcome some schema change exceptions such as:

* Long string of missing or null values before the first non-null value.
* Different scalar types in different records.
* Heterogeneous arrays.
* Arrays that contain nulls. (The null values are stored as empty strings.)

In Drill 1.13, in all-text mode, missing columns are presumed to be Nullable VARCHAR. (Prior versions may have assumed Nullable INT.) As a result, if file1.json has column `x`, but file2.json does not, then no schema change will occur when combining the results, since both files will assume that `x` is a Nullable VARCHAR. (Note that this works only if the query explicitly projects column `x`. It won't necessarily work for queries with the wildcard.)

Note that all-text mode cannot overcome schema changes due to mixes of scalar and structured (object or list) types.

> Specify Drill's JSON behavior
> -----------------------------
>
>                 Key: DRILL-6035
>                 URL: https://issues.apache.org/jira/browse/DRILL-6035
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Assignee: Pritesh Maker
>
> Drill supports JSON as its native data format. However, experience suggests
> that Drill may have limitations in the JSON that Drill supports. This ticket
> asks to clarify Drill's expected behavior on various kinds of JSON.
> Topics to be addressed:
> * Relational vs. non-relational structures
> * JSON structures used in practice and how they map to Drill
> * Support for varying data types
> * Support for missing values, especially across files
> These topics are complex, hence the request to provide a detailed
> specification that clarifies what Drill does and does not support (or what
> it should and should not support).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
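
As a rough illustration of the all-text mapping table in the comment above, the following Python sketch mimics the described behavior outside of Drill. It is not Drill code: the names `as_all_text` and `read_record` are hypothetical, and the sketch follows the table (a null array element becomes the string "null") rather than the parenthetical note about empty strings. Explicitly projected columns that are missing from a record come back as None, standing in for a NULL VARCHAR.

```python
import json


def as_all_text(value, in_array=False):
    """Map one parsed JSON scalar to its all-text form, per the table above.

    Hypothetical helper for illustration only; not part of Drill.
    """
    if value is None:
        # Table: a null member is NULL; a null array element is the text "null".
        return "null" if in_array else None
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        # Strings pass through; numbers arrive here as their original text
        # because read_record() parses them with parse_int/parse_float=str.
        return value
    # Objects and nested lists are structured types, which all-text mode
    # is described as unable to reconcile with scalars.
    raise TypeError(f"structured value not handled: {type(value).__name__}")


def read_record(line, columns):
    """Read one JSON record, returning only the explicitly projected columns."""
    record = json.loads(line, parse_int=str, parse_float=str)
    row = {}
    for col in columns:
        value = record.get(col)  # a missing column behaves like NULL
        if isinstance(value, list):
            row[col] = [as_all_text(v, in_array=True) for v in value]
        else:
            row[col] = as_all_text(value)
    return row


if __name__ == "__main__":
    # Two "files": the first has column x, the second does not.
    print(read_record('{"x": 10, "flags": [true, null, "a"]}', ["x", "flags"]))
    print(read_record('{"y": 1.50}', ["x", "flags"]))
```

Because both records yield the same set of projected columns with text-or-None values, the sketch shows why explicit projection sidesteps a schema change in this mode, whereas a wildcard would expose whichever columns each record happens to contain.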