[ 
https://issues.apache.org/jira/browse/DRILL-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293557#comment-16293557
 ] 

Paul Rogers edited comment on DRILL-6035 at 12/16/17 4:07 AM:
--------------------------------------------------------------

h4. All-Text Mode

Drill provides the ability to read scalar values as text:

{code}
ALTER SESSION SET `store.json.all_text_mode` = true
{code}

In this mode, JSON scalars are read as follows:

|| JSON Type || As Member Value || As Array Value ||
| Missing | NULL (VARCHAR) | N/A |
| null | NULL (VARCHAR) | String value "null" |
| true/false | "true"/"false" | Same |
| Number | Number text | Same |
| String | The string value (without quotes) | Same |
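
As a rough illustration of the table above, the following Python sketch models the conversion. It is not Drill's reader code: the function names {{to_all_text}} and {{read_record}} are made up for this example, and the only behavior it assumes is what the table states.

```python
import json


def to_all_text(value, in_array=False):
    """Convert a parsed JSON value to the all-text form described in the table."""
    if value is None:
        # A null member stays NULL; a null array element becomes the text "null".
        return "null" if in_array else None
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, dict):
        # Object members are converted as member values.
        return {key: to_all_text(member) for key, member in value.items()}
    if isinstance(value, list):
        # Array elements are converted as array values.
        return [to_all_text(element, in_array=True) for element in value]
    # Strings, plus number text preserved by read_record below.
    return str(value)


def read_record(line):
    """Parse one JSON record, keeping number text exactly as written."""
    parsed = json.loads(line, parse_int=str, parse_float=str)
    return to_all_text(parsed)


record = read_record('{"a": 10, "b": null, "c": [1, null, "x", true], "d": 1.50}')
print(record)
```

A member that is absent from the record, such as {{record.get("x")}}, comes back as {{None}}, which plays the role of NULL (VARCHAR) in this model.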

All-text mode can overcome some schema change exceptions such as:

* A long run of missing or null values before the first non-null value.
* Different scalar types in different records.
* Heterogeneous arrays.
* Arrays that contain nulls. (The null values are stored as empty strings.)

In Drill 1.13, in all-text mode, missing columns are presumed to be Nullable 
VARCHAR. (Prior versions may have assumed Nullable INT.) As a result, if 
file1.json has column `x`, but file2.json does not, then no schema change 
occurs when the results are combined, because Drill treats `x` as a Nullable 
VARCHAR in both files. (Note that this works only if the query explicitly 
projects column `x`. It won't necessarily work for queries with the wildcard.)
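
The reasoning in the preceding paragraph can be sketched as a toy model in Python. This is an assumption-laden illustration, not Drill's implementation: {{column_type}}, {{has_schema_change}} and the {{missing_type}} parameter are hypothetical names, and the {{"INT"}} setting only stands in for the behavior that prior versions "may have" had.

```python
def column_type(records, column, missing_type="VARCHAR"):
    """Type of an explicitly projected column in all-text mode (toy model).

    Any scalar column that appears in the file is Nullable VARCHAR.
    A column absent from every record takes missing_type.
    """
    present = any(column in record for record in records)
    return "VARCHAR" if present else missing_type


def has_schema_change(types):
    """A schema change occurs when the files disagree on the column type."""
    return len(set(types)) > 1


file1 = [{"x": "10"}, {"x": "abc"}]   # file1.json has column x
file2 = [{"y": "foo"}]                # file2.json does not

# Drill 1.13 behavior as described: missing columns are Nullable VARCHAR.
new_types = [column_type(f, "x") for f in (file1, file2)]
print(has_schema_change(new_types))

# Possible older behavior: missing columns assumed to be Nullable INT.
old_types = [column_type(f, "x", missing_type="INT") for f in (file1, file2)]
print(has_schema_change(old_types))
```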

Note that all-text mode cannot overcome schema changes due to mixes of scalar 
and structured (object or list) types.


was (Author: paul.rogers):
h4. All-Text Mode

Drill provides the ability to read scalar values as text:

{code}
ALTER SESSION SET `store.json.all_text_mode` = true
{code}

In this mode, JSON scalars are read as follows:

|| JSON Type || As Text ||
| Missing | NULL (VARCHAR) |
| null | NULL (VARCHAR) | 
| true/false | "true"/"false" |
| Number | Number text |
| String | The string value (without quotes) |

All-text mode can overcome some schema change exceptions such as:

* Long string of missing or null values before the first non-null value.
* Different scalar types in different records.
* Heterogeneous arrays.
* Arrays that contain nulls. (The null values are stored as empty strings.)

In Drill 1.13, in all-text mode, missing columns are presumed to be Nullable 
VARCHAR. (Prior versions may have assumed Nullable INT.) As a result, if 
file1.json has column `x`, but file2.json does not, then no schema change will 
occur when combining the results since both files will assume that `x` is a 
Nullable VARCHAR. (Note that this works only if the query explicitly projects 
column `x`. It won't necessarily work for queries with the wildcard.)

Note that all-text mode cannot overcome schema changes due to mixes of scalar 
and structured (object or list) types.

> Specify Drill's JSON behavior
> -----------------------------
>
>                 Key: DRILL-6035
>                 URL: https://issues.apache.org/jira/browse/DRILL-6035
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Assignee: Pritesh Maker
>
> Drill supports JSON as its native data format. However, experience suggests 
> that Drill may have limitations in the JSON that Drill supports. This ticket 
> asks to clarify Drill's expected behavior on various kinds of JSON.
> Topics to be addressed:
> * Relational vs. non-relational structures
> * JSON structures used in practice and how they map to Drill
> * Support for varying data types
> * Support for missing values, especially across files
> These topics are complex, hence the request to provide a detailed 
> specification that clarifies what Drill does and does not support (or what 
> it should and should not support).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
