[ 
https://issues.apache.org/jira/browse/DRILL-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050759#comment-16050759
 ] 

Paul Rogers commented on DRILL-4824:
------------------------------------

Wonderful! One quick comment on section 2: Numeric Type Promotion. One goal of 
the new vector writers created to solve DRILL-5211 is the ability to do type 
promotion. There are three kinds:

* Non-conflicting type promotion (for example, call {{setInt()}} on a FLOAT8 or 
DECIMAL vector).
* "Transparent" type promotion (call {{setDouble()}} on an INT vector, which 
requires replacing one vector with another, but do so in the first batch, where 
the change is transparent to downstream operators).
* "Hard" type promotion: as above, but after the first batch. This causes a hard 
schema change ({{OK_NEW_SCHEMA}}).
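A minimal sketch of the three-way classification above follows. It assumes a hypothetical four-type numeric subset ({{NumType}}), a hypothetical {{classify}} helper and a hypothetical table of which value types each vector type can absorb. None of this is the DRILL-5211 writer API, and the set of "non-conflicting" conversions is an illustrative choice that matches the INT-into-FLOAT8/DECIMAL example.

```java
import java.util.EnumSet;
import java.util.Map;

public class TypePromotionSketch {

  // Hypothetical subset of numeric vector types; not Drill's MinorType enum.
  enum NumType { INT, BIGINT, FLOAT8, DECIMAL }

  enum Promotion { NONE, NON_CONFLICTING, TRANSPARENT, HARD, UNSUPPORTED }

  // Assumed table: value types each vector type can store without being replaced.
  // BIGINT is deliberately excluded from FLOAT8 because large longs lose precision.
  private static final Map<NumType, EnumSet<NumType>> ACCEPTS = Map.of(
      NumType.INT, EnumSet.of(NumType.INT),
      NumType.BIGINT, EnumSet.of(NumType.INT, NumType.BIGINT),
      NumType.FLOAT8, EnumSet.of(NumType.INT, NumType.FLOAT8),
      NumType.DECIMAL, EnumSet.of(NumType.INT, NumType.BIGINT, NumType.DECIMAL));

  // Classifies writing a value of valueType into a vector of vectorType.
  // batchIndex == 0 means the reader is still filling its first batch.
  static Promotion classify(NumType vectorType, NumType valueType, int batchIndex) {
    if (vectorType == valueType) {
      return Promotion.NONE;
    }
    if (ACCEPTS.get(vectorType).contains(valueType)) {
      // e.g. setInt() on a FLOAT8 vector: convert the value, keep the vector.
      return Promotion.NON_CONFLICTING;
    }
    if (ACCEPTS.get(valueType).contains(vectorType)) {
      // The vector must be replaced by a wider one that can hold the old values.
      // Only in the first batch is this invisible to downstream operators.
      return batchIndex == 0 ? Promotion.TRANSPARENT : Promotion.HARD;
    }
    return Promotion.UNSUPPORTED;
  }

  public static void main(String[] args) {
    System.out.println(classify(NumType.FLOAT8, NumType.INT, 0)); // NON_CONFLICTING
    System.out.println(classify(NumType.INT, NumType.FLOAT8, 0)); // TRANSPARENT
    System.out.println(classify(NumType.INT, NumType.FLOAT8, 3)); // HARD
  }
}
```

In a real reader, HARD would be the case that forces the operator to report {{OK_NEW_SCHEMA}}. The other cases stay within the current schema.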

The code reviews for this work move quite slowly. Once the code is in master, 
we can add the above type promotion to the basic mechanism.

Also, we should coordinate on this because another goal of DRILL-5211 is to rip 
out the existing vector writers from various readers (including JSON) and 
replace them with the new size-aware versions. So, your work should build on 
the new set of vector writers, not the current set.

More comments to come.

> Null maps / lists and non-provided state support for JSON fields. Numeric 
> types promotion.
> ------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4824
>                 URL: https://issues.apache.org/jira/browse/DRILL-4824
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Roman
>            Assignee: Volodymyr Vysotskyi
>
> Drill returns incorrect output for a JSON file with complex nested data.
> _JSON:_
> {code:none|title=example.json|borderStyle=solid}
> {
>         "Field1" : {
>         }
> }
> {
>         "Field1" : {
>                 "InnerField1": {"key1":"value1"},
>                 "InnerField2": {"key2":"value2"}
>         }
> }
> {
>         "Field1" : {
>                 "InnerField3" : ["value3", "value4"],
>                 "InnerField4" : ["value5", "value6"]
>         }
> }
> {code}
> _Query:_
> {code:sql}
> select Field1 from dfs.`/tmp/example.json`
> {code}
> _Incorrect result:_
> {code:none}
> +---------------------------+
> |          Field1           |
> +---------------------------+
> {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +---------------------------+
> {code}
> There is no need to output fields that a record does not contain. With a 
> deeply nested structure, the result becomes unreadable for the user.
> _Correct result:_
> {code:none}
> +--------------------------+
> |         Field1           |
> +--------------------------+
> {}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}}
> {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--------------------------+
> {code}
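A minimal sketch of the expected output from the issue above follows. It assumes each record carries a hypothetical per-record map of the fields it actually provided, which is not Drill's vector representation. Fields are emitted in union-schema order, but only when the record supplied them. The renderer is also a simplification: it handles only maps, lists and strings, and it does not escape JSON strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ProvidedFieldsSketch {

  // Compact JSON for maps, lists and strings only; strings are not escaped.
  static String render(Object value) {
    if (value instanceof Map) {
      Map<?, ?> map = (Map<?, ?>) value;
      return map.entrySet().stream()
          .map(e -> "\"" + e.getKey() + "\":" + render(e.getValue()))
          .collect(Collectors.joining(",", "{", "}"));
    }
    if (value instanceof List) {
      return ((List<?>) value).stream()
          .map(ProvidedFieldsSketch::render)
          .collect(Collectors.joining(",", "[", "]"));
    }
    return "\"" + value + "\"";
  }

  // Walks the union schema in order, skipping fields this record never provided,
  // instead of filling them with empty {} or [] placeholders.
  static String renderRecord(List<String> schemaFields, Map<String, Object> provided) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (String field : schemaFields) {
      if (provided.containsKey(field)) {
        out.put(field, provided.get(field));
      }
    }
    return render(out);
  }

  public static void main(String[] args) {
    List<String> schema =
        List.of("InnerField1", "InnerField2", "InnerField3", "InnerField4");
    Map<String, Object> r1 = Map.of();
    Map<String, Object> r2 = Map.of(
        "InnerField1", Map.of("key1", "value1"),
        "InnerField2", Map.of("key2", "value2"));
    Map<String, Object> r3 = Map.of(
        "InnerField3", List.of("value3", "value4"),
        "InnerField4", List.of("value5", "value6"));
    System.out.println(renderRecord(schema, r1));
    System.out.println(renderRecord(schema, r2));
    System.out.println(renderRecord(schema, r3));
  }
}
```

Running {{main}} prints the three rows shown under "Correct result". The key design point is that "not provided" is tracked separately from an empty map or list. Without that distinction, the renderer has no way to tell the two apart.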



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
