[ https://issues.apache.org/jira/browse/DRILL-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050759#comment-16050759 ]
Paul Rogers commented on DRILL-4824:
------------------------------------

Wonderful! One quick comment on section 2: Numeric Type Promotion.

One goal of the new vector writers created to solve DRILL-5211 is the ability to do type promotion. There are three kinds:

* Non-conflicting type promotion. (Call {{setInt()}} on a FLOAT8 or DECIMAL vector, for example.)
* "Transparent" type promotion. (Call {{setDouble()}} on an INT vector, which requires replacing one vector with another, but do so in the first batch, where the change is transparent to the downstream operators.)
* "Hard" type promotion: as above, but after the first batch. This causes a hard schema change ({{OK_NEW_SCHEMA}}).

The code reviews for this work move quite slowly. Once the code is in master, we can add the above type promotion to the basic mechanism.

Also, we should coordinate on this, because another goal of DRILL-5211 is to rip out the existing vector writers from various readers (including JSON) and replace them with the new size-aware versions. So your work should build on the new set of vector writers, not the current set.

More comments to come.

> Null maps / lists and non-provided state support for JSON fields. Numeric types promotion.
> ------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4824
>                 URL: https://issues.apache.org/jira/browse/DRILL-4824
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Roman
>            Assignee: Volodymyr Vysotskyi
>
> There is incorrect output in the case of a JSON file with complex nested data.
> _JSON:_
> {code:none|title=example.json|borderStyle=solid}
> {
>   "Field1" : {
>   }
> }
> {
>   "Field1" : {
>     "InnerField1": {"key1":"value1"},
>     "InnerField2": {"key2":"value2"}
>   }
> }
> {
>   "Field1" : {
>     "InnerField3" : ["value3", "value4"],
>     "InnerField4" : ["value5", "value6"]
>   }
> }
> {code}
> _Query:_
> {code:sql}
> select Field1 from dfs.`/tmp/example.json`
> {code}
> _Incorrect result:_
> {code:none}
> +---------------------------+
> | Field1                    |
> +---------------------------+
> {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +---------------------------+
> {code}
> There is no need to output missing fields. With a deeply nested structure, the result becomes unreadable for the user.
> _Correct result:_
> {code:none}
> +--------------------------+
> | Field1                   |
> +--------------------------+
> {}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}}
> {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--------------------------+
> {code}
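The three kinds of type promotion listed in the comment can be sketched as a simple classification of "value type written into vector type". This is a hypothetical illustration, not Drill's vector writer API: the {{MinorType}} and {{Promotion}} enums, {{classify()}}, and the set of widenings accepted by {{canAbsorb()}} are all assumptions made for this sketch. It also assumes that, whenever a vector cannot absorb a value, a suitable wider vector type exists to replace it.

```java
/**
 * Hypothetical sketch: classifies writing a value of type `value` into a
 * vector of type `vector` into one of the three promotion kinds.
 * Not Drill's actual vector writer code.
 */
public class TypePromotionSketch {

    /** Simplified stand-ins for a few Drill minor types (illustrative only). */
    enum MinorType { INT, BIGINT, FLOAT8, DECIMAL }

    enum Promotion { NONE, NON_CONFLICTING, TRANSPARENT, HARD }

    /** Widenings assumed for this sketch: can `vector` store `value` as-is? */
    static boolean canAbsorb(MinorType vector, MinorType value) {
        if (vector == value) return true;
        switch (vector) {
            case BIGINT:  return value == MinorType.INT;
            case FLOAT8:  return value == MinorType.INT;
            case DECIMAL: return value == MinorType.INT || value == MinorType.BIGINT;
            default:      return false;
        }
    }

    static Promotion classify(MinorType vector, MinorType value, boolean firstBatch) {
        if (vector == value) return Promotion.NONE;
        // e.g. setInt() on a FLOAT8 or DECIMAL vector: no vector replacement needed.
        if (canAbsorb(vector, value)) return Promotion.NON_CONFLICTING;
        // The vector must be replaced by a wider one. In the first batch this is
        // invisible downstream; later it forces a schema change (OK_NEW_SCHEMA).
        return firstBatch ? Promotion.TRANSPARENT : Promotion.HARD;
    }

    public static void main(String[] args) {
        System.out.println(classify(MinorType.FLOAT8, MinorType.INT, false)); // NON_CONFLICTING
        System.out.println(classify(MinorType.INT, MinorType.FLOAT8, true));  // TRANSPARENT
        System.out.println(classify(MinorType.INT, MinorType.FLOAT8, false)); // HARD
    }
}
```

The only difference between the "transparent" and "hard" cases here is the {{firstBatch}} flag, which mirrors the comment's point that the same vector replacement is either invisible or a schema change depending on when it happens.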
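The difference between the incorrect and the correct result in the issue comes down to whether a row emits every field in the accumulated map schema or only the fields it actually provided. A minimal sketch with plain Java collections, assuming each row is modeled as an insertion-ordered map of provided fields (a hypothetical model, not Drill's vector or JSON writer code); {{render()}} and {{padToSchema()}} are illustrative helpers. {{padToSchema()}} mimics the incorrect output by filling absent fields with empty values, while rendering the row directly yields the expected output.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch contrasting "pad to schema" with "emit provided fields only". */
public class NonProvidedFieldsSketch {

    /** Renders a value as compact JSON; handles Map, List and scalar (as string) only. */
    static String render(Object value) {
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) sb.append(',');
                first = false;
                sb.append('"').append(e.getKey()).append("\":").append(render(e.getValue()));
            }
            return sb.append('}').toString();
        }
        if (value instanceof List) {
            List<?> list = (List<?>) value;
            StringBuilder sb = new StringBuilder("[");
            for (int i = 0; i < list.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(render(list.get(i)));
            }
            return sb.append(']').toString();
        }
        return "\"" + value + "\"";
    }

    /** Incorrect-style output: every schema field is emitted, absent ones as empty values. */
    static Map<String, Object> padToSchema(Map<String, Object> row, Map<String, Object> emptyBySchema) {
        Map<String, Object> padded = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : emptyBySchema.entrySet()) {
            padded.put(e.getKey(), row.getOrDefault(e.getKey(), e.getValue()));
        }
        return padded;
    }

    public static void main(String[] args) {
        // Union of fields seen across all records of Field1, with "empty" placeholders.
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("InnerField1", Map.of());
        schema.put("InnerField2", Map.of());
        schema.put("InnerField3", List.of());
        schema.put("InnerField4", List.of());

        // Second record of example.json: only InnerField1 and InnerField2 are provided.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("InnerField1", Map.of("key1", "value1"));
        row.put("InnerField2", Map.of("key2", "value2"));

        System.out.println(render(padToSchema(row, schema))); // incorrect result
        System.out.println(render(row));                      // correct result
    }
}
```

For the second record this prints {{{"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"},"InnerField3":[],"InnerField4":[]}}} followed by {{{"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}}}}, matching the two result tables above; supporting the correct form in Drill would require tracking the non-provided state per field, as the issue title suggests.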