[ https://issues.apache.org/jira/browse/DRILL-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025040#comment-16025040 ]
Paul Rogers commented on DRILL-4824:
------------------------------------

The trick, of course, to adding the new null states is that the existing "bit" vector is used by all operators in code generation, and by Drill clients such as ODBC and JDBC drivers. Further, Apache Arrow is a fork of Drill, so improving our null support will drive the two projects further apart.

Planning for all this stuff is required before we start writing code. For example, if we know that a client is a version before this fix, we can translate the new null vector into the "legacy" bit vector. But, Drill does not have a versioned client API, so we have no way to know the version of the client. So, we have to tackle that problem as well.

In short, this is an important, but non-trivial, project.

> Add not-provided and null states for map and list fields in JSON
> ----------------------------------------------------------------
>
>                 Key: DRILL-4824
>                 URL: https://issues.apache.org/jira/browse/DRILL-4824
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Roman
>            Assignee: Volodymyr Vysotskyi
>
> There is incorrect output in case of JSON file with complex nested data.
> _JSON:_
> {code:none|title=example.json|borderStyle=solid}
> {
>   "Field1" : {
>   }
> }
> {
>   "Field1" : {
>     "InnerField1": {"key1":"value1"},
>     "InnerField2": {"key2":"value2"}
>   }
> }
> {
>   "Field1" : {
>     "InnerField3" : ["value3", "value4"],
>     "InnerField4" : ["value5", "value6"]
>   }
> }
> {code}
> _Query:_
> {code:sql}
> select Field1 from dfs.`/tmp/example.json`
> {code}
> _Incorrect result:_
> {code:none}
> +---------------------------+
> |          Field1           |
> +---------------------------+
> {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--------------------------+
> {code}
> There is no need to output missing fields. In case of a deeply nested
> structure, the user will get an unreadable result.
> _Correct result:_
> {code:none}
> +--------------------------+
> |          Field1          |
> +--------------------------+
> |{}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}}
> {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--------------------------+
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
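
For illustration only, here is a minimal sketch (in Python rather than Drill's Java) of the three field states named in the issue title, and of how they might collapse into the two-state "legacy" bit vector for an older client. The names `FieldState`, `to_legacy_bits`, and `render` are hypothetical and not part of Drill, and the 1 = set, 0 = null encoding of the bit vector is an assumption made for the sketch.

```python
from enum import Enum


class FieldState(Enum):
    """Hypothetical three-state marker for a map or list field."""
    NOT_PROVIDED = 0  # field is absent from this JSON record
    NULL = 1          # field is present with an explicit null
    SET = 2           # field is present with a value


def to_legacy_bits(states):
    """Collapse three states into a two-state vector (assumed: 1 = set, 0 = null).

    NOT_PROVIDED and NULL both map to 0, so the distinction is lost
    for a client that only understands the legacy vector.
    """
    return [1 if s is FieldState.SET else 0 for s in states]


def render(record, states):
    """Return only the fields that were actually provided in this record."""
    return {
        name: value
        for name, value in record.items()
        if states[name] is not FieldState.NOT_PROVIDED
    }


# Third record of example.json: only InnerField3 and InnerField4 are present.
record = {
    "InnerField1": {},
    "InnerField2": {},
    "InnerField3": ["value3", "value4"],
    "InnerField4": ["value5", "value6"],
}
states = {
    "InnerField1": FieldState.NOT_PROVIDED,
    "InnerField2": FieldState.NOT_PROVIDED,
    "InnerField3": FieldState.SET,
    "InnerField4": FieldState.SET,
}
visible = render(record, states)
legacy = to_legacy_bits(states.values())
```

With the extra state available, `visible` holds only `InnerField3` and `InnerField4`, matching the "correct result" above, while `legacy` shows what an unversioned client would still see.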