[ https://issues.apache.org/jira/browse/DRILL-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453560#comment-16453560 ]
Paul Rogers commented on DRILL-4824: ------------------------------------ This ticket has a long history of complexity. Just discovered another one. It appears that the current null handling has been optimized to make results appear nicely in {{sqlline}}. Consider this simple file: {noformat} {a: {b: 10}} {a: {c: "foo"}} {noformat} According to our existing rules, the missing columns are stored as null values. Then, JSON omits nulls from its output. Why? So, it seems, {{sqlline}} can display the following: {noformat} +--------------+ | a | +--------------+ | {"b":10} | | {"c":"foo"} | +--------------+ {noformat} In JDBC, the {{getObject()}} method on the Map vector creates a JSON object. That code probably omits null values. Why? So that with {{sqlline}} calls {{toString()}} on the JSON object, it gets the nice display above. Probably this is not how {{sqlline}} should format its output: our JSON internals should not be dictated by how we do {{toString()}} in {{sqlline}}. But, there you have it anyway. > Null maps / lists and non-provided state support for JSON fields. Numeric > types promotion. > ------------------------------------------------------------------------------------------ > > Key: DRILL-4824 > URL: https://issues.apache.org/jira/browse/DRILL-4824 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - JSON > Affects Versions: 1.0.0 > Reporter: Roman Kulyk > Assignee: Volodymyr Vysotskyi > Priority: Major > > There is incorrect output in case of JSON file with complex nested data. > _JSON:_ > {code:none|title=example.json|borderStyle=solid} > { > "Field1" : { > } > } > { > "Field1" : { > "InnerField1": {"key1":"value1"}, > "InnerField2": {"key2":"value2"} > } > } > { > "Field1" : { > "InnerField3" : ["value3", "value4"], > "InnerField4" : ["value5", "value6"] > } > } > {code} > _Query:_ > {code:sql} > select Field1 from dfs.`/tmp/example.json` > {code} > _Incorrect result:_ > {code:none} > +---------------------------+ > | Field1 | > +---------------------------+ > {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]} > {"InnerField1":{"key1":"value1"},"InnerField2" > {"key2":"value2"},"InnerField3":[],"InnerField4":[]} > {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]} > +--------------------------+ > {code} > Theres is no need to output missing fields. In case of deeply nested > structure we will get unreadable result for user. > _Correct result:_ > {code:none} > +--------------------------+ > | Field1 | > +--------------------------+ > |{} > {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}} > {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]} > +--------------------------+ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)