Paul Rogers created DRILL-4710:
----------------------------------
Summary: Document Drill's JSON processing rules
Key: DRILL-4710
URL: https://issues.apache.org/jira/browse/DRILL-4710
Project: Apache Drill
Issue Type: Improvement
Components: Documentation
Reporter: Paul Rogers
Priority: Minor
One of Drill's key benefits is the ability to query JSON-formatted data, and
much great work has gone into that support. But unless you happen to be a Drill
developer, the details of exactly how Drill handles various JSON formats are
hard to find.
We should document how Drill handles various JSON scenarios, for both kinds of
query:
* SELECT * (schema inferred)
* SELECT a, b, c (schema implied by query)
And various JSON structures:
* Top-level structure (a list of maps. Can Drill also handle an array of maps?
A list of scalars?)
* Changes of the top-level map structure across rows.
** New field appears later in the file. (Was {a: 1, b: "s"}, now is {a: 1, b:
"s", c: 10})
** Fields disappear later in the file
** Fields change type
** Start of file has many nulls for a field, later in file has non-null values.
* How Drill handles array fields
** Array field is null: { a: [10, 20]}, { a: null }
** Array contains nulls: { a: [10, null, 20] }
** Array contains single scalar type (number or string)
** Array contains multiple scalar types (number and string)
** Array contains structured types (array, map)
* How Drill handles nested maps
** Explicit select: a, b.c, b.d from {a: 1, b: { c: "s", d: 10 }}
** Implicit select: *
** How data is delivered to Drill client
** How data is delivered to JDBC/ODBC clients
* Size issues
** Very large records (what is max size?)
** Very large strings
** Very large arrays
The documentation should also cover any other details not captured by the list
above.
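
To make the scenarios above concrete, here is a minimal sketch that writes one
small sample file per scenario, so each case can be queried and its behavior
recorded. The scenario names, the write_scenarios helper, and the
one-record-per-line file layout are assumptions for illustration, not anything
Drill defines. The type-change records are invented, since the list gives no
example for that case. The script only produces input data and makes no claim
about how Drill handles it.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical scenario names mapped to sample records. Most records are the
# examples from the list above, rewritten as strict JSON (quoted keys).
SCENARIOS = {
    "field_appears": [{"a": 1, "b": "s"}, {"a": 1, "b": "s", "c": 10}],
    "field_disappears": [{"a": 1, "b": "s", "c": 10}, {"a": 1, "b": "s"}],
    # Invented example: the list does not show a type change.
    "field_changes_type": [{"a": 1}, {"a": "one"}],
    "leading_nulls": [{"a": None}, {"a": None}, {"a": 10}],
    "null_array": [{"a": [10, 20]}, {"a": None}],
    "array_with_nulls": [{"a": [10, None, 20]}],
    "mixed_scalar_array": [{"a": [10, "s"]}],
    "nested_map": [{"a": 1, "b": {"c": "s", "d": 10}}],
}


def write_scenarios(out_dir):
    """Write each scenario to <out_dir>/<name>.json, one JSON record per line.

    Returns a dict mapping scenario name to the path of the file written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, records in SCENARIOS.items():
        path = out_dir / f"{name}.json"
        text = "".join(json.dumps(record) + "\n" for record in records)
        path.write_text(text, encoding="utf-8")
        paths[name] = path
    return paths


if __name__ == "__main__":
    # Write the files to a fresh temporary directory and list them.
    for name, path in write_scenarios(tempfile.mkdtemp()).items():
        print(name, path)
```

Each generated file could then be queried twice, once with SELECT * and once
with an explicit column list, to capture the behavior for both query types.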
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)