Paul Rogers created DRILL-4710:
----------------------------------

             Summary: Document Drill's JSON processing rules
                 Key: DRILL-4710
                 URL: https://issues.apache.org/jira/browse/DRILL-4710
             Project: Apache Drill
          Issue Type: Improvement
          Components: Documentation
            Reporter: Paul Rogers
            Priority: Minor

One of Drill's key benefits is the ability to query JSON-formatted data. Much great work has been done, but unless you happen to be a Drill developer, the details of exactly how Drill handles various JSON formats can be hard to find.

We should document how Drill handles various JSON scenarios:

* SELECT * (schema inferred)
* SELECT a, b, c (schema implied by query)

And various JSON structures:

* Top-level structure (list of maps. Can we handle an array of maps? A list of scalars?)
* Changes of the top-level map structure across rows.
** New field appears later in the file. (Was {a: 1, b: "s"}, now is {a: 1, b: "s", c: 10})
** Fields disappear later in the file
** Fields change type
** Start of file has many nulls for a field, later in file has non-null values.
* How Drill handles array fields
** Array field is null: { a: [10, 20]}, { a: null }
** Array contains nulls: { a: [10, null, 20] }
** Array contains single scalar type (number or string)
** Array contains multiple scalar types (number and string)
** Array contains structured types (array, map)
* How Drill handles nested maps
** Explicit select: a, b.c, b.d: {a: 1, b: { c: "s", d: 10 }}
** Implicit select: *
** How data is delivered to Drill client
** How data is delivered to JDBC/ODBC clients
* Size issues
** Very large records (what is max size?)
** Very large strings
** Very large arrays

Along with any other detailed information not covered by the above list.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)