[ https://issues.apache.org/jira/browse/DRILL-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhishek Girish updated DRILL-6953:
-----------------------------------
    Fix Version/s:     (was: 1.18.0)
                   1.19.0

> Merge row set-based JSON reader
> -------------------------------
>
>                 Key: DRILL-6953
>                 URL: https://issues.apache.org/jira/browse/DRILL-6953
>             Project: Apache Drill
>          Issue Type: Sub-task
>    Affects Versions: 1.15.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>              Labels: doc-impacting
>             Fix For: 1.19.0
>
>
> The final step in the ongoing "result set loader" saga is to merge the revised JSON reader into master. This reader does three key things:
> * Demonstrates the prototypical "late schema" style of data reading (discovering the schema while reading).
> * Implements many tricks and hacks to handle schema changes while loading.
> * Shows that, even with all these tricks, the only true solution is to actually have a schema.
>
> The new JSON reader:
> * Uses an expanded state machine when parsing, rather than the complex set of if-statements in the current version.
> * Handles a run of nulls before the first data value (as long as the data value shows up in the first record batch).
> * Uses the result set loader to generate fixed-size batches regardless of the complexity, depth of structure, or width of variable-length fields.
>
> While the JSON reader itself is helpful, the key contribution is that it shows how to use the entire kit of parts: result set loader, projection framework, and so on. Since the projection framework can handle an external schema, it is also a handy foundation for the ongoing schema project.
>
> Key work to complete after this merge will be to reconcile actual data with the external schema. For example, if we know a column is supposed to be a VarChar, then read the column as a VarChar regardless of the type JSON itself picks. Or, if a column is supposed to be a Double, then convert Int and String JSON values into Doubles.
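
The schema-reconciliation rule described above (read a declared VarChar column as text whatever type JSON picked; convert Int and String values for a declared Double column) can be sketched in plain Java. This is a minimal sketch under stated assumptions: the `TargetType` enum and the `coerce` method are hypothetical illustrations, not Drill's column writer API, and parsed JSON scalars are assumed to arrive as `Long`, `Double`, `String`, `Boolean`, or `null`.

```java
// Hypothetical sketch of schema-driven coercion of JSON scalar values.
// Not Drill's API: TargetType and coerce() are illustrative names only.
class SchemaCoercionSketch {

    // Hypothetical stand-in for the column types an external schema might declare.
    enum TargetType { VARCHAR, DOUBLE }

    // Converts a parsed JSON scalar (assumed to be Long, Double, String,
    // Boolean or null) into the value expected by the declared column type.
    static Object coerce(Object jsonValue, TargetType target) {
        if (jsonValue == null) {
            return null; // JSON null stays null for a nullable column
        }
        switch (target) {
            case VARCHAR:
                // Keep a textual form regardless of the type JSON picked.
                // Assumption: a real reader would likely keep the original
                // token text (so 1.50 stays "1.50"); this sketch uses toString().
                return jsonValue.toString();
            case DOUBLE:
                if (jsonValue instanceof Number) {
                    // Int (Long) and Double JSON values both widen to double.
                    return ((Number) jsonValue).doubleValue();
                }
                if (jsonValue instanceof String) {
                    // Non-numeric strings throw NumberFormatException.
                    return Double.parseDouble((String) jsonValue);
                }
                throw new IllegalArgumentException(
                    "Cannot convert " + jsonValue.getClass().getSimpleName() + " to DOUBLE");
            default:
                throw new IllegalStateException("Unhandled type: " + target);
        }
    }

    public static void main(String[] args) {
        System.out.println(coerce(10L, TargetType.DOUBLE));   // 10.0
        System.out.println(coerce("2.5", TargetType.DOUBLE)); // 2.5
        System.out.println(coerce(42L, TargetType.VARCHAR));  // 42
        System.out.println(coerce(true, TargetType.VARCHAR)); // true
    }
}
```

Within Drill, logic of this kind would presumably sit inside a custom column writer, so that each value is converted as it is written rather than in a separate pass over the batch.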
>
> The Row Set framework was designed to allow inserting custom column writers. This would be a great opportunity to do the work needed to create them. Then, use the new JSON framework to allow parsing a JSON field as a specified Drill type.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)