Paul Rogers created DRILL-5949:
----------------------------------
Summary: JSON format options should be part of plugin config; not
session options
Key: DRILL-5949
URL: https://issues.apache.org/jira/browse/DRILL-5949
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.12.0
Reporter: Paul Rogers
Drill provides a JSON record reader. Drill provides two ways to configure this
reader:
* Using the JSON plugin configuration.
* Using a set of session options.
The plugin configuration defines the file suffix associated with JSON files.
The session options are:
* {{store.json.all_text_mode}}
* {{store.json.read_numbers_as_double}}
* {{store.json.reader.skip_invalid_records}}
* {{store.json.reader.print_skipped_invalid_record_number}}
Suppose I have two JSON files from different sources (and keep them in distinct
directories). For one, I want {{all_text_mode}} off, as the data is
nicely formatted. Also, my numbers are fine, so I want
{{read_numbers_as_double}} off.
But the other file is a mess and uses a rather ad-hoc format, so I want these
two options turned on.
As it turns out I often query both files. Today, I must set the session options
one way to query my "clean" file, then reverse them to query the "dirty" file.
Next, I want to join the two files. How do I set the options one way for the
"clean" file, and the other for the "dirty" file within the *same query*? I can't.
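The back-and-forth described above can be sketched as follows. This is a minimal Python sketch that only builds the statements a client would have to send; the helpers {{session_statements}} and {{script_for}} and the file paths are hypothetical, and nothing here talks to a Drill server.

```python
# Session options named in this issue that differ between the two files.
CLEAN = {
    "store.json.all_text_mode": False,
    "store.json.read_numbers_as_double": False,
}
DIRTY = {
    "store.json.all_text_mode": True,
    "store.json.read_numbers_as_double": True,
}


def session_statements(options):
    """Build one ALTER SESSION statement per option (hypothetical helper)."""
    return [
        "ALTER SESSION SET `{}` = {}".format(name, str(value).lower())
        for name, value in options.items()
    ]


def script_for(path, options):
    """Statements needed to query one file under session-wide options."""
    return session_statements(options) + ["SELECT * FROM dfs.`{}`".format(path)]


# Every switch between files costs a fresh round of ALTER SESSION statements.
# The paths are placeholders.
script = script_for("/data/clean/a.json", CLEAN) + script_for("/data/dirty/b.json", DIRTY)
for statement in script:
    print(statement)
```

A join has no equivalent script: the options apply to the whole session, so both sides of the join are read with the same settings.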
Now, consider the text format plugin that can read CSV, TSV, PSV and so on. It
has a variety of options. But they are *not* session options; they are instead
options in the plugin definition. This allows me to, say, have a plugin config
for CSV-with-headers files that I get from source A, and a different plugin
config for my CSV-without-headers files from source B.
Suppose we applied the text reader technique to the JSON reader. We'd move the
session options listed above into the JSON format plugin config. Then, I can define
one plugin config for my "clean" files, and a different plugin config for my "dirty"
files.
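As a rough illustration of the proposal, the two configs might look like the output of the Python sketch below. The {{type}} and {{extensions}} keys follow the existing JSON format entry; the keys {{allTextMode}} and {{readNumbersAsDouble}} are hypothetical names for the moved options, since this issue does not define them. How each config is attached to its directory is left open here.

```python
import json

# Shape of the existing JSON format entry: a type and the file suffixes.
BASE_JSON_FORMAT = {"type": "json", "extensions": ["json"]}


def json_format(all_text_mode, read_numbers_as_double):
    """Return a JSON format config carrying the proposed per-plugin options.

    The keys allTextMode and readNumbersAsDouble are hypothetical names
    for the moved session options; this issue does not define them.
    """
    config = dict(BASE_JSON_FORMAT)
    # Copy the list so the two configs do not share one mutable object.
    config["extensions"] = list(BASE_JSON_FORMAT["extensions"])
    config["allTextMode"] = all_text_mode
    config["readNumbersAsDouble"] = read_numbers_as_double
    return config


formats = {
    "clean": json_format(False, False),
    "dirty": json_format(True, True),
}
print(json.dumps({"formats": formats}, indent=2))
```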
What's more, I can then use table functions to adjust the format for each file
as needed within a single query. Since table functions are part of a query, I
can add them to a view that I define for the various JSON files.
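A join with per-file settings could then be written along the lines of the query built below. This Python sketch only renders the SQL text. The {{table(... (type => ..., name => value))}} form is the one Drill accepts for text files today; accepting JSON options this way is the proposal, and the option names, the paths, and the {{id}} join column are hypothetical.

```python
def json_table(path, **options):
    """Render a Drill table function call for one JSON file.

    Passing JSON reader options this way is the proposal in this issue;
    the option names used by callers below are hypothetical.
    """
    args = ["type => 'json'"]
    for name, value in options.items():
        args.append("{} => {}".format(name, str(value).lower()))
    return "table(dfs.`{}`({}))".format(path, ", ".join(args))


clean = json_table("/data/clean/a.json", allTextMode=False, readNumbersAsDouble=False)
dirty = json_table("/data/dirty/b.json", allTextMode=True, readNumbersAsDouble=True)

# Hypothetical join on an `id` column; each side carries its own settings.
query = "SELECT c.id, d.id FROM {} AS c JOIN {} AS d ON c.id = d.id".format(clean, dirty)
print(query)
```

Wrapping such a query in a view would hide the table functions from anyone who only queries the view.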
The result is a far simpler user experience than the tedium of resetting
session options for every query.