Hari Sekhon created DRILL-3524:
----------------------------------
Summary: Drill proper DESCRIBE support for MongoDB
Key: DRILL-3524
URL: https://issues.apache.org/jira/browse/DRILL-3524
Project: Apache Drill
Issue Type: Bug
Components: Metadata
Affects Versions: 1.1.0
Reporter: Hari Sekhon
Assignee: Steven Phillips
Request to add full DESCRIBE support for MongoDB collections.
I understand this may be difficult or sub-optimal given the flexible-schema
nature of MongoDB documents. However, Drill can already tabulate results when
reading directly from MongoDB, which requires it to read the field names, so it
should also be possible to extract all field names and present them in the
DESCRIBE output, albeit via an inefficient scan.
Currently DESCRIBE returns placeholder metadata that is inaccurate and unhelpful:
{code}
+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| *            | ANY        | YES          |
+--------------+------------+--------------+
{code}
Perhaps DESCRIBE could be extended to scan the first few dozen documents by
default and build a merged schema from them? It could also accept an optional
argument specifying how many documents to scan, or an ALL keyword to scan every
document in the collection and report the complete global schema.
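A rough sketch of the proposed sampling behaviour, again in Python over plain dict documents. The hypothetical `describe_sample` merges the top-level fields of the first `limit` documents (`None` standing in for the ALL keyword) into rows shaped like the DESCRIBE output. The default of 50, the type names, and the rule of reporting conflicting types as ANY are all assumptions for illustration, not Drill's actual semantics.

```python
from itertools import islice
from typing import Any, Iterable, Mapping, Optional

# Hypothetical mapping from Python value types to SQL-style type names.
_TYPE_NAMES = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE", str: "VARCHAR"}


def _type_name(value: Any) -> str:
    if isinstance(value, Mapping):
        return "MAP"
    if isinstance(value, list):
        return "ARRAY"
    return _TYPE_NAMES.get(type(value), "ANY")


def describe_sample(
    docs: Iterable[Mapping[str, Any]], limit: Optional[int] = 50
) -> list[tuple[str, str, str]]:
    """Merge top-level field schemas from the first `limit` docs (None = ALL).

    Returns (COLUMN_NAME, DATA_TYPE, IS_NULLABLE) rows. A field whose type
    differs between documents is reported as ANY; a field that is missing
    from, or null in, any sampled document is reported as nullable.
    """
    sample = list(islice(docs, limit))  # islice(docs, None) reads every doc
    types: dict[str, set[str]] = {}
    seen_count: dict[str, int] = {}
    nullable: set[str] = set()
    for doc in sample:
        for key, value in doc.items():
            seen_count[key] = seen_count.get(key, 0) + 1
            if value is None:
                nullable.add(key)
            else:
                types.setdefault(key, set()).add(_type_name(value))
    rows = []
    for key in sorted(seen_count):
        kinds = types.get(key, set())
        data_type = next(iter(kinds)) if len(kinds) == 1 else "ANY"
        is_nullable = key in nullable or seen_count[key] < len(sample)
        rows.append((key, data_type, "YES" if is_nullable else "NO"))
    return rows
```

A small `limit` keeps DESCRIBE cheap but may miss rare fields; `limit=None` trades speed for a complete schema, mirroring the default-versus-ALL choice suggested above.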
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)