paul-rogers commented on a change in pull request #1986: Additional changes for
Drill Metastore docs
URL: https://github.com/apache/drill/pull/1986#discussion_r384869742
##########
File path:
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
##########
@@ -10,6 +10,31 @@ The Metastore is a Beta feature; it is subject to change.
We encourage you to tr
Because the Metastore is in Beta, the SQL commands and Metastore formats may
change in the next release.
{% include startnote.html %}In Drill 1.17, this feature is supported for
Parquet tables only and is disabled by default.{% include endnote.html %}
+## Drill Metastore introduction
+
+One of the main advantages of Drill is schema-on-read. But Drill can’t handle
some cases with this approach, there are the issues related to Schema Evolution
and Schema Changes.
Review comment:
Don't capitalize "schema evolution" or "schema changes".
Should we explain each of these? Actually, a schema change is an internal
effect that results from an external cause: an ambiguous schema.
See some recent mailing list discussions for more background. Basically, Drill
infers the schema by sampling the first row. This sampling works well if all
files have a clear, identical, unambiguous schema. However, if files contain
different columns due to schema evolution, or columns contain nulls (as can
happen with JSON), Drill can't infer the schema from the data and the user must
provide a hint.
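
To make the failure mode concrete for the docs, here is a minimal sketch in Python. It is not Drill code: `infer_schema_from_first_row` and `find_schema_conflicts` are hypothetical helpers that only mimic first-row sampling, and the two "files" are made-up data.

```python
def infer_schema_from_first_row(rows):
    """Toy stand-in for first-row sampling: map each column to a type name."""
    first = rows[0]
    return {name: type(value).__name__ for name, value in first.items()}


def find_schema_conflicts(schema, rows):
    """Return (row_index, column, reason) for rows the sampled schema can't describe."""
    conflicts = []
    for i, row in enumerate(rows):
        for name, value in row.items():
            if name not in schema:
                # Schema evolution: a later file added a column.
                conflicts.append((i, name, "column missing from inferred schema"))
            elif value is not None and schema[name] == "NoneType":
                # The sampled value was null, so no type could be inferred.
                conflicts.append((i, name, "type unknown: first value was null"))
            elif value is not None and type(value).__name__ != schema[name]:
                conflicts.append((i, name, "type differs from inferred type"))
    return conflicts


# Hypothetical data: file_a has a null column, file_b adds a new column.
file_a = [{"id": 1, "name": None}]
file_b = [{"id": 2, "name": "x", "added_col": "y"}]

schema = infer_schema_from_first_row(file_a)
conflicts = find_schema_conflicts(schema, file_a + file_b)
print(schema)
print(conflicts)
```

Both conflicts come from the second file: the sampled row gave no type for `name` and never saw `added_col`, which is the ambiguity a user-provided schema hint is meant to resolve.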