vvysotskyi commented on a change in pull request #1986: Additional changes for
Drill Metastore docs
URL: https://github.com/apache/drill/pull/1986#discussion_r386535350
##########
File path:
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
##########
@@ -10,6 +10,31 @@ The Metastore is a Beta feature; it is subject to change.
We encourage you to tr
Because the Metastore is in Beta, the SQL commands and Metastore formats may
change in the next release.
{% include startnote.html %}In Drill 1.17, this feature is supported for
Parquet tables only and is disabled by default.{% include endnote.html %}
+## Drill Metastore introduction
+
+One of the main advantages of Drill is schema-on-read. However, Drill can’t
handle some cases with this approach: there are issues related to Schema
Evolution and Schema Changes.
+
+Significant benefits of schema-aware execution:
+
+ - At Planning time:
+   - Better scope for planning optimizations.
+   - Proper estimation of column widths since types are known, hence more
accurate costing.
+   - Graceful early exit if certain data type validations fail.
+ - At Runtime:
+   - Avoids `SchemaChange` exceptions in some cases, because all minor
fragments share a common understanding of the schema.
+
+Reading the data along with its statistics metadata helps to build more
efficient plans and optimize query execution:
+
+ - Statistics are crucial for optimal join planning, choosing between 2-phase
and 1-phase aggregation, estimating the selectivity of filter conditions, and
making parallelization decisions.
+
+Taking into account the above points, existing query processing can be
improved by:
+
+ - storing table schema and reusing it;
+ - collecting, storing and reusing table statistics to improve query planning.
+
+One of the main steps toward achieving these goals is providing a framework
for Metadata management named hereafter
Review comment:
Thanks, done.