arina-ielchiieva commented on a change in pull request #1953: Add docs for
Drill Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373457265
##########
File path:
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
##########
@@ -0,0 +1,378 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-30
+---
+
+Drill 1.17 introduces the Drill Metastore, which stores the table schema and
table statistics. Statistics help Drill create better query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level
with one of the following commands:
+
+ SET `metastore.enabled` = true;
+ ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at
`http://<drill-hostname-or-ip-address>:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data.
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill
infers the schema by scanning your table
+ in the same way as it does during a regular select, and computes metadata such
as MIN / MAX column values and NULLS COUNT
+ to enable optimizations such as filter push-down. Drill collects this metadata
when you run the `ANALYZE TABLE` command; if the
+ `planner.statistics.use` option is enabled, the command also calculates and
stores table statistics in the Drill Metastore.
+
+## Configuration
+
+The default Metastore configuration is defined in the
`drill-metastore-default.conf` file.
+It can be overridden in `drill-metastore-override.conf`. Distribution-specific
configuration can be
+specified in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in the `drill.metastore` namespace.
+The Metastore implementation is selected by the
`drill.metastore.implementation.class` configuration property.
Review comment:
Consider adding an example of what this looks like, and note that the user can
change the implementation.
You could also mention that the Iceberg Metastore is currently available out of
the box and is the default one. Any custom implementation can be added by
placing a jar on the classpath that contains an implementation of the
`org.apache.drill.metastore.Metastore` interface and specifying the custom class
in `drill.metastore.implementation.class`.
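For instance, the example might look something like the sketch below. The Iceberg class name is written from memory and should be checked against `drill-metastore-default.conf`; `com.example.CustomMetastore` is a hypothetical placeholder for a user-supplied class.

```
# drill-metastore-override.conf
drill.metastore: {
  # Assumed default: the Iceberg-based implementation shipped with Drill
  implementation.class: "org.apache.drill.metastore.iceberg.IcebergMetastore"

  # Hypothetical custom implementation of org.apache.drill.metastore.Metastore,
  # provided in a jar on the classpath:
  # implementation.class: "com.example.CustomMetastore"
}
```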