[GitHub] [drill] paul-rogers commented on a change in pull request #1953: Add docs for Drill Metastore

GitBox Tue, 14 Jan 2020 21:07:15 -0800

paul-rogers commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r366683160


 ##########
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##########
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+       SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://<drill-hostname-or-ip-address>:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+       ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+       REFRESH METADATA ['level' LEVEL]
+       [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+       [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. the Stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
 
 Review comment:
   (Please explain what `planner.enable_statistics` means.)
   
   You cannot use this command if statistics is disabled. Statistics usage is 
enabled by setting the SYSTEM/SESSION option `planner.enable_statistics`. It is 
(what?) by default. To enable statistics:
   
   ```
   ALTER SYSTEM SET `planner.enable_statistics` = true
   ```
   
   (Note: This behavior seems odd. Why can't I first gather stats and review 
them for quality before I have my users start using them? As it is, there is no 
ability to gather them, enable the option for a session for testing, verify 
that things work right, then turn it on for everyone.
   
   Also, what happens if I turn the option on, gather stats, then turn the 
option off? If we refuse to gather stats, do we automatically delete them if 
the option is disabled?)
   if statistics usage is disabled, the command 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] paul-rogers commented on a change in pull request #1953: Add docs for Drill Metastore

Reply via email to