[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380192497
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+---
+
+Drill 1.17 introduces the Drill Metastore, which stores table schemas and table statistics. Statistics allow Drill to create better query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at `http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill infers the schema by scanning your table,
+ the same way it does during a regular SELECT, and computes metadata such as `MIN` / `MAX` column values and
+ `NULLS_COUNT`, which enables further optimizations such as filter push-down. If the
+ `planner.statistics.use` option is enabled, the `ANALYZE TABLE` command also calculates and stores table statistics in the Drill
+ Metastore.
+
+## Configuration
+
+The default Metastore configuration is defined in the `drill-metastore-default.conf` file.
+It can be overridden in `drill-metastore-override.conf`. Distribution-specific configuration can be
+specified in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in the `drill.metastore` namespace.
+The Metastore implementation is selected by the `drill.metastore.implementation.class` configuration property.
+The default value is the following:
+
+```
+drill.metastore: {
+  implementation.class: "org.apache.drill.metastore.iceberg.IcebergMetastore"
+}
+```
+
+Note that currently only the Iceberg Metastore is available out of the box, and it is the default. However, any custom
+ implementation can be added by placing a JAR that contains an implementation of the
+ `org.apache.drill.metastore.Metastore` interface on the classpath and specifying the custom class in `drill.metastore.implementation.class`.
+
+### Metastore Components
+
+The Metastore can store metadata for various components: tables, views, etc.
+The current implementation provides fully functional support for the tables component.
+Support for the views component is not implemented, but it contains stub methods to show
+how new Metastore components such as UDFs, storage plugins, etc. can be added in the future.
+
+### Metastore Tables
+
+The Metastore Tables component contains metadata about Drill tables, including general information, as well as
+information about table segments, files, row groups, and partitions.
+
+Full table metadata consists of two major parts: general information and top-level segment metadata.
+General table information contains basic information about the table and corresponds to the `BaseTableMetadata` class.
+
+A table can be either non-partitioned or partitioned. Non-partitioned tables have only one top-level segment,
+which is called the default segment (`MetadataInfo#DEFAULT_SEGMENT_KEY`). Partitioned tables may have several top-level segments.
+Each top-level segment can include metadata about inner segments, files, row groups, and partitions.
+
+A unique table identifier in Metastore Tables is a combination of storage plugin, workspace, and table name.
+Table metadata is grouped by top-level segments; the unique identifier of a top-level segment and its metadata
+is the combination of storage plugin, workspace, table name, and metadata key.
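
The identifier scheme described above can be modelled in a few lines. This is only an illustrative Python sketch, not Drill code: the class names `TableKey` and `SegmentKey`, the helper `top_level_segments`, and the placeholder value of `DEFAULT_SEGMENT_KEY` are all assumptions made for the example, as is the use of partition names as metadata keys.

```python
from dataclasses import dataclass

# Placeholder only: the real value of MetadataInfo#DEFAULT_SEGMENT_KEY
# is not given in the quoted docs.
DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"


@dataclass(frozen=True)
class TableKey:
    """Hypothetical table identifier: storage plugin + workspace + table name."""
    storage_plugin: str
    workspace: str
    table_name: str


@dataclass(frozen=True)
class SegmentKey:
    """Hypothetical top-level segment identifier: table identifier + metadata key."""
    table: TableKey
    metadata_key: str


def top_level_segments(table, partition_keys=()):
    """Return the top-level segment identifiers for a table.

    A non-partitioned table gets exactly one default segment; a partitioned
    table gets one segment per (assumed) partition key.
    """
    if not partition_keys:
        return [SegmentKey(table, DEFAULT_SEGMENT_KEY)]
    return [SegmentKey(table, key) for key in partition_keys]
```

Because both classes are frozen dataclasses, two keys built from the same components compare equal and can be used as dictionary keys, which mirrors the "unique identifier" role the text describes.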
+
+### Related Session/System Options
+
+The following options are set via `ALTER SYSTEM SET`, `ALTER SESSION SET`, or the Drill Web console.
+
+- **metastore.enabled**
+Enables use of the Drill Metastore to store table metadata when ANALYZE TABLE commands are executed and
+ to read table metadata during regular query execution or when querying some INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380218259
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380209257
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380210657
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r379576801
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380229001
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380275860
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/030-drill-iceberg-metastore.md
 ##
 @@ -0,0 +1,69 @@
+---
+title: "Drill Iceberg Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+---
+
+Drill uses the Iceberg Metastore implementation, which is based on [Iceberg tables](http://iceberg.incubator.apache.org). For Drill 1.17,
+ this is the default Drill Metastore implementation. For details on how to configure the Iceberg Metastore implementation and
+ its option descriptions, refer to the [Iceberg Metastore docs](https://github.com/apache/drill/blob/master/metastore/iceberg-metastore/README.md).
+
+{% include startnote.html %}
+Iceberg tables support concurrent writes and transactions, but these are only effective on file systems that support
+ atomic rename.
+If the file system does not support atomic rename, concurrent writes can lead to inconsistencies.
+{% include endnote.html %}
+
+### Iceberg Tables Location
+
+Iceberg tables reside on the file system in a location based on
+the Iceberg Metastore base location `drill.metastore.iceberg.location.base_path` and the component-specific location.
+If the Iceberg Metastore base location is `/drill/metastore/iceberg`
+and the tables component location is `tables`, the Iceberg table for the tables component
+is located in the `/drill/metastore/iceberg/tables` folder.
+
+Metastore metadata is stored inside the Iceberg table location provided
+in the configuration file. The metadata location for a Drill table is constructed
+from the storage keys of the specific component. For example, for the `tables` component,
+the storage keys are storage plugin, workspace, and table name, which together form the unique table identifier in Drill.
 
 Review comment:
   Thanks, replaced.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r379568993
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+---
+
+Drill 1.17 introduces the Drill Metastore, which stores the table schema and table statistics. Statistics help Drill create better query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill infers the schema by scanning your table
+ in the same way as it does during a regular `SELECT`, and computes values such as `MIN` / `MAX` column values and
+ `NULLS_COUNT`, collectively referred to as "metadata", which enable additional optimizations such as filter push-down. If the
+ `planner.statistics.use` option is enabled, the `ANALYZE TABLE` command also calculates table statistics and stores them in the Drill
+ Metastore.
+
+## Configuration
+
+The default Metastore configuration is defined in the `drill-metastore-default.conf` file.
+It can be overridden in `drill-metastore-override.conf`. Distribution-specific configuration can be
+specified in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in the `drill.metastore` namespace.
+The Metastore implementation is selected by the `drill.metastore.implementation.class` configuration property.
+The default value is the following:
+
+```
+drill.metastore: {
+  implementation.class: "org.apache.drill.metastore.iceberg.IcebergMetastore"
+}
+```
+
+Currently, only the Iceberg Metastore is available out of the box, and it is the default. However, you can add a custom
+ implementation by placing a JAR that contains an implementation of the
+ `org.apache.drill.metastore.Metastore` interface on the classpath and specifying the custom class
+ in the `drill.metastore.implementation.class` property.
+
+### Metastore Components
+
+The Metastore can store metadata for various components: tables, views, etc.
+The current implementation fully supports the tables component.
+Support for the views component is not implemented, but it contains stub methods to show
+how new Metastore components such as UDFs and storage plugins can be added in the future.
+
+### Metastore Tables
+
+The Metastore Tables component contains metadata about Drill tables, including general information as well as
+information about table segments, files, row groups, and partitions.
+
+Full table metadata consists of two major parts: general information and top-level segment metadata.
+General table information contains basic table information and corresponds to the `BaseTableMetadata` class.
+
+A table can be either non-partitioned or partitioned. A non-partitioned table has only one top-level segment,
+which is called the default segment (`MetadataInfo#DEFAULT_SEGMENT_KEY`). A partitioned table may have several top-level segments.
+Each top-level segment can include metadata about inner segments, files, row groups, and partitions.
+
+A unique table identifier in Metastore Tables is a combination of storage 
plugin, workspace, and table name.
 
 Review comment:
   Thanks, replaced. Currently, user can delete only metadata for an existing 
table. Added this info also.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380209127
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380230120
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380276160
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/030-drill-iceberg-metastore.md
 ##
 @@ -0,0 +1,69 @@
+---
+title: "Drill Iceberg Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+---
+
+Drill uses the Iceberg Metastore implementation, which is based on [Iceberg tables](http://iceberg.incubator.apache.org). In Drill 1.17,
+ this is the default Drill Metastore implementation. For details on how to configure the Iceberg Metastore and
+ for descriptions of its options, refer to the [Iceberg Metastore docs](https://github.com/apache/drill/blob/master/metastore/iceberg-metastore/README.md).
+
+{% include startnote.html %}
+Iceberg tables support concurrent writes and transactions, but these are effective only on file systems that support
+ atomic rename.
+If the file system does not support atomic rename, concurrent writes could lead to inconsistencies.
+{% include endnote.html %}
+
+### Iceberg Tables Location
+
+Iceberg tables reside on the file system in a location derived from the
+Iceberg Metastore base location (`drill.metastore.iceberg.location.base_path`) and a component-specific location.
+For example, if the Iceberg Metastore base location is `/drill/metastore/iceberg`
+and the tables component location is `tables`, the Iceberg table for the tables component
+is located in the `/drill/metastore/iceberg/tables` folder.
+
+Metastore metadata is stored inside the Iceberg table location provided
+in the configuration file. The metadata location for a Drill table is constructed
+from the storage keys of the specific component. For example, for the `tables` component,
+the storage keys are storage plugin, workspace, and table name, which together form the unique table identifier in Drill.
+
+For example, if the Iceberg table location is `/drill/metastore/iceberg/tables`, metadata for the table
+`dfs.tmp.nation` is stored in the `/drill/metastore/iceberg/tables/dfs/tmp/nation` folder.
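
As a rough illustration of the path construction described above, the following sketch joins the base location, the component location, and the storage keys. The function `table_metadata_location` and its parameter names are hypothetical and are not part of Drill; only the example paths and the `dfs.tmp.nation` table come from the text above.

```python
from pathlib import PurePosixPath


def table_metadata_location(base_path, component_location,
                            storage_plugin, workspace, table_name):
    """Join the base location, the component location, and the storage keys.

    Hypothetical helper: it mirrors the documented layout only and does not
    call any Drill or Iceberg API.
    """
    return str(PurePosixPath(base_path, component_location,
                             storage_plugin, workspace, table_name))


# Storage keys for the table dfs.tmp.nation: storage plugin, workspace, table name.
location = table_metadata_location(
    "/drill/metastore/iceberg", "tables", "dfs", "tmp", "nation")
print(location)  # /drill/metastore/iceberg/tables/dfs/tmp/nation
```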
 
 Review comment:
   Thanks, updated the docs as proposed.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380146051
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380164420
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380209529
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table 
+ in the same way as it is done during regular select and computes some 
metadata like `MIN` / `MAX` column values and
+ `NULLS_COUNT` designated as "metadata" to be able to produce more 
optimizations like filter push-down, etc. If
+ `planner.statistics.use` option is enabled, this command will also calculate 
and store table statistics into Drill
+ Metastore.
+
+## Configuration
+
+Default Metastore configuration is defined in `drill-metastore-default.conf` 
file.
+It can be overridden in `drill-metastore-override.conf`. Distribution 
configuration can be
+indicated in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in `drill.metastore` namespace.
+Metastore implementation based on class implementation config property 
`drill.metastore.implementation.class`.
+The default value is the following:
+
+```
+drill.metastore: {
+  implementation.class: "org.apache.drill.metastore.iceberg.IcebergMetastore"
+}
+```
+
+Note, that currently out of box Iceberg Metastore is available and is the 
default one. Though any custom
+ implementation can be added by placing the JAR into classpath which has the 
implementation of
+ `org.apache.drill.metastore.Metastore` interface and indicating custom class 
in the `drill.metastore.implementation.class`.
+
+### Metastore Components
+
+The Metastore can store metadata for various components: tables, views, etc.
+The current implementation fully supports the tables component.
+The views component is not implemented, but it contains stub methods that show
+how new Metastore components such as UDFs or storage plugins can be added in
+the future.
+
+### Metastore Tables
+
+The Metastore Tables component contains metadata about Drill tables: general
+information as well as information about table segments, files, row groups,
+and partitions.
+
+Full table metadata consists of two major parts: general information and
+top-level segment metadata.
+General table information contains basic information about the table and
+corresponds to the `BaseTableMetadata` class.
+
+A table can be partitioned or non-partitioned. A non-partitioned table has
+only one top-level segment, which is called default
+(`MetadataInfo#DEFAULT_SEGMENT_KEY`). A partitioned table may have several
+top-level segments.
+Each top-level segment can include metadata about inner segments, files, row
+groups, and partitions.
+
+A table is uniquely identified in Metastore Tables by the combination of
+storage plugin, workspace, and table name.
+Within a table, metadata is grouped by top-level segment. A top-level segment
+and its metadata are uniquely identified by the combination of storage plugin,
+workspace, table name, and metadata key.
+
+### Related Session/System Options
+
+The following options are set via `ALTER SYSTEM SET`, `ALTER SESSION SET`,
+or the Drill Web console.
+
+- **metastore.enabled**
+Enables Drill Metastore usage, so that Drill can store table metadata when
+ `ANALYZE TABLE` commands are executed and read table metadata during regular
+ query execution or when querying some `INFORMATION_SCHEMA` tables. Default is
+ `false`.
+- **metastore

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380250105
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/030-drill-iceberg-metastore.md
 ##
 @@ -0,0 +1,69 @@
+---
+title: "Drill Iceberg Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+---
+
+Drill uses the Iceberg Metastore implementation, which is based on
+ [Iceberg tables](http://iceberg.incubator.apache.org). In Drill 1.17, this is
+ the default Drill Metastore implementation. For details on how to configure
+ the Iceberg Metastore implementation and for descriptions of its options,
+ refer to the
+ [Iceberg Metastore docs](https://github.com/apache/drill/blob/master/metastore/iceberg-metastore/README.md).
+
+{% include startnote.html %}
+Iceberg tables support concurrent writes and transactions, but these are
+ effective only on file systems that support atomic rename.
+On a file system without atomic rename, concurrent writes could lead to
+ inconsistencies.
+{% include endnote.html %}
+
+### Iceberg Tables Location
+
+Iceberg tables reside on the file system in a location derived from the
+Iceberg Metastore base location `drill.metastore.iceberg.location.base_path`
+and the component-specific location.
 
 Review comment:
   Good point! Added a sentence before this one about configuration files and 
specified that the above is the configuration property.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380159080
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r379573331
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380278702
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/030-drill-iceberg-metastore.md
 ##
 @@ -0,0 +1,69 @@
+---
+title: "Drill Iceberg Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+---
+
+Drill uses the Iceberg Metastore implementation, which is based on
+ [Iceberg tables](http://iceberg.incubator.apache.org). In Drill 1.17, this is
+ the default Drill Metastore implementation. For details on how to configure
+ the Iceberg Metastore implementation and for descriptions of its options,
+ refer to the
+ [Iceberg Metastore docs](https://github.com/apache/drill/blob/master/metastore/iceberg-metastore/README.md).
+
+{% include startnote.html %}
+Iceberg tables support concurrent writes and transactions, but these are
+ effective only on file systems that support atomic rename.
+On a file system without atomic rename, concurrent writes could lead to
+ inconsistencies.
+{% include endnote.html %}
+
+### Iceberg Tables Location
+
+Iceberg tables reside on the file system in a location derived from the
+Iceberg Metastore base location `drill.metastore.iceberg.location.base_path`
+and the component-specific location.
+For example, if the Iceberg Metastore base location is
+`/drill/metastore/iceberg` and the tables component location is `tables`, the
+Iceberg table for the tables component is located in the
+`/drill/metastore/iceberg/tables` folder.
+
+Metastore metadata is stored inside the Iceberg table location provided
+in the configuration file. The location of Drill table metadata is constructed
+from the storage keys of the specific component. For example, for the `tables`
+component, the storage keys are storage plugin, workspace, and table name,
+which together form the unique table identifier in Drill.
+
+For example, if the Iceberg table location is
+`/drill/metastore/iceberg/tables`, metadata for the table `dfs.tmp.nation` is
+stored in the `/drill/metastore/iceberg/tables/dfs/tmp/nation` folder.
+
+The following example of a base Metastore configuration file,
+ `drill-metastore-override.conf`, stores Iceberg tables in HDFS:
+
+```
+drill.metastore.iceberg: {
+  config.properties: {
+    fs.defaultFS: "hdfs:///"
+  }
+
+  location: {
+    base_path: "/drill/metastore",
+    relative_path: "iceberg"
+  }
+}
+```
+
+### Metadata Storage Format
+
+Iceberg tables support three data storage formats: Parquet, Avro, and ORC.
+Drill metadata is stored in Parquet files.
+This format was chosen because it is column-oriented and efficient in terms of
+disk I/O when only specific columns need to be queried.
+
+Each Parquet file holds information for one partition. Partition keys depend
+on the characteristics of the Metastore component. For example, for the tables
+component, the partition keys are storage plugin, workspace, table name, and
+metadata key.
+
+Parquet file names are based on UUIDs to ensure uniqueness. If a collision
+somehow occurs, the modify operation in the Metastore fails.
 
 Review comment:
   Thanks, removed this section.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380199004
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380241423
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380164542
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table 
+ in the same way as it is done during regular select and computes some 
metadata like `MIN` / `MAX` column values and
+ `NULLS_COUNT` designated as "metadata" to be able to produce more 
optimizations like filter push-down, etc. If
+ `planner.statistics.use` option is enabled, this command will also calculate 
and store table statistics into Drill
+ Metastore.
+
+## Configuration
+
+Default Metastore configuration is defined in `drill-metastore-default.conf` 
file.
+It can be overridden in `drill-metastore-override.conf`. Distribution 
configuration can be
+indicated in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in `drill.metastore` namespace.
+Metastore implementation based on class implementation config property 
`drill.metastore.implementation.class`.
+The default value is the following:
+
+```
+drill.metastore: {
+  implementation.class: "org.apache.drill.metastore.iceberg.IcebergMetastore"
+}
+```
+
+Note, that currently out of box Iceberg Metastore is available and is the 
default one. Though any custom
+ implementation can be added by placing the JAR into classpath which has the 
implementation of
+ `org.apache.drill.metastore.Metastore` interface and indicating custom class 
in the `drill.metastore.implementation.class`.
+
+### Metastore Components
+
+Metastore can store metadata for various components: tables, views, etc.
+Current implementation provides fully functioning support for tables component.
+Views component support is not implemented but contains stub methods to show
+how new Metastore components like UDFs, storage plugins, etc. can be added in 
the future.
+
+### Metastore Tables
+
+Metastore Tables component contains metadata about Drill tables, including 
general information, as well as
+information about table segments, files, row groups, partitions.
+
+Full table metadata consists of two major concepts: general information and 
top-level segments metadata.
+Table general information contains basic table information and corresponds to 
the `BaseTableMetadata` class.
+
+A table can be non-partitioned and partitioned. Non-partitioned tables have 
only one top-level segment 
+which is called default (`MetadataInfo#DEFAULT_SEGMENT_KEY`). Partitioned 
tables may have several top-level segments.
+Each top-level segment can include metadata about inner segments, files, row 
groups, and partitions.
+
+A unique table identifier in Metastore Tables is the combination of storage plugin, workspace, and table name.
+Within a table, metadata is grouped by top-level segment. The unique identifier of a top-level segment and its metadata
+is the combination of storage plugin, workspace, table name, and metadata key.
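
The identifier parts described above can be illustrated with a query sketch against `INFORMATION_SCHEMA`. This is an assumption-laden example rather than part of the patch: the plugin `dfs`, workspace `tmp`, and table `lineitem` are hypothetical, and it assumes that `TABLE_SCHEMA` carries the `plugin.workspace` pair and that an `INFORMATION_SCHEMA.PARTITIONS` table is exposed when the Metastore is enabled.

```sql
-- Hypothetical table: storage plugin `dfs`, workspace `tmp`, table `lineitem`.
-- TABLE_SCHEMA is assumed to hold "plugin.workspace"; TABLE_NAME holds the table name.
SELECT *
FROM INFORMATION_SCHEMA.`TABLES`
WHERE TABLE_SCHEMA = 'dfs.tmp'
  AND TABLE_NAME = 'lineitem';

-- Assumed to list partition-level metadata stored in the Metastore for the same table.
SELECT *
FROM INFORMATION_SCHEMA.`PARTITIONS`
WHERE TABLE_SCHEMA = 'dfs.tmp'
  AND TABLE_NAME = 'lineitem';
```

`SELECT *` is used deliberately so the sketch does not depend on column names beyond `TABLE_SCHEMA` and `TABLE_NAME`.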
+
+### Related Session/System Options
+
+The following options are set via `ALTER SYSTEM SET` or `ALTER SESSION SET`, or via the Drill Web console.
+
+- **metastore.enabled**
+Enables use of the Drill Metastore, so that table metadata can be stored when `ANALYZE TABLE` commands run and
+ read during regular query execution or when querying some `INFORMATION_SCHEMA` tables. Default is `false`.
+- **metastore

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380270959
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/030-drill-iceberg-metastore.md
 ##
 @@ -0,0 +1,69 @@
+---
+title: "Drill Iceberg Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+---
+
+Drill uses the Iceberg Metastore implementation, which is based on [Iceberg tables](http://iceberg.incubator.apache.org). For Drill 1.17,
+ this is the default Drill Metastore implementation. For details on how to configure the Iceberg Metastore implementation and
+ for descriptions of its options, refer to the [Iceberg Metastore docs](https://github.com/apache/drill/blob/master/metastore/iceberg-metastore/README.md).
+
+{% include startnote.html %}
+Iceberg tables support concurrent writes and transactions, but these are only effective on file systems that support
+ atomic rename.
+If the file system does not support atomic rename, concurrent writes could lead to inconsistencies.
+{% include endnote.html %}
+
+### Iceberg Tables Location
+
+Iceberg tables reside on the file system in a location based on
+the Iceberg Metastore base location `drill.metastore.iceberg.location.base_path` and the component-specific location.
+If Iceberg Metastore base location is `/drill/metastore/iceberg`
+and tables component location is `tables`. Iceberg table for tables component
 
 Review comment:
   Thanks, updated.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380199229
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+---
+
+Drill 1.17 introduces the Drill Metastore, which stores the table schema and table statistics. Statistics allow Drill to create better query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill infers the schema by scanning your table
+ in the same way as it does during a regular `SELECT`, and computes metadata such as `MIN` / `MAX` column values and
+ `NULLS_COUNT` (designated as "metadata") to enable more optimizations, such as filter push-down. If the
+ `planner.statistics.use` option is enabled, the `ANALYZE TABLE` command also calculates and stores table statistics in the Drill
+ Metastore.
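
As a minimal sketch of how the Metastore might be populated for one table, the statements below enable the options named in the text and then run `ANALYZE TABLE ... REFRESH METADATA`. The table path `dfs.tmp.lineitem` is hypothetical and stands in for any Parquet table. Because the feature is in Beta, the exact syntax may differ between releases.

```sql
-- Enable the Metastore for the current session (it is disabled by default).
SET `metastore.enabled` = true;

-- Optional: also calculate and store table statistics during ANALYZE TABLE.
SET `planner.statistics.use` = true;

-- Hypothetical Parquet table; replace with your own plugin, workspace, and table.
ANALYZE TABLE dfs.tmp.`lineitem` REFRESH METADATA;
```

Queries against tables that were never analyzed still run. They just do not benefit from the stored metadata.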
+
+## Configuration
+
+The default Metastore configuration is defined in the `drill-metastore-default.conf` file.

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380219490
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r379567240
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@
+A table can be either non-partitioned or partitioned. A non-partitioned table has only one top-level segment,
+which is called the default segment (`MetadataInfo#DEFAULT_SEGMENT_KEY`). A partitioned table may have several top-level segments.
+Each top-level segment can include metadata about inner segments, files, row groups, and partitions.
 
 Review comment:
   Metastore supports single files also.
   Added part of the info you proposed and added references to the examples that
   describe how to query partition and segment metadata.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r379576091
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380244476
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/030-drill-iceberg-metastore.md
 ##
 @@ -0,0 +1,69 @@
+### Iceberg Tables Location
+
 
 Review comment:
   Thanks, added.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r379556928
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@
+### Metastore Tables
+
+The Metastore Tables component contains metadata about Drill tables, including general information as well as
+information about table segments, files, row groups, and partitions.
+
+Full table metadata consists of two major parts: general information and top-level segment metadata.
 
 Review comment:
   Yes, we have a section below with the real tables and examples of how to 
discover metastore metadata.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r374778009
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@
+## Configuration
+
+The default Metastore configuration is defined in the `drill-metastore-default.conf` file.
 
 Review comment:
   Thanks, done.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r380159969
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@
+## Configuration
+
+Default Metastore configuration is defined in `drill-metastore-default.conf` 
file.
+It can be overridden in `drill-metastore-override.conf`. Distribution 
configuration can be
+indicated in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in `drill.metastore` namespace.
+Metastore implementation based on class implementation config property 
`drill.metastore.implementation.class`.
+The default value is the following:
+
+```
+drill.metastore: {
+  implementation.class: "org.apache.drill.metastore.iceberg.IcebergMetastore"
+}
+```
+
+Note, that currently out of box Iceberg Metastore is available and is the 
default one. Though any custom
+ implementation can be added by placing the JAR into classpath which has the 
implementation of
+ `org.apache.drill.metastore.Metastore` interface and indicating custom class 
in the `drill.metastore.implementation.class`.
+
+### Metastore Components
+
+Metastore can store metadata for various components: tables, views, etc.
+Current implementation provides fully functioning support for tables component.
+Views component support is not implemented but contains stub methods to show
+how new Metastore components like UDFs, storage plugins, etc. can be added in 
the future.
+
+### Metastore Tables
+
+Metastore Tables component contains metadata about Drill tables, including 
general information, as well as
+information about table segments, files, row groups, partitions.
+
+Full table metadata consists of two major concepts: general information and 
top-level segments metadata.
+Table general information contains basic table information and corresponds to 
the `BaseTableMetadata` class.
+
+A table can be non-partitioned and partitioned. Non-partitioned tables have 
only one top-level segment 
+which is called default (`MetadataInfo#DEFAULT_SEGMENT_KEY`). Partitioned 
tables may have several top-level segments.
+Each top-level segment can include metadata about inner segments, files, row 
groups, and partitions.
+
+A unique table identifier in Metastore Tables is a combination of storage 
plugin, workspace, and table name.
+Table metadata inside is grouped by top-level segments, unique identifier of 
the top-level segment and its metadata
+is storage plugin, workspace, table name, and metadata key.
+
+### Related Session/System Options
+
+The following options are set via `ALTER SYSTEM SET`, or `ALTER SESSION SET` 
or via the Drill Web console.
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r379541595
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table 
+ in the same way as it is done during regular select and computes some 
metadata like `MIN` / `MAX` column values and
+ `NULLS_COUNT` designated as "metadata" to be able to produce more 
optimizations like filter push-down, etc. If
+ `planner.statistics.use` option is enabled, this command will also calculate 
and store table statistics into Drill
+ Metastore.
+
+## Configuration
+
+Default Metastore configuration is defined in `drill-metastore-default.conf` 
file.
+It can be overridden in `drill-metastore-override.conf`. Distribution 
configuration can be
+indicated in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in `drill.metastore` namespace.
+Metastore implementation based on class implementation config property 
`drill.metastore.implementation.class`.
+The default value is the following:
+
+```
+drill.metastore: {
+  implementation.class: "org.apache.drill.metastore.iceberg.IcebergMetastore"
+}
+```
+
+Note, that currently out of box Iceberg Metastore is available and is the 
default one. Though any custom
 
 Review comment:
   Thanks, separated these two concepts and added links to the Iceberg 
documentation.
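
   For illustration only, a minimal sketch of the custom-implementation case 
described in the quoted text. It assumes a hypothetical class 
`com.example.metastore.CustomMetastore` whose JAR is already on the classpath; 
the class name is an assumption and is not part of Drill. Placed in 
`drill-metastore-override.conf`, the override might look like this:

```
drill.metastore: {
  # Hypothetical custom class; it must implement org.apache.drill.metastore.Metastore
  implementation.class: "com.example.metastore.CustomMetastore"
}
```

   Leaving the property unset keeps the default 
`org.apache.drill.metastore.iceberg.IcebergMetastore`.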




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r37466
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
 
 Review comment:
   Thanks, reworded.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r379551973
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table 
+ in the same way as it is done during regular select and computes some 
metadata like `MIN` / `MAX` column values and
+ `NULLS_COUNT` designated as "metadata" to be able to produce more 
optimizations like filter push-down, etc. If
+ `planner.statistics.use` option is enabled, this command will also calculate 
and store table statistics into Drill
+ Metastore.
+
+## Configuration
+
+Default Metastore configuration is defined in `drill-metastore-default.conf` 
file.
+It can be overridden in `drill-metastore-override.conf`. Distribution 
configuration can be
+indicated in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in `drill.metastore` namespace.
+Metastore implementation based on class implementation config property 
`drill.metastore.implementation.class`.
+The default value is the following:
+
+```
+drill.metastore: {
+  implementation.class: "org.apache.drill.metastore.iceberg.IcebergMetastore"
+}
+```
+
+Note, that currently out of box Iceberg Metastore is available and is the 
default one. Though any custom
+ implementation can be added by placing the JAR into classpath which has the 
implementation of
+ `org.apache.drill.metastore.Metastore` interface and indicating custom class 
in the `drill.metastore.implementation.class`.
+
+### Metastore Components
+
+Metastore can store metadata for various components: tables, views, etc.
+Current implementation provides fully functioning support for tables component.
+Views component support is not implemented but contains stub methods to show
+how new Metastore components like UDFs, storage plugins, etc. can be added in 
the future.
+
+### Metastore Tables
+
+Metastore Tables component contains metadata about Drill tables, including 
general information, as well as
 
 Review comment:
   Thanks, replaced as you proposed, but I also kept the mention that we have 
metadata about segments, files, row groups, and partitions, since it wasn't 
described in this doc yet.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r379573563
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table 
+ in the same way as it is done during regular select and computes some 
metadata like `MIN` / `MAX` column values and
+ `NULLS_COUNT` designated as "metadata" to be able to produce more 
optimizations like filter push-down, etc. If
+ `planner.statistics.use` option is enabled, this command will also calculate 
and store table statistics into Drill
+ Metastore.
+
+## Configuration
+
+Default Metastore configuration is defined in `drill-metastore-default.conf` 
file.
+It can be overridden in `drill-metastore-override.conf`. Distribution 
configuration can be
+indicated in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in `drill.metastore` namespace.
+Metastore implementation based on class implementation config property 
`drill.metastore.implementation.class`.
+The default value is the following:
+
+```
+drill.metastore: {
+  implementation.class: "org.apache.drill.metastore.iceberg.IcebergMetastore"
+}
+```
+
+Note, that currently out of box Iceberg Metastore is available and is the 
default one. Though any custom
+ implementation can be added by placing the JAR into classpath which has the 
implementation of
+ `org.apache.drill.metastore.Metastore` interface and indicating custom class 
in the `drill.metastore.implementation.class`.
+
+### Metastore Components
+
+Metastore can store metadata for various components: tables, views, etc.
+Current implementation provides fully functioning support for tables component.
+Views component support is not implemented but contains stub methods to show
+how new Metastore components like UDFs, storage plugins, etc. can be added in 
the future.
+
+### Metastore Tables
+
+Metastore Tables component contains metadata about Drill tables, including 
general information, as well as
+information about table segments, files, row groups, partitions.
+
+Full table metadata consists of two major concepts: general information and 
top-level segments metadata.
+Table general information contains basic table information and corresponds to 
the `BaseTableMetadata` class.
+
+A table can be non-partitioned and partitioned. Non-partitioned tables have 
only one top-level segment 
+which is called default (`MetadataInfo#DEFAULT_SEGMENT_KEY`). Partitioned 
tables may have several top-level segments.
+Each top-level segment can include metadata about inner segments, files, row 
groups, and partitions.
+
+A unique table identifier in Metastore Tables is a combination of storage 
plugin, workspace, and table name.
+Table metadata inside is grouped by top-level segments, unique identifier of 
the top-level segment and its metadata
+is storage plugin, workspace, table name, and metadata key.
+
+### Related Session/System Options
+
+The following options are set via `ALTER SYSTEM SET`, or `ALTER SESSION SET` 
or via the Drill Web console.
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r379569561
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table 
+ in the same way as it is done during regular select and computes some 
metadata like `MIN` / `MAX` column values and
+ `NULLS_COUNT` designated as "metadata" to be able to produce more 
optimizations like filter push-down, etc. If
+ `planner.statistics.use` option is enabled, this command will also calculate 
and store table statistics into Drill
+ Metastore.
+
+## Configuration
+
+Default Metastore configuration is defined in `drill-metastore-default.conf` 
file.
+It can be overridden in `drill-metastore-override.conf`. Distribution 
configuration can be
+indicated in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in `drill.metastore` namespace.
+Metastore implementation based on class implementation config property 
`drill.metastore.implementation.class`.
+The default value is the following:
+
+```
+drill.metastore: {
+  implementation.class: "org.apache.drill.metastore.iceberg.IcebergMetastore"
+}
+```
+
+Note, that currently out of box Iceberg Metastore is available and is the 
default one. Though any custom
+ implementation can be added by placing the JAR into classpath which has the 
implementation of
+ `org.apache.drill.metastore.Metastore` interface and indicating custom class 
in the `drill.metastore.implementation.class`.
+
+### Metastore Components
+
+Metastore can store metadata for various components: tables, views, etc.
+Current implementation provides fully functioning support for tables component.
+Views component support is not implemented but contains stub methods to show
+how new Metastore components like UDFs, storage plugins, etc. can be added in 
the future.
+
+### Metastore Tables
+
+Metastore Tables component contains metadata about Drill tables, including 
general information, as well as
+information about table segments, files, row groups, partitions.
+
+Full table metadata consists of two major concepts: general information and 
top-level segments metadata.
+Table general information contains basic table information and corresponds to 
the `BaseTableMetadata` class.
+
+A table can be non-partitioned and partitioned. Non-partitioned tables have 
only one top-level segment 
+which is called default (`MetadataInfo#DEFAULT_SEGMENT_KEY`). Partitioned 
tables may have several top-level segments.
+Each top-level segment can include metadata about inner segments, files, row 
groups, and partitions.
+
+A unique table identifier in Metastore Tables is a combination of storage 
plugin, workspace, and table name.
+Table metadata inside is grouped by top-level segments, unique identifier of 
the top-level segment and its metadata
+is storage plugin, workspace, table name, and metadata key.
+
+### Related Session/System Options
+
 
 Review comment:
   Thanks, replaced.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r379534144
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table 
+ in the same way as it is done during regular select and computes some 
metadata like `MIN` / `MAX` column values and
+ `NULLS_COUNT` designated as "metadata" to be able to produce more 
optimizations like filter push-down, etc. If
+ `planner.statistics.use` option is enabled, this command will also calculate 
and store table statistics into Drill
+ Metastore.
+
+## Configuration
+
+Default Metastore configuration is defined in `drill-metastore-default.conf` 
file.
+It can be overridden in `drill-metastore-override.conf`. Distribution 
configuration can be
+indicated in `drill-metastore-distrib.conf`.
 
 Review comment:
   Thanks, reworded.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r379543295
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table 
+ in the same way as it is done during regular select and computes some 
metadata like `MIN` / `MAX` column values and
+ `NULLS_COUNT` designated as "metadata" to be able to produce more 
optimizations like filter push-down, etc. If
+ `planner.statistics.use` option is enabled, this command will also calculate 
and store table statistics into Drill
+ Metastore.
+
+## Configuration
+
+Default Metastore configuration is defined in `drill-metastore-default.conf` 
file.
+It can be overridden in `drill-metastore-override.conf`. Distribution 
configuration can be
+indicated in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in `drill.metastore` namespace.
+Metastore implementation based on class implementation config property 
`drill.metastore.implementation.class`.
+The default value is the following:
+
+```
+drill.metastore: {
+  implementation.class: "org.apache.drill.metastore.iceberg.IcebergMetastore"
+}
+```
+
+Note, that currently out of box Iceberg Metastore is available and is the 
default one. Though any custom
+ implementation can be added by placing the JAR into classpath which has the 
implementation of
+ `org.apache.drill.metastore.Metastore` interface and indicating custom class 
in the `drill.metastore.implementation.class`.
+
+### Metastore Components
+
+Metastore can store metadata for various components: tables, views, etc.
 
 Review comment:
   Thanks, updated the section with the info you proposed and added a link to 
the main Jira.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r379521993
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
 
 Review comment:
   Thanks, good idea. I have added a section that enumerates the problems the 
Metastore may help to solve.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-02-17 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r379543754
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,408 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table 
+ in the same way as it is done during regular select and computes some 
metadata like `MIN` / `MAX` column values and
+ `NULLS_COUNT` designated as "metadata" to be able to produce more 
optimizations like filter push-down, etc. If
+ `planner.statistics.use` option is enabled, this command will also calculate 
and store table statistics into Drill
+ Metastore.
+
+## Configuration
+
+Default Metastore configuration is defined in `drill-metastore-default.conf` 
file.
+It can be overridden in `drill-metastore-override.conf`. Distribution 
configuration can be
+indicated in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in `drill.metastore` namespace.
+Metastore implementation based on class implementation config property 
`drill.metastore.implementation.class`.
+The default value is the following:
+
+```
+drill.metastore: {
+  implementation.class: "org.apache.drill.metastore.iceberg.IcebergMetastore"
+}
+```
+
+Note, that currently out of box Iceberg Metastore is available and is the 
default one. Though any custom
+ implementation can be added by placing the JAR into classpath which has the 
implementation of
+ `org.apache.drill.metastore.Metastore` interface and indicating custom class 
in the `drill.metastore.implementation.class`.
+
+### Metastore Components
+
+Metastore can store metadata for various components: tables, views, etc.
+Current implementation provides fully functioning support for tables component.
+Views component support is not implemented but contains stub methods to show
+how new Metastore components like UDFs, storage plugins, etc. can be added in 
the future.
+
+### Metastore Tables
 
 Review comment:
   Thanks, agree that it may seem a little bit confusing, so changed as you 
have proposed.
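
For reference, the override described in the quoted Configuration section could be sketched as follows. This is a minimal illustration and not text from the PR: the class name `com.example.metastore.CustomMetastore` is hypothetical, and the JAR that contains it is assumed to already be on the Drill classpath.

```
# drill-metastore-override.conf (sketch under the assumptions above)
drill.metastore: {
  # Hypothetical custom class; it must implement org.apache.drill.metastore.Metastore
  implementation.class: "com.example.metastore.CustomMetastore"
}
```

If the property is not overridden, the default quoted above, `org.apache.drill.metastore.iceberg.IcebergMetastore`, stays in effect.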




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-31 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373565916
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,378 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-30
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is an Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table 
+ and computes some metadata like MIN / MAX column values and NULLS COUNT 
designated as "metadata" to be able to
+ produce more optimizations like filter push-down, etc. If 
`planner.statistics.use` option is enabled, this command
+ will also calculate and store table statistics into Drill Metastore.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table
+ in the same way as it is done during regular select.
+
+## Configuration
+
+Default Metastore configuration is defined in `drill-metastore-default.conf` 
file.
+It can be overridden in `drill-metastore-override.conf`. Distribution 
configuration can be
+indicated in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in `drill.metastore` namespace.
+Metastore implementation based on class implementation config property 
`drill.metastore.implementation.class`.
+
+### Metastore Components
+
+Metastore can store metadata for various components: tables, views etc.
+Current implementation provides fully functioning support for tables component.
+Views component support is not implemented but contains stub methods to show
+how new Metastore components like udfs, storage plugins, etc. be added in 
future.
+
+### Metastore Tables
+
+Metastore Tables component contains metadata about Drill tables, including 
general information, as well as
+information about table segments, files, row groups, partitions.
+
+Full table metadata consists of two major concepts: general information and 
top-level segments metadata.
+Table general information contains basic table information and corresponds to 
`BaseTableMetadata` class.
+
+Table can be non-partitioned and partitioned. Non-partitioned tables, have 
only one top-level segment 
+which is called default (`MetadataInfo#DEFAULT_SEGMENT_KEY`). Partitioned 
tables may have several top-level segments.
+Each top-level segment can include metadata about inner segments, files, row 
groups and partitions.
+
+Unique table identifier in Metastore Tables is combination of storage plugin, 
workspace and table name.
+Table metadata inside is grouped by top-level segments, unique identifier of 
the top-level segment and its metadata
+is storage plugin, workspace, table name and metadata key.
+
+### Related Session/System Options
+
+The following options are set via `ALTER SYSTEM SET`, or `ALTER SESSION SET` 
or via the Drill Web console.
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Same options as the 
_level_ option above. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+If you run the `ANALYZE TABLE` command at the same time as queries run, then 
the query can read incorrect or corrupt statistics.
+Drill will reload statistics and replan the query. This option spe

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-31 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373539540
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-31 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373566746
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/030-drill-iceberg-metastore.md
 ##
 @@ -0,0 +1,63 @@
+---
+title: "Drill Iceberg Metastore"
+parent: "Drill Metastore"
+date:
+---
+
+Drill uses Iceberg Metastore implementation based on [Iceberg 
tables](http://iceberg.incubator.apache.org). For
 
 Review comment:
   Thanks, added.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-31 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373567421
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/030-drill-iceberg-metastore.md
 ##
 @@ -0,0 +1,63 @@
+---
+title: "Drill Iceberg Metastore"
+parent: "Drill Metastore"
+date:
+---
+
+Drill uses Iceberg Metastore implementation based on [Iceberg 
tables](http://iceberg.incubator.apache.org). For
+ details on how to configure Iceberg Metastore implementation and its option 
descriptions, please refer to
+ [Iceberg Metastore 
docs](https://github.com/apache/drill/blob/master/metastore/iceberg-metastore/README.md).
 
 Review comment:
   Thanks, added.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-31 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373538164
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-31 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373537401
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-31 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373537700
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-31 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373564880
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,378 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-30
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table 
+ and computes some metadata like MIN / MAX column values and NULLS COUNT 
designated as "metadata" to be able to
+ produce more optimizations like filter push-down, etc. If 
`planner.statistics.use` option is enabled, this command
+ will also calculate and store table statistics into Drill Metastore.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table
+ in the same way as it is done during regular select.
+
+## Configuration
+
+Default Metastore configuration is defined in `drill-metastore-default.conf` 
file.
+It can be overridden in `drill-metastore-override.conf`. Distribution 
configuration can be
+indicated in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in `drill.metastore` namespace.
+The Metastore implementation is selected by the class named in the config property 
`drill.metastore.implementation.class`.
+
+### Metastore Components
+
+Metastore can store metadata for various components: tables, views etc.
+Current implementation provides fully functioning support for tables component.
+Views component support is not implemented but contains stub methods to show
+how new Metastore components like UDFs, storage plugins, etc. can be added in 
the future.
+
+### Metastore Tables
+
+Metastore Tables component contains metadata about Drill tables, including 
general information, as well as
+information about table segments, files, row groups, partitions.
+
+Full table metadata consists of two major concepts: general information and 
top-level segments metadata.
+Table general information contains basic table information and corresponds to 
`BaseTableMetadata` class.
+
+A table can be non-partitioned or partitioned. Non-partitioned tables have 
only one top-level segment 
+which is called default (`MetadataInfo#DEFAULT_SEGMENT_KEY`). Partitioned 
tables may have several top-level segments.
+Each top-level segment can include metadata about inner segments, files, row 
groups and partitions.
+
+The unique table identifier in Metastore Tables is the combination of storage plugin, 
workspace and table name.
+Table metadata is grouped by top-level segments; the unique identifier of 
a top-level segment and its metadata
+is the storage plugin, workspace, table name and metadata key.
+
+### Related Session/System Options
+
+The following options are set via `ALTER SYSTEM SET`, or `ALTER SESSION SET` 
or via the Drill Web console.
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Same options as the 
_level_ option above. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+If you run the `ANALYZE TABLE` command at the same time as queries run, then 
the query can read incorrect or corrupt statistics.
+Drill will reload statistics and replan the query. This option spe

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-31 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373564314
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,378 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-30
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table 
+ and computes some metadata like MIN / MAX column values and NULLS COUNT 
designated as "metadata" to be able to
+ produce more optimizations like filter push-down, etc. If 
`planner.statistics.use` option is enabled, this command
+ will also calculate and store table statistics into Drill Metastore.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table
+ in the same way as it is done during regular select.
+
+## Configuration
+
+Default Metastore configuration is defined in `drill-metastore-default.conf` 
file.
+It can be overridden in `drill-metastore-override.conf`. Distribution 
configuration can be
+indicated in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in `drill.metastore` namespace.
+The Metastore implementation is selected by the class named in the config property 
`drill.metastore.implementation.class`.
+
+### Metastore Components
+
+Metastore can store metadata for various components: tables, views etc.
+Current implementation provides fully functioning support for tables component.
+Views component support is not implemented but contains stub methods to show
+how new Metastore components like UDFs, storage plugins, etc. can be added in 
the future.
+
+### Metastore Tables
+
+Metastore Tables component contains metadata about Drill tables, including 
general information, as well as
+information about table segments, files, row groups, partitions.
+
+Full table metadata consists of two major concepts: general information and 
top-level segments metadata.
+Table general information contains basic table information and corresponds to 
`BaseTableMetadata` class.
+
+A table can be non-partitioned or partitioned. Non-partitioned tables have 
only one top-level segment 
+which is called default (`MetadataInfo#DEFAULT_SEGMENT_KEY`). Partitioned 
tables may have several top-level segments.
+Each top-level segment can include metadata about inner segments, files, row 
groups and partitions.
+
+The unique table identifier in Metastore Tables is the combination of storage plugin, 
workspace and table name.
+Table metadata is grouped by top-level segments; the unique identifier of 
a top-level segment and its metadata
+is the storage plugin, workspace, table name and metadata key.
+
+### Related Session/System Options
+
+The following options are set via `ALTER SYSTEM SET`, or `ALTER SESSION SET` 
or via the Drill Web console.
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Same options as the 
_level_ option above. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+If you run the `ANALYZE TABLE` command at the same time as queries run, then 
the query can read incorrect or corrupt statistics.
+Drill will reload statistics and replan the query. This option spe

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-31 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373530287
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,378 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-30
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table 
+ and computes some metadata like MIN / MAX column values and NULLS COUNT 
designated as "metadata" to be able to
+ produce more optimizations like filter push-down, etc. If 
`planner.statistics.use` option is enabled, this command
+ will also calculate and store table statistics into Drill Metastore.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table
+ in the same way as it is done during regular select.
+
+## Configuration
+
+Default Metastore configuration is defined in `drill-metastore-default.conf` 
file.
+It can be overridden in `drill-metastore-override.conf`. Distribution 
configuration can be
+indicated in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in `drill.metastore` namespace.
+The Metastore implementation is selected by the class named in the config property 
`drill.metastore.implementation.class`.
 
 Review comment:
   Thanks, I added an example with the default value and noted that, along with 
the possibility of using a custom implementation, as you suggested.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-31 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373536691
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,378 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-30
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table 
+ and computes some metadata like MIN / MAX column values and NULLS COUNT 
designated as "metadata" to be able to
+ produce more optimizations like filter push-down, etc. If 
`planner.statistics.use` option is enabled, this command
+ will also calculate and store table statistics into Drill Metastore.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table
+ in the same way as it is done during regular select.
+
+## Configuration
+
+Default Metastore configuration is defined in `drill-metastore-default.conf` 
file.
+It can be overridden in `drill-metastore-override.conf`. Distribution 
configuration can be
+indicated in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in `drill.metastore` namespace.
+The Metastore implementation is selected by the class named in the config property 
`drill.metastore.implementation.class`.
+
+### Metastore Components
+
+Metastore can store metadata for various components: tables, views etc.
+Current implementation provides fully functioning support for tables component.
+Views component support is not implemented but contains stub methods to show
+how new Metastore components like UDFs, storage plugins, etc. can be added in 
the future.
+
+### Metastore Tables
+
+Metastore Tables component contains metadata about Drill tables, including 
general information, as well as
+information about table segments, files, row groups, partitions.
+
+Full table metadata consists of two major concepts: general information and 
top-level segments metadata.
+Table general information contains basic table information and corresponds to 
`BaseTableMetadata` class.
+
+A table can be non-partitioned or partitioned. Non-partitioned tables have 
only one top-level segment 
+which is called default (`MetadataInfo#DEFAULT_SEGMENT_KEY`). Partitioned 
tables may have several top-level segments.
+Each top-level segment can include metadata about inner segments, files, row 
groups and partitions.
+
+The unique table identifier in Metastore Tables is the combination of storage plugin, 
workspace and table name.
+Table metadata is grouped by top-level segments; the unique identifier of 
a top-level segment and its metadata
+is the storage plugin, workspace, table name and metadata key.
+
+### Related Session/System Options
+
+The following options are set via `ALTER SYSTEM SET`, or `ALTER SESSION SET` 
or via the Drill Web console.
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Same options as the 
_level_ option above. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+If you run the `ANALYZE TABLE` command at the same time as queries run, then 
the query can read incorrect or corrupt statistics.
+Drill will reload statistics and replan the query. This option spe

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-31 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373464139
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,378 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-30
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table 
+ and computes some metadata like MIN / MAX column values and NULLS COUNT 
designated as "metadata" to be able to
+ produce more optimizations like filter push-down, etc. If 
`planner.statistics.use` option is enabled, this command
+ will also calculate and store table statistics into Drill Metastore.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table
 
 Review comment:
   Thanks, removed and updated the previous abstract with additional info.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-31 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373534723
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,378 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-30
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table 
+ and computes some metadata like MIN / MAX column values and NULLS COUNT 
designated as "metadata" to be able to
+ produce more optimizations like filter push-down, etc. If 
`planner.statistics.use` option is enabled, this command
+ will also calculate and store table statistics into Drill Metastore.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table
+ in the same way as it is done during regular select.
+
+## Configuration
+
+Default Metastore configuration is defined in `drill-metastore-default.conf` 
file.
+It can be overridden in `drill-metastore-override.conf`. Distribution 
configuration can be
+indicated in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in `drill.metastore` namespace.
+The Metastore implementation is selected by the class named in the config property 
`drill.metastore.implementation.class`.
+
+### Metastore Components
+
+Metastore can store metadata for various components: tables, views etc.
+Current implementation provides fully functioning support for tables component.
+Views component support is not implemented but contains stub methods to show
+how new Metastore components like UDFs, storage plugins, etc. can be added in 
the future.
+
+### Metastore Tables
+
+Metastore Tables component contains metadata about Drill tables, including 
general information, as well as
+information about table segments, files, row groups, partitions.
+
+Full table metadata consists of two major concepts: general information and 
top-level segments metadata.
+Table general information contains basic table information and corresponds to 
`BaseTableMetadata` class.
+
+A table can be non-partitioned or partitioned. Non-partitioned tables have 
only one top-level segment 
+which is called default (`MetadataInfo#DEFAULT_SEGMENT_KEY`). Partitioned 
tables may have several top-level segments.
+Each top-level segment can include metadata about inner segments, files, row 
groups and partitions.
+
+The unique table identifier in Metastore Tables is the combination of storage plugin, 
workspace and table name.
+Table metadata is grouped by top-level segments; the unique identifier of 
a top-level segment and its metadata
+is the storage plugin, workspace, table name and metadata key.
+
+### Related Session/System Options
+
+The following options are set via `ALTER SYSTEM SET`, or `ALTER SESSION SET` 
or via the Drill Web console.
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Same options as the 
_level_ option above. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+If you run the `ANALYZE TABLE` command at the same time as queries run, then 
the query can read incorrect or corrupt statistics.
+Drill will reload statistics and replan the query. This option spe

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-31 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373533767
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,378 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-30
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table 
+ and computes some metadata like MIN / MAX column values and NULLS COUNT 
designated as "metadata" to be able to
+ produce more optimizations like filter push-down, etc. If 
`planner.statistics.use` option is enabled, this command
+ will also calculate and store table statistics into Drill Metastore.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table
+ in the same way as it is done during regular select.
+
+## Configuration
+
+Default Metastore configuration is defined in `drill-metastore-default.conf` 
file.
+It can be overridden in `drill-metastore-override.conf`. Distribution 
configuration can be
+indicated in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in `drill.metastore` namespace.
+The Metastore implementation is selected by the class named in the config property 
`drill.metastore.implementation.class`.
+
+### Metastore Components
+
+Metastore can store metadata for various components: tables, views etc.
+Current implementation provides fully functioning support for tables component.
+Views component support is not implemented but contains stub methods to show
+how new Metastore components like UDFs, storage plugins, etc. can be added in 
the future.
+
+### Metastore Tables
+
+Metastore Tables component contains metadata about Drill tables, including 
general information, as well as
+information about table segments, files, row groups, partitions.
+
+Full table metadata consists of two major concepts: general information and 
top-level segments metadata.
+Table general information contains basic table information and corresponds to 
`BaseTableMetadata` class.
+
+A table can be non-partitioned or partitioned. Non-partitioned tables have 
only one top-level segment 
+which is called default (`MetadataInfo#DEFAULT_SEGMENT_KEY`). Partitioned 
tables may have several top-level segments.
+Each top-level segment can include metadata about inner segments, files, row 
groups and partitions.
+
+The unique table identifier in Metastore Tables is the combination of storage plugin, 
workspace and table name.
+Table metadata is grouped by top-level segments; the unique identifier of 
a top-level segment and its metadata
+is the storage plugin, workspace, table name and metadata key.
+
+### Related Session/System Options
+
+The following options are set via `ALTER SYSTEM SET`, or `ALTER SESSION SET` 
or via the Drill Web console.
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Same options as the 
_level_ option above. Default is `'ALL'`.
 
 Review comment:
   Thanks, fixed.



[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-31 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373531743
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,378 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-30
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table 
+ and computes some metadata like MIN / MAX column values and NULLS COUNT 
designated as "metadata" to be able to
+ produce more optimizations like filter push-down, etc. If 
`planner.statistics.use` option is enabled, this command
+ will also calculate and store table statistics into Drill Metastore.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table
+ in the same way as it is done during regular select.
+
+## Configuration
+
+Default Metastore configuration is defined in `drill-metastore-default.conf` 
file.
+It can be overridden in `drill-metastore-override.conf`. Distribution 
configuration can be
+indicated in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in `drill.metastore` namespace.
+The Metastore implementation is selected by the class named in the config property 
`drill.metastore.implementation.class`.
+
+### Metastore Components
+
+Metastore can store metadata for various components: tables, views etc.
+Current implementation provides fully functioning support for tables component.
+Views component support is not implemented but contains stub methods to show
+how new Metastore components like UDFs, storage plugins, etc. can be added in 
the future.
+
+### Metastore Tables
+
+Metastore Tables component contains metadata about Drill tables, including 
general information, as well as
+information about table segments, files, row groups, partitions.
+
+Full table metadata consists of two major concepts: general information and 
top-level segments metadata.
+Table general information contains basic table information and corresponds to 
`BaseTableMetadata` class.
 
 Review comment:
   Thanks, fixed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-31 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373531286
 
 

 ##
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##
 @@ -0,0 +1,378 @@
+---
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-30
+---
+
+Drill 1.17 introduces the Drill Metastore which stores the table schema and 
table statistics. Statistics allow Drill to better create optimal query plans.
+
+The Metastore is a Beta feature; it is subject to change. We encourage you to 
try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may 
change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+## Enabling Drill Metastore
+
+To use the Drill Metastore, you must enable it at the session or system level 
with one of the following commands:
+
+   SET `metastore.enabled` = true;
+   ALTER SYSTEM SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Computing and storing table metadata to Drill Metastore
+
+Once you enable the Metastore, the next step is to populate it with data. 
Drill can query a table whether that table
+ has a Metastore entry or not. (If you are familiar with Hive, then you know 
that Hive requires that all tables have
+ Hive Metastore entries before you can query them.) In Drill, only add data to 
the Metastore when doing so improves
+ query performance. In general, large tables benefit from statistics more than 
small tables do.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table 
+ and computes some metadata like MIN / MAX column values and NULLS COUNT 
designated as "metadata" to be able to
+ produce more optimizations like filter push-down, etc. If 
`planner.statistics.use` option is enabled, this command
+ will also calculate and store table statistics into Drill Metastore.
+
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill 
infers the schema by scanning your table
+ in the same way as it is done during regular select.
+
+## Configuration
+
+Default Metastore configuration is defined in `drill-metastore-default.conf` 
file.
+It can be overridden in `drill-metastore-override.conf`. Distribution 
configuration can be
+indicated in `drill-metastore-distrib.conf`.
+
+All configuration properties should reside in `drill.metastore` namespace.
+The Metastore implementation is selected by the class named in the config property 
`drill.metastore.implementation.class`.
+
+### Metastore Components
+
+Metastore can store metadata for various components: tables, views etc.
+Current implementation provides fully functioning support for tables component.
+Views component support is not implemented but contains stub methods to show
+how new Metastore components like UDFs, storage plugins, etc. can be added in 
the future.
 
 Review comment:
   Thanks, fixed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371891854
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. The stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
This clause is currently not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage so that table metadata can be stored when 
ANALYZE TABLE commands are executed and
+ read during regular query execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
+If the number of retries was exceeded, query will be planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using file metadata cache for the case when required metadata is absent 
in the Metastore. Default is true.
+- **metastore.metadata.use_schema**
+Enables schema usage, stored to the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables statistics usage, stored in the Metastore, at the planning stage. 
Default is `true`.
+- **metastore.metadata.ctas.auto-collect**
+Specifies whether schema and / or column statistics will be automatically 
collected for every table after CTAS and CTTAS.
+This option is not active for now. Default is `'NONE'`.
+- **drill.exec.storage.implicit.last_modified_time.column.label**
+Sets the implicit column name for the last modified time (`lmt`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_index.column.label**
+Sets the implicit column name for the row group index (`rgi`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_length.column.label**
+Sets the implicit column name for the row group length (`rgl`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_start.column.label**
+Sets the implicit column name for the row group start (`rgs`) column. For 
internal usage when producing Metastore analyze.
+
+## Related Commands
+
+To drop table metadata from the Metastore, the following command may be used:
+
+   ANALYZE

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371899388
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. The stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
This clause is currently not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage so that table metadata can be stored when 
ANALYZE TABLE commands are executed and
+ read during regular query execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
+If the number of retries was exceeded, query will be planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using file metadata cache for the case when required metadata is absent 
in the Metastore. Default is true.
+- **metastore.metadata.use_schema**
+Enables schema usage, stored to the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables statistics usage, stored in the Metastore, at the planning stage. 
Default is `true`.
+- **metastore.metadata.ctas.auto-collect**
+Specifies whether schema and / or column statistics will be automatically 
collected for every table after CTAS and CTTAS.
+This option is not active for now. Default is `'NONE'`.
+- **drill.exec.storage.implicit.last_modified_time.column.label**
+Sets the implicit column name for the last modified time (`lmt`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_index.column.label**
+Sets the implicit column name for the row group index (`rgi`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_length.column.label**
+Sets the implicit column name for the row group length (`rgl`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_start.column.label**
+Sets the implicit column name for the row group start (`rgs`) column. For 
internal usage when producing Metastore analyze.
+
+## Related Commands
+
+To drop table metadata from the Metastore, the following command may be used:
+
+   ANALYZE

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373014394
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. The stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
This clause is currently not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage so that table metadata can be stored when 
ANALYZE TABLE commands are executed and
+ read during regular query execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
+If the number of retries was exceeded, query will be planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using file metadata cache for the case when required metadata is absent 
in the Metastore. Default is true.
+- **metastore.metadata.use_schema**
+Enables schema usage, stored to the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables statistics usage, stored in the Metastore, at the planning stage. 
Default is `true`.
 
 Review comment:
   Updated comment.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371931973
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. The stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
This clause is currently not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage so that table metadata can be stored when 
ANALYZE TABLE commands are executed and
+ read during regular query execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
+If the number of retries was exceeded, query will be planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using file metadata cache for the case when required metadata is absent 
in the Metastore. Default is true.
+- **metastore.metadata.use_schema**
+Enables schema usage, stored to the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables statistics usage, stored in the Metastore, at the planning stage. 
Default is `true`.
+- **metastore.metadata.ctas.auto-collect**
+Specifies whether schema and / or column statistics will be automatically 
collected for every table after CTAS and CTTAS.
+This option is not active for now. Default is `'NONE'`.
+- **drill.exec.storage.implicit.last_modified_time.column.label**
+Sets the implicit column name for the last modified time (`lmt`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_index.column.label**
+Sets the implicit column name for the row group index (`rgi`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_length.column.label**
+Sets the implicit column name for the row group length (`rgl`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_start.column.label**
+Sets the implicit column name for the row group start (`rgs`) column. For 
internal usage when producing Metastore analyze.
+
+## Related Commands
+
+To drop table metadata from the Metastore, the following command may be used:
+
+   ANALYZE

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371904606
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. The stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
This clause is currently not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage so that table metadata can be stored when 
ANALYZE TABLE commands are executed and
+ read during regular query execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
+If the number of retries was exceeded, query will be planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using file metadata cache for the case when required metadata is absent 
in the Metastore. Default is true.
+- **metastore.metadata.use_schema**
+Enables schema usage, stored to the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables statistics usage, stored in the Metastore, at the planning stage. 
Default is `true`.
+- **metastore.metadata.ctas.auto-collect**
+Specifies whether schema and / or column statistics will be automatically 
collected for every table after CTAS and CTTAS.
+This option is not active for now. Default is `'NONE'`.
+- **drill.exec.storage.implicit.last_modified_time.column.label**
+Sets the implicit column name for the last modified time (`lmt`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_index.column.label**
+Sets the implicit column name for the row group index (`rgi`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_length.column.label**
+Sets the implicit column name for the row group length (`rgl`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_start.column.label**
+Sets the implicit column name for the row group start (`rgs`) column. For 
internal usage when producing Metastore analyze.
+
+## Related Commands
+
+To drop table metadata from the Metastore, the following command may be used:
+
+   ANALYZE

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373004866
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting with Drill 1.17, you can store table metadata (including schema and computed statistics) in the Drill Metastore.
+Drill uses this metadata when querying a table to create more optimal query plans.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable the Drill Metastore, set the `metastore.enabled` option to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
 
 Review comment:
   `ALTER SESSION` may be omitted; by default, the option is set at the session level.
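
   As a small sketch of the point above (not text from the PR itself): because `SET` defaults to session scope, the first two statements below should behave the same way. The system-level form is shown only for contrast.

```sql
-- Session level: the ALTER SESSION prefix is optional.
SET `metastore.enabled` = true;
ALTER SESSION SET `metastore.enabled` = true;

-- System level: affects all sessions, so ALTER SYSTEM must be spelled out.
ALTER SYSTEM SET `metastore.enabled` = true;
```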


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371864788
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting with Drill 1.17, you can store table metadata (including schema and computed statistics) in the Drill Metastore.
+Drill uses this metadata when querying a table to create more optimal query plans.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable the Drill Metastore, set the `metastore.enabled` option to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill collects table metadata. If the table does not exist or is temporary, the command fails and no metadata is collected or stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill generates and stores metadata. The stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies that Drill should not collect or store metadata for any table columns.
+
+*level*
+Optional varchar literal that specifies the maximum level depth for collecting metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored in the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to `false`), Drill returns an error when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored in the Metastore. This clause is not currently supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that statistics should be computed on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates that statistics should be computed on 50 rows. The optimizer selects the rows at random.
+
+## Related Options
+
+- **metastore.enabled**
+Enables the Drill Metastore so that Drill can store table metadata when running ANALYZE TABLE commands and read table metadata when running regular queries or querying some INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies the maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting that query metadata has changed.
+If the number of retries is exceeded, the query is planned without metadata information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using the file metadata cache when the required metadata is absent from the Metastore. Default is `true`.
+- **metastore.metadata.use_schema**
+Enables use of the schema stored in the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables use of the statistics stored in the Metastore at the planning stage. Default is `true`.
+- **metastore.metadata.ctas.auto-collect**
+Specifies whether schema and/or column statistics are automatically collected for every table after CTAS and CTTAS.
+This option is not active yet. Default is `'NONE'`.
+- **drill.exec.storage.implicit.last_modified_time.column.label**
+Sets the implicit column name for the last modified time (`lmt`) column. For internal use when running Metastore ANALYZE commands.
+- **drill.exec.storage.implicit.row_group_index.column.label**
+Sets the implicit column name for the row group index (`rgi`) column. For internal use when running Metastore ANALYZE commands.
+- **drill.exec.storage.implicit.row_group_length.column.label**
+Sets the implicit column name for the row group length (`rgl`) column. For internal use when running Metastore ANALYZE commands.
+- **drill.exec.storage.implicit.row_group_start.column.label**
+Sets the implicit column name for the row group start (`rgs`) column. For internal use when running Metastore ANALYZE commands.
+
+## Related Commands
+
+To drop table metadata from the Metastore, the following command may be used:
+
+   ANALYZE

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371921044
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371907634
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371906774
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371842074
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting with Drill 1.17, you can store table metadata (including schema and computed statistics) in the Drill Metastore.
+Drill uses this metadata when querying a table to create more optimal query plans.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable the Drill Metastore, set the `metastore.enabled` option to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill collects table metadata. If the table does not exist or is temporary, the command fails and no metadata is collected or stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill generates and stores metadata. The stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies that Drill should not collect or store metadata for any table columns.
+
+*level*
+Optional varchar literal that specifies the maximum level depth for collecting metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored in the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to `false`), Drill returns an error when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored in the Metastore. This clause is not currently supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that statistics should be computed on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates that statistics should be computed on 50 rows. The optimizer selects the rows at random.
+
+## Related Options
+
+- **metastore.enabled**
+Enables the Drill Metastore so that Drill can store table metadata when running ANALYZE TABLE commands and read table metadata when running regular queries or querying some INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies the maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting that query metadata has changed.
+If the number of retries is exceeded, the query is planned without metadata information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using the file metadata cache when the required metadata is absent from the Metastore. Default is `true`.
+- **metastore.metadata.use_schema**
+Enables use of the schema stored in the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables use of the statistics stored in the Metastore at the planning stage. Default is `true`.
+- **metastore.metadata.ctas.auto-collect**
+Specifies whether schema and/or column statistics are automatically collected for every table after CTAS and CTTAS.
+This option is not active yet. Default is `'NONE'`.
+- **drill.exec.storage.implicit.last_modified_time.column.label**
+Sets the implicit column name for the last modified time (`lmt`) column. For internal use when running Metastore ANALYZE commands.
 
 Review comment:
   Removed the "For internal usage" part. These options may be useful when an existing table column has the same name as the default label of an implicit column.
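
   To illustrate the case described above, here is a hedged sketch. Suppose a table already has its own column named `lmt`, which collides with the default label of the implicit last-modified-time column. Changing the label before running ANALYZE should avoid the clash. The table path dfs.tmp.`events` and the replacement label `_lmt_implicit` are hypothetical, and the sketch assumes the option can be changed at the session level.

```sql
-- The Metastore must be enabled before metadata can be collected.
SET `metastore.enabled` = true;

-- Assumption: dfs.tmp.`events` contains a user column named `lmt`, the same
-- name as the default implicit last-modified-time column label.
-- Give the implicit column a different (made-up) label for this session.
SET `drill.exec.storage.implicit.last_modified_time.column.label` = '_lmt_implicit';

-- Collect metadata; the user's `lmt` column should no longer clash.
ANALYZE TABLE dfs.tmp.`events` REFRESH METADATA;
```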




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371873405
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373029314
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. The stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Not currently supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
+If the number of retries was exceeded, query will be planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using file metadata cache for the case when required metadata is absent 
in the Metastore. Default is true.
+- **metastore.metadata.use_schema**
+Enables schema usage, stored to the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables statistics usage, stored in the Metastore, at the planning stage. 
Default is `true`.
+- **metastore.metadata.ctas.auto-collect**
+Specifies whether schema and / or column statistics will be automatically 
collected for every table after CTAS and CTTAS.
+This option is not active for now. Default is `'NONE'`.
+- **drill.exec.storage.implicit.last_modified_time.column.label**
+Sets the implicit column name for the last modified time (`lmt`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_index.column.label**
+Sets the implicit column name for the row group index (`rgi`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_length.column.label**
+Sets the implicit column name for the row group length (`rgl`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_start.column.label**
+Sets the implicit column name for the row group start (`rgs`) column. For 
internal usage when producing Metastore analyze.
+
+## Related Commands
+
+To drop table metadata from the Metastore, the following command may be used:
+
+   ANALYZE

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371754837
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. The stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Not currently supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
 
 Review comment:
   I think the existing comment is more informative.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371755721
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. The stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Not currently supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
 
 Review comment:
   Thanks, added.
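
For reference, a minimal sketch of the two ways the collection depth quoted above can be given. The table name `dfs.tmp.lineitem` is a placeholder, and the statements assume the Metastore is already enabled; how the option and the explicit clause interact is not stated here, so the example only shows both forms side by side.

```sql
-- Session-level default depth for metadata collection
-- (option name and allowed values taken from the quoted doc).
SET `metastore.metadata.store.depth_level` = 'FILE';
ANALYZE TABLE dfs.tmp.`lineitem` REFRESH METADATA;

-- Depth given explicitly for a single statement via the 'level' LEVEL clause.
ANALYZE TABLE dfs.tmp.`lineitem` REFRESH METADATA 'ROW_GROUP' LEVEL;
```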




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371146761
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
 
 Review comment:
   Thanks, updated with the text you proposed. I also added info to the 
`Related Commands` section on how this feature affects existing Parquet table 
metadata cache files.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371922729
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371850639
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r372428487
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371682050
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. The stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
 
 Review comment:
   @paul-rogers, no, it doesn't infer a column name for directory names. But 
when the table has a `transDate` field that has the same values within a 
partition, it will also be pruned during planning.
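
To illustrate, here is a sketch only: the table `dfs.tmp.sales` and the date literal are hypothetical, and `transDate` is the field from the discussion above. The assumption is that, once metadata has been collected, the planner can compare the filter against the stored per-partition values and skip partitions that cannot match.

```sql
-- Assumes the Metastore is enabled for the session.
SET `metastore.enabled` = true;

-- Hypothetical table; collects metadata at the default depth (ALL).
ANALYZE TABLE dfs.tmp.`sales` REFRESH METADATA;

-- Inspect the plan to see whether the filter on transDate
-- reduced the set of files selected for the scan.
EXPLAIN PLAN FOR
SELECT * FROM dfs.tmp.`sales` WHERE transDate = DATE '2020-01-15';
```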




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371167524
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
 
 Review comment:
   I added a sentence before the `Syntax` section stating that this command 
gathers the schema plus MIN / MAX and NULLS COUNT info, which this doc calls 
"metadata". I also added a sentence there about statistics: they are 
collected only if the `planner.statistics.use` option is enabled.
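
   A minimal sketch of the behavior described in this comment, assuming a 
hypothetical Parquet table dfs.tmp.`lineitem` (the table name is an 
illustration only and does not come from this PR):

```sql
-- Enable the Metastore for the current session (disabled by default in Drill 1.17).
SET `metastore.enabled` = true;
-- Per the comment above, statistics are collected only when this option is enabled.
SET `planner.statistics.use` = true;
-- Hypothetical table name; gathers schema, MIN / MAX and NULLS COUNT metadata.
ANALYZE TABLE dfs.tmp.`lineitem` REFRESH METADATA;
```

   With `planner.statistics.use` left disabled, the same ANALYZE statement 
would be expected to store the metadata without the statistics.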




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371408775
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. the Stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
 
 Review comment:
   Thanks, reworded as you suggested.
   Regarding the `EXCLUDE` clause, I like this idea and will create a Jira to 
implement it.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371758903
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. the Stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Currently is not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
 
 Review comment:
   Thanks, updated.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371763891
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. the Stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Currently is not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
+If the number of retries was exceeded, query will be planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
 
 Review comment:
   The `metadata` keyword was added here to distinguish this option from the 
ones connected with Metastore usage itself, i.e. enabling and retry_attempts.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371842348
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. the Stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Currently is not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
+If the number of retries was exceeded, query will be planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using file metadata cache for the case when required metadata is absent 
in the Metastore. Default is true.
+- **metastore.metadata.use_schema**
+Enables schema usage, stored to the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables statistics usage, stored in the Metastore, at the planning stage. 
Default is `true`.
+- **metastore.metadata.ctas.auto-collect**
+Specifies whether schema and / or column statistics will be automatically 
collected for every table after CTAS and CTTAS.
+This option is not active for now. Default is `'NONE'`.
+- **drill.exec.storage.implicit.last_modified_time.column.label**
+Sets the implicit column name for the last modified time (`lmt`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_index.column.label**
+Sets the implicit column name for the row group index (`rgi`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_length.column.label**
+Sets the implicit column name for the row group length (`rgl`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_start.column.label**
+Sets the implicit column name for the row group start (`rgs`) column. For 
internal usage when producing Metastore analyze.
+
+## Related Commands
+
+To drop table metadata from the Metastore, the following command may be used:
 
 Review comment

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371873857
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. the Stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Currently is not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
+If the number of retries was exceeded, query will be planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using file metadata cache for the case when required metadata is absent 
in the Metastore. Default is true.
+- **metastore.metadata.use_schema**
+Enables schema usage, stored to the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables statistics usage, stored in the Metastore, at the planning stage. 
Default is `true`.
+- **metastore.metadata.ctas.auto-collect**
+Specifies whether schema and / or column statistics will be automatically 
collected for every table after CTAS and CTTAS.
+This option is not active for now. Default is `'NONE'`.
+- **drill.exec.storage.implicit.last_modified_time.column.label**
+Sets the implicit column name for the last modified time (`lmt`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_index.column.label**
+Sets the implicit column name for the row group index (`rgi`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_length.column.label**
+Sets the implicit column name for the row group length (`rgl`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_start.column.label**
+Sets the implicit column name for the row group start (`rgs`) column. For 
internal usage when producing Metastore analyze.
+
+## Related Commands
+
+To drop table metadata from the Metastore, the following command may be used:
+
+   ANALYZE

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371922057
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. the Stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Currently is not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
+If the number of retries was exceeded, query will be planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using file metadata cache for the case when required metadata is absent 
in the Metastore. Default is true.
+- **metastore.metadata.use_schema**
+Enables schema usage, stored to the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables statistics usage, stored in the Metastore, at the planning stage. 
Default is `true`.
+- **metastore.metadata.ctas.auto-collect**
+Specifies whether schema and / or column statistics will be automatically 
collected for every table after CTAS and CTTAS.
+This option is not active for now. Default is `'NONE'`.
+- **drill.exec.storage.implicit.last_modified_time.column.label**
+Sets the implicit column name for the last modified time (`lmt`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_index.column.label**
+Sets the implicit column name for the row group index (`rgi`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_length.column.label**
+Sets the implicit column name for the row group length (`rgl`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_start.column.label**
+Sets the implicit column name for the row group start (`rgs`) column. For 
internal usage when producing Metastore analyze.
+
+## Related Commands
+
+To drop table metadata from the Metastore, the following command may be used:
+
+   ANALYZE

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371843593
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. the Stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Currently is not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
+If the number of retries was exceeded, query will be planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using file metadata cache for the case when required metadata is absent 
in the Metastore. Default is true.
+- **metastore.metadata.use_schema**
+Enables schema usage, stored to the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables statistics usage, stored in the Metastore, at the planning stage. 
Default is `true`.
+- **metastore.metadata.ctas.auto-collect**
+Specifies whether schema and / or column statistics will be automatically 
collected for every table after CTAS and CTTAS.
+This option is not active for now. Default is `'NONE'`.
+- **drill.exec.storage.implicit.last_modified_time.column.label**
+Sets the implicit column name for the last modified time (`lmt`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_index.column.label**
+Sets the implicit column name for the row group index (`rgi`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_length.column.label**
+Sets the implicit column name for the row group length (`rgl`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_start.column.label**
+Sets the implicit column name for the row group start (`rgs`) column. For 
internal usage when producing Metastore analyze.
+
+## Related Commands
+
+To drop table metadata from the Metastore, the following command may be used:
+
+   ANALYZE

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371858041
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. the Stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Currently is not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
+If the number of retries was exceeded, query will be planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using file metadata cache for the case when required metadata is absent 
in the Metastore. Default is true.
+- **metastore.metadata.use_schema**
+Enables schema usage, stored to the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables statistics usage, stored in the Metastore, at the planning stage. 
Default is `true`.
+- **metastore.metadata.ctas.auto-collect**
+Specifies whether schema and / or column statistics will be automatically 
collected for every table after CTAS and CTTAS.
+This option is not active for now. Default is `'NONE'`.
+- **drill.exec.storage.implicit.last_modified_time.column.label**
+Sets the implicit column name for the last modified time (`lmt`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_index.column.label**
+Sets the implicit column name for the row group index (`rgi`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_length.column.label**
+Sets the implicit column name for the row group length (`rgl`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_start.column.label**
+Sets the implicit column name for the row group start (`rgs`) column. For 
internal usage when producing Metastore analyze.
+
+## Related Commands
+
+To drop table metadata from the Metastore, the following command may be used:
+
+   ANALYZE

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371407625
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. the Stored schema will include all table columns.
 
 Review comment:
   Thanks, fixed "Stored".
   
   Yes, it will infer the schema for all columns and will collect statistics 
only for the specified ones. Replaced "metadata" with "statistics".
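
   As a rough illustration of that behaviour, here is a minimal sketch based on
   the syntax quoted in the diff. The table path dfs.tmp.`lineitem` and the
   column names are hypothetical placeholders, not taken from the PR:

   ```sql
   -- Hypothetical table and column names, for illustration only.
   -- The schema is stored for every column of the table, while statistics
   -- are collected only for the two columns listed in the COLUMNS clause.
   ANALYZE TABLE dfs.tmp.`lineitem` COLUMNS (l_orderkey, l_partkey) REFRESH METADATA;
   ```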


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371843143
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. the Stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Currently is not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
+If the number of retries was exceeded, query will be planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using file metadata cache for the case when required metadata is absent 
in the Metastore. Default is true.
+- **metastore.metadata.use_schema**
+Enables schema usage, stored to the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables statistics usage, stored in the Metastore, at the planning stage. 
Default is `true`.
+- **metastore.metadata.ctas.auto-collect**
+Specifies whether schema and / or column statistics will be automatically 
collected for every table after CTAS and CTTAS.
+This option is not active for now. Default is `'NONE'`.
+- **drill.exec.storage.implicit.last_modified_time.column.label**
+Sets the implicit column name for the last modified time (`lmt`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_index.column.label**
+Sets the implicit column name for the row group index (`rgi`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_length.column.label**
+Sets the implicit column name for the row group length (`rgl`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_start.column.label**
+Sets the implicit column name for the row group start (`rgs`) column. For 
internal usage when producing Metastore analyze.
+
+## Related Commands
+
+To drop table metadata from the Metastore, the following command may be used:
+
+   ANALYZE

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371413095
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. the Stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
 
 Review comment:
   The decision to pass it as a VARCHAR literal was made for consistency with 
other usages. For example, the default level may be set through the 
`metastore.metadata.store.depth_level` session option:
   ```
   set `metastore.metadata.store.depth_level`='segment';
   ```
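
   The same literal form would presumably be used inline in the statement, per
   the `REFRESH METADATA ['level' LEVEL]` syntax quoted in the diff. A sketch,
   assuming a hypothetical table dfs.tmp.`lineitem`:

   ```sql
   -- Hypothetical table name. The quoted literal caps the depth of the
   -- collected metadata; 'file' is assumed here to stop at the file level
   -- without descending to row groups.
   ANALYZE TABLE dfs.tmp.`lineitem` REFRESH METADATA 'file' LEVEL;
   ```

   Keeping the level a quoted literal in both places means the same value can
   be moved between the statement and the session option unchanged.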




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371752852
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. the Stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Currently is not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
 
 Review comment:
   Thanks, renamed the section. Regarding the scope of the options: in some 
cases it is useful to disable the Metastore at the session level, or to set 
other options there.
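
   A short sketch of the two scopes, using the `SET` and `ALTER SYSTEM SET`
   forms that the Metastore docs in this PR already show for
   `metastore.enabled`:

   ```sql
   -- Session scope: affects only the current connection.
   SET `metastore.enabled` = false;

   -- System scope: changes the option for all sessions.
   ALTER SYSTEM SET `metastore.enabled` = true;
   ```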




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371682359
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. the Stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
 
 Review comment:
   Thanks, replaced.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371754567
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. the Stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Currently is not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
 
 Review comment:
   This part came from the statistics work and was just adapted for use with 
the Metastore. The description is the same as on the corresponding statistics 
page.
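
   For reference, a sketch of how the sampling clause might be written. It
   assumes that the quoted syntax is read as `COMPUTE STATISTICS` followed by
   `SAMPLE number PERCENT`, and it uses a hypothetical table dfs.tmp.`lineitem`:

   ```sql
   -- Hypothetical table name; the clause order is an assumption based on the
   -- syntax block in the diff. Computes statistics on roughly half of the data.
   ANALYZE TABLE dfs.tmp.`lineitem` REFRESH METADATA COMPUTE STATISTICS SAMPLE 50 PERCENT;
   ```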




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371157237
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
 
 Review comment:
   Thanks, this transition is a good idea; updated the docs with the text you 
proposed.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371835861
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. the Stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Currently is not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
+If the number of retries was exceeded, query will be planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using file metadata cache for the case when required metadata is absent 
in the Metastore. Default is true.
+- **metastore.metadata.use_schema**
+Enables schema usage, stored to the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables statistics usage, stored in the Metastore, at the planning stage. 
Default is `true`.
+- **metastore.metadata.ctas.auto-collect**
+Specifies whether schema and / or column statistics will be automatically 
collected for every table after CTAS and CTTAS.
 
 Review comment:
   Thanks, updated the text. Yes, that's correct: it will be supported only for 
regular tables. The "and / or" meant that the schema can be collected without 
statistics. With the possible values enumerated, it should be clear.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371796205
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. the Stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Currently not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
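+
+As a minimal sketch of the syntax and parameters above, the statement might be 
used as follows. The `dfs.tmp` workspace, the `lineitem` table, and its column 
names are assumptions for illustration and do not come from this page. The 
spelling `COMPUTE STATISTICS` in the last statement is also an assumption about 
how the statistics clause in the syntax is meant to be read.
+
+```sql
+-- Enable the Metastore for the current session.
+SET `metastore.enabled` = true;
+
+-- Collect metadata for all columns at the default level (`ALL`).
+ANALYZE TABLE dfs.tmp.`lineitem` REFRESH METADATA;
+
+-- Collect metadata for two columns only, with 'FILE' as the maximum level depth.
+ANALYZE TABLE dfs.tmp.`lineitem` COLUMNS (l_orderkey, l_partkey)
+REFRESH METADATA 'FILE' LEVEL;
+
+-- Skip collecting and storing column metadata.
+ANALYZE TABLE dfs.tmp.`lineitem` COLUMNS NONE REFRESH METADATA;
+
+-- Assumed clause spelling: compute statistics on a 50 percent sample.
+-- Statistics usage must be enabled, otherwise the COMPUTE clause raises an error.
+SET `planner.enable_statistics` = true;
+ANALYZE TABLE dfs.tmp.`lineitem` REFRESH METADATA
+COMPUTE STATISTICS SAMPLE 50 PERCENT;
+```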
+
+## Related Options
+
+- **metastore.enabled**
+Enables the Drill Metastore so that Drill can store table metadata when 
ANALYZE TABLE commands run and read table metadata during regular query 
execution or when querying some INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata has changed.
+If the number of retries is exceeded, the query is planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using the file metadata cache when required metadata is absent from 
the Metastore. Default is `true`.
+- **metastore.metadata.use_schema**
+Enables usage of the schema stored in the Metastore. Default is `true`.
 
 Review comment:
   Yes, that's correct where we use it. Added a point about when it may be disabled.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373017272
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/009-analyze-table-compute-statistics.md
 ##
 @@ -99,7 +99,7 @@ Controls the 'compression' factor for the TDigest algorithm 
used for histogram s
 
 ## Reserved Keywords
 
-The ANALYZE TABLE statement introduces the following reserved keywords:  
+The ANALYZE TABLE COMPUTE STATISTICS statement introduces the following 
reserved keywords:  
 
 Review comment:
   The command wasn't changed; I just added words that are already part of the 
command to distinguish it from another command that starts similarly.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371789711
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. The stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Currently is not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
+If the number of retries was exceeded, query will be planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
 
 Review comment:
   Added a reference to file metadata cache page.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371167638
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
 
 Review comment:
   Good point, thanks, added.




[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371909842
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. The stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Currently is not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
+If the number of retries was exceeded, query will be planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using file metadata cache for the case when required metadata is absent 
in the Metastore. Default is true.
+- **metastore.metadata.use_schema**
+Enables schema usage, stored to the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables statistics usage, stored in the Metastore, at the planning stage. 
Default is `true`.
+- **metastore.metadata.ctas.auto-collect**
+Specifies whether schema and / or column statistics will be automatically 
collected for every table after CTAS and CTTAS.
+This option is not active for now. Default is `'NONE'`.
+- **drill.exec.storage.implicit.last_modified_time.column.label**
+Sets the implicit column name for the last modified time (`lmt`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_index.column.label**
+Sets the implicit column name for the row group index (`rgi`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_length.column.label**
+Sets the implicit column name for the row group length (`rgl`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_start.column.label**
+Sets the implicit column name for the row group start (`rgs`) column. For 
internal usage when producing Metastore analyze.
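+
+A short sketch of how some of the options above might be adjusted with Drill's 
standard `SET` and `ALTER SYSTEM SET` commands. The values chosen here are 
illustrative assumptions, not recommendations.
+
+```sql
+-- Session level: use 'FILE' as the maximum level depth for collecting metadata.
+SET `metastore.metadata.store.depth_level` = 'FILE';
+
+-- Session level: plan without the statistics stored in the Metastore.
+SET `metastore.metadata.use_statistics` = false;
+
+-- System level: allow fewer planning retries than the default of 5.
+ALTER SYSTEM SET `metastore.retrieval.retry_attempts` = 3;
+```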
+
+## Related Commands
+
+To drop table metadata from the Metastore, the following command may be used:
+
+   ANALYZE

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371873784
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. The stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Currently is not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
+If the number of retries was exceeded, query will be planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using file metadata cache for the case when required metadata is absent 
in the Metastore. Default is true.
+- **metastore.metadata.use_schema**
+Enables schema usage, stored to the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables statistics usage, stored in the Metastore, at the planning stage. 
Default is `true`.
+- **metastore.metadata.ctas.auto-collect**
+Specifies whether schema and / or column statistics will be automatically 
collected for every table after CTAS and CTTAS.
+This option is not active for now. Default is `'NONE'`.
+- **drill.exec.storage.implicit.last_modified_time.column.label**
+Sets the implicit column name for the last modified time (`lmt`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_index.column.label**
+Sets the implicit column name for the row group index (`rgi`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_length.column.label**
+Sets the implicit column name for the row group length (`rgl`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_start.column.label**
+Sets the implicit column name for the row group start (`rgs`) column. For 
internal usage when producing Metastore analyze.
+
+## Related Commands
+
+To drop table metadata from the Metastore, the following command may be used:
+
+   ANALYZE

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r373016273
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. The stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Currently is not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
+If the number of retries was exceeded, query will be planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using file metadata cache for the case when required metadata is absent 
in the Metastore. Default is true.
+- **metastore.metadata.use_schema**
+Enables schema usage, stored to the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables statistics usage, stored in the Metastore, at the planning stage. 
Default is `true`.
+- **metastore.metadata.ctas.auto-collect**
+Specifies whether schema and / or column statistics will be automatically 
collected for every table after CTAS and CTTAS.
+This option is not active for now. Default is `'NONE'`.
+- **drill.exec.storage.implicit.last_modified_time.column.label**
+Sets the implicit column name for the last modified time (`lmt`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_index.column.label**
+Sets the implicit column name for the row group index (`rgi`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_length.column.label**
+Sets the implicit column name for the row group length (`rgl`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_start.column.label**
+Sets the implicit column name for the row group start (`rgs`) column. For 
internal usage when producing Metastore analyze.
+
+## Related Commands
+
+To drop table metadata from the Metastore, the following command may be used:
+
+   ANALYZE

[GitHub] [drill] vvysotskyi commented on a change in pull request #1953: Add docs for Drill Metastore

2020-01-30 Thread GitBox
vvysotskyi commented on a change in pull request #1953: Add docs for Drill 
Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r371902565
 
 

 ##
 File path: 
_docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` 
should be set to `true`, as shown:
+
+   SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at 
`http://:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+   ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+   REFRESH METADATA ['level' LEVEL]
+   [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+   [ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. The stored schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting 
metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. 
Currently is not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during 
ANALYZE TABLE commands execution and to be able
+ to read table metadata during regular queries execution or when querying some 
INFORMATION_SCHEMA tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting 
that query metadata is changed.
+If the number of retries was exceeded, query will be planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using file metadata cache for the case when required metadata is absent 
in the Metastore. Default is true.
+- **metastore.metadata.use_schema**
+Enables schema usage, stored to the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables statistics usage, stored in the Metastore, at the planning stage. 
Default is `true`.
+- **metastore.metadata.ctas.auto-collect**
+Specifies whether schema and / or column statistics will be automatically 
collected for every table after CTAS and CTTAS.
+This option is not active for now. Default is `'NONE'`.
+- **drill.exec.storage.implicit.last_modified_time.column.label**
+Sets the implicit column name for the last modified time (`lmt`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_index.column.label**
+Sets the implicit column name for the row group index (`rgi`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_length.column.label**
+Sets the implicit column name for the row group length (`rgl`) column. For 
internal usage when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_start.column.label**
+Sets the implicit column name for the row group start (`rgs`) column. For 
internal usage when producing Metastore analyze.
+
+## Related Commands
+
+To drop table metadata from the Metastore, the following command may be used:
+
+   ANALYZE
