This is an automated email from the ASF dual-hosted git repository. bridgetb pushed a commit to branch gh-pages in repository https://gitbox.apache.org/repos/asf/drill.git
The following commit(s) were added to refs/heads/gh-pages by this push: new a7f649c Updates for new maprdb format plugin options, rn update a7f649c is described below commit a7f649cd2c193395b87ad3eb9c0d3bf85ce667e7 Author: Bridget Bevens <bbev...@maprtech.com> AuthorDate: Fri May 24 15:52:23 2019 -0700 Updates for new maprdb format plugin options, rn update --- .../plugins/095-mapr-db-format.md | 6 ++- .../011-running-drill-on-docker.md | 14 +++--- _docs/rn/005-1.16.0-rn.md | 4 +- .../sql-commands/011-refresh-table-metadata.md | 7 ++- .../sql-functions/020-data-type-conversion.md | 54 +++++++++++++++++++--- 5 files changed, 67 insertions(+), 18 deletions(-) diff --git a/_docs/connect-a-data-source/plugins/095-mapr-db-format.md b/_docs/connect-a-data-source/plugins/095-mapr-db-format.md index a1a0287..a2719ca 100644 --- a/_docs/connect-a-data-source/plugins/095-mapr-db-format.md +++ b/_docs/connect-a-data-source/plugins/095-mapr-db-format.md @@ -1,6 +1,6 @@ --- title: "MapR-DB Format" -date: 2018-06-26 00:42:18 UTC +date: 2019-05-24 parent: "Connect a Data Source" --- @@ -16,6 +16,8 @@ Instead of including the name of a file, you include the table name in the query SELECT * FROM mfs.`/users/max/mytable`; -**Note:** Starting in Drill 1.14, the MapR Drill installation package includes a hive-maprdb-json-handler, which enables you to create Hive external tables from MapR-DB JSON tables and then query the tables using the Hive schema. Drill can use the native Drill reader to read the Hive external tables. The native Drill reader enables Drill to perform faster reads of data and apply filter pushdown optimizations. The hive-maprdb-json-handler is not included in the Apache Drill installation package. +Starting in Drill 1.14, the MapR Drill installation package includes a hive-maprdb-json-handler, which enables you to create Hive external tables from MapR-DB JSON tables and then query the tables using the Hive schema. 
Drill can use the native Drill reader to read the Hive external tables. The native Drill reader enables Drill to perform faster reads of data and apply filter pushdown optimizations. The hive-maprdb-json-handler is not included in the Apache Drill installation package. + +Starting in Drill 1.16, you can include the `readTimestampWithZoneOffset` option in the maprdb format plugin configuration. When enabled (set to 'true'), Drill converts timestamp values from UTC to local time zone when reading the values from MapR Database. The option is disabled by default and does not impact the `store.hive.maprdb_json.read_timestamp_with_timezone_offset` setting. diff --git a/_docs/install/installing-drill-in-embedded-mode/011-running-drill-on-docker.md b/_docs/install/installing-drill-in-embedded-mode/011-running-drill-on-docker.md index 471999d..f3e49b2 100644 --- a/_docs/install/installing-drill-in-embedded-mode/011-running-drill-on-docker.md +++ b/_docs/install/installing-drill-in-embedded-mode/011-running-drill-on-docker.md @@ -1,12 +1,12 @@ --- title: "Running Drill on Docker" -date: 2019-05-02 +date: 2019-05-24 parent: "Installing Drill in Embedded Mode" --- -Starting in Drill 1.14, you can run Drill in a [Docker container](https://www.docker.com/what-container#/package_software). Running Drill in a container is the simplest way to start using Drill; all you need is the Docker client installed on your machine. You simply run a Docker command, and your Docker client downloads the Drill Docker image from the apache-drill repository on [Docker Hub](https://docs.docker.com/docker-hub/) and then brings up a container with Apache Drill running in [...] +Starting in Drill 1.14, you can run Drill in a [Docker container](https://www.docker.com/what-container#/package_software). Running Drill in a container is the simplest way to start using Drill; all you need is the Docker client installed on your machine. 
You simply run a Docker command, and your Docker client downloads the Drill Docker image from the apache-drill repository on [Docker Hub](https://docs.docker.com/docker-hub/) and brings up a container with Apache Drill running in embedd [...] -**Note:** Currently, you can only run Drill in embedded mode in a Docker container. Embedded mode is when a single instance of Drill runs on a node or in a container. You do not have to perform any configuration tasks when Drill runs in embedded mode. +Currently, you can only run Drill in embedded mode in a Docker container. Embedded mode is when a single instance of Drill runs on a node or in a container. You do not have to perform any configuration tasks when Drill runs in embedded mode. ## Prerequisite @@ -30,12 +30,12 @@ The following table describes the options: | `-t` | Allocates a pseudo-tty (a shell). | | `--name` | Identifies the container. If you do not use this option to identify a name for the container, the daemon generates a container ID for you. When you use this option to identify a container name, you can use the name to reference the container within a Docker network in foreground or detached mode. | | `-p` | The TCP port for the Drill Web UI. If needed, you can change this port using the `drill.exec.http.port` [start-up option]({{site.baseurl}}/docs/start-up-options/). | -| `drill/apache-drill:<version>` | The Docker Hub repository and tag. In the following example, `drill/apache-drill` is the repository and `1.15.0` is the tag: `drill/apache-drill:1.16.0` The tag correlates with the version of Drill. When a new version of Drill is available, you can use the new version as the tag. | +| `drill/apache-drill:<version>` | The Docker Hub repository and tag. In the following example, `drill/apache-drill` is the repository and `1.16.0` is the tag: `drill/apache-drill:1.16.0` The tag correlates with the version of Drill. When a new version of Drill is available, you can use the new version as the tag. 
| | `bin/bash` | Connects to the Drill container using a bash shell. | ### Running the Drill Docker Container in Foreground Mode -Open a terminal window (Command Prompt or PowerShell, but not PowerShell ISE) and then issue the following command and opitons to connect to SQLLine (the Drill shell): +Open a terminal window (Command Prompt or PowerShell, but not PowerShell ISE) and then issue the following command and options to connect to SQLLine (the Drill shell): docker run -i --name drill-1.16.0 -p 8047:8047 -t drill/apache-drill:1.16.0 /bin/bash @@ -43,7 +43,7 @@ When you issue the docker run command, the Drill process starts in a container. Jun 29, 2018 3:28:21 AM org.glassfish.jersey.server.ApplicationHandler initialize INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26... - apache drill 1.15.0 + apache drill 1.16.0 "json ain't no thang" 0: jdbc:drill:zk=local> @@ -67,7 +67,7 @@ Open a terminal window (Command Prompt or PowerShell, but not PowerShell ISE) an After you issue the commands, the Drill process starts in a container. 
SQLLine prints a message, and the prompt appears: - apache drill 1.15.0 + apache drill 1.16.0 "json ain't no thang" 0: jdbc:drill:drillbit=localhost> diff --git a/_docs/rn/005-1.16.0-rn.md b/_docs/rn/005-1.16.0-rn.md index 4f04272..ac07ab7 100644 --- a/_docs/rn/005-1.16.0-rn.md +++ b/_docs/rn/005-1.16.0-rn.md @@ -20,7 +20,9 @@ This release of Drill provides the following new features and improvements: - [Format plugin for LTSV files]({{site.baseurl}}/docs/ltsv-format-plugin/) ([DRILL-7014](https://issues.apache.org/jira/browse/DRILL-7014)) - Ability to query Hive views, like querying Hive tables in a hive schema, for example `SELECT * FROM hive.`hive_view`; ([DRILL-540](https://issues.apache.org/jira/browse/DRILL-540)) - [Upgrade to SQLLine 1.7]({{site.baseurl}}/docs/configuring-the-drill-shell/) changes the default prompt to `apache drill (schema_name)>` or you can define a custom prompt using the command `!set prompt <new-prompt>`. ([DRILL-6989](https://issues.apache.org/jira/browse/DRILL-6989)) -- Calcite updated to version 1.18.0 ([DRILL-6862](https://issues.apache.org/jira/browse/DRILL-6862)) +- Calcite updated to version 1.18.0 ([DRILL-6862](https://issues.apache.org/jira/browse/DRILL-6862)) +- A new maprdb format plugin option, `readTimestampWithZoneOffset`, converts timestamp values from UTC to local time zone when values are read from MapR Database. This option is disabled by default. ([DRILL-6969](https://issues.apache.org/jira/browse/DRILL-6969)) +- A new Drill configuration option, `store.hive.maprdb_json.read_timestamp_with_timezone_offset`, enables Drill to read timestamp values with a timezone offset when using the hive plugin with the Drill native MaprDB JSON reader enabled. This option is disabled by default. 
([DRILL-6969](https://issues.apache.org/jira/browse/DRILL-6969)) - Several Drill Web UI improvements, including: - [Storage plugin management improvements](https://drill.apache.org/docs/configuring-storage-plugins/#exporting-storage-plugin-configurations) ([DRILL-6562](https://issues.apache.org/jira/browse/DRILL-6562)) - [Query progress indicators and warnings ]({{site.baseurl}}/docs/query-profiles/#query-profile-warnings) ([DRILL-6879](https://issues.apache.org/jira/browse/DRILL-6879)) diff --git a/_docs/sql-reference/sql-commands/011-refresh-table-metadata.md b/_docs/sql-reference/sql-commands/011-refresh-table-metadata.md index 3e71ebc..ebfbb28 100644 --- a/_docs/sql-reference/sql-commands/011-refresh-table-metadata.md +++ b/_docs/sql-reference/sql-commands/011-refresh-table-metadata.md @@ -1,6 +1,6 @@ --- title: "REFRESH TABLE METADATA" -date: 2019-04-30 +date: 2019-05-24 parent: "SQL Commands" --- Run the REFRESH TABLE METADATA command on Parquet tables and directories to generate a metadata cache file. REFRESH TABLE METADATA collects metadata from the footers of Parquet files and writes the metadata to a metadata file (`.drill.parquet_file_metadata.v4`) and a summary file (`.drill.parquet_summary_metadata.v4`). The planner uses the metadata cache file to prune extraneous data during the query planning phase. Run the REFRESH TABLE METADATA command if planning time is a significant [...] @@ -69,7 +69,10 @@ Enables filter pushdown optimization for Parquet files. Drill reads the file met Sets the number of row groups that a table can have. You can increase the threshold if the filter can prune many row groups. However, if this setting is too high, the filter evaluation overhead increases. Base this setting on the data set. Reduce this setting if the planning time is significant or you do not see any benefit at runtime. Default is 10000. (Drill 1.9+) ## Limitations -Currently, Drill does not support runtime rowgroup pruning. 
+ + +- Drill does not support runtime rowgroup pruning. +- REFRESH TABLE METADATA does not count null values for decimal, varchar, and interval data types. ## Examples diff --git a/_docs/sql-reference/sql-functions/020-data-type-conversion.md b/_docs/sql-reference/sql-functions/020-data-type-conversion.md index 675ebc3..589f1bc 100644 --- a/_docs/sql-reference/sql-functions/020-data-type-conversion.md +++ b/_docs/sql-reference/sql-functions/020-data-type-conversion.md @@ -1,6 +1,6 @@ --- title: "Data Type Conversion" -date: 2019-02-19 +date: 2019-05-24 parent: "SQL Functions" --- Drill supports the following functions for casting and converting data types: @@ -10,7 +10,7 @@ Drill supports the following functions for casting and converting data types: * [STRING_BINARY]({{ site.baseurl }}/docs/data-type-conversion/#string_binary-function) and [BINARY_STRING]({{ site.baseurl }}/docs/data-type-conversion/#binary_string-function) * [Other Data Type Conversions]({{ site.baseurl }}/docs/data-type-conversion/#other-data-type-conversions) -**Note:** Starting in Drill 1.15, all cast and data type conversion functions return null for an empty string ('') when the `drill.exec.functions.cast_empty_string_to_null` option is enabled, for example: +Starting in Drill 1.15, all cast and data type conversion functions return null for an empty string ('') when the `drill.exec.functions.cast_empty_string_to_null` option is enabled, for example: SELECT CAST('' AS DATE), TO_TIMESTAMP('', 'yyyy-MM-dd HH:mm:ss') FROM (VALUES(2)); +---------+---------+ @@ -897,10 +897,50 @@ Convert a UTC date to a timestamp offset from the UTC time zone code. +------------------------+---------+ | 2015-03-30 20:49:00.0 | UTC | +------------------------+---------+ - 1 row selected (0.148 seconds) + 1 row selected (0.148 seconds) -## Time Zone Limitation -Currently Drill does not support conversion of a date, time, or timestamp from one time zone to another. 
Queries of data associated with a time zone can return inconsistent results or an error. For more information, see the ["Understanding Drill's Timestamp and Timezone"](http://www.openkb.info/2015/05/understanding-drills-timestamp-and.html#.VUzhotpVhHw) blog. The Drill time zone is based on the operating system time zone unless you override it. To work around the limitation, configure [...] +## Enabling Time Zone Offset + +Starting in Drill 1.16, the `store.hive.maprdb_json.read_timestamp_with_timezone_offset` option enables Drill to read timestamp values with a timezone offset when using the hive plugin with the Drill native MaprDB JSON reader enabled through the `store.hive.maprdb_json.optimize_scan_with_native_reader option`. The `store.hive.maprdb_json.read_timestamp_with_timezone_offset` option is disabled (set to 'false') by default. You can enable this option from the Options page in the Drill Web [...] + +**Important** +Internally, Drill stores timestamp values in UTC format, for example 2018-01-01T20:12:12.123Z. When you enable the timezone offset option, select on a table returns different timestamp values. If you filter on timestamp values when this option is enabled, you must include the new timestamp value in the filter condition. 
+ +For example, look at the timestamp values when the `store.hive.maprdb_json.read_timestamp_with_timezone_offset` option is disabled (set to 'false'): + + + select * from dfs.`/tmp/timestamp`; + ------------------------------------------------------- + _id datestring datetimestamp + ------------------------------------------------------- + 1 2018-01-01 12:12:12.123 2018-01-01 20:12:12.123 + 2 9999-12-31 23:59:59.999 10000-01-01 07:59:59.999 + ------------------------------------------------------- + +When the option is enabled (set to 'true'), you can see the difference in the timestamp values returned: + + select * from dfs.`/tmp/timestamp`; + ------------------------------------------------------ + _id datestring datetimestamp + ------------------------------------------------------ + 1 2018-01-01 12:12:12.123 2018-01-01 12:12:12.123 + 2 9999-12-31 23:59:59.999 9999-12-31 23:59:59.999 + ------------------------------------------------------ + +When the option is enabled, queries that filter on timestamp values must include the new timestamp value in the filter condition, as shown: + + select * from dfs.`/tmp/timestamp` where datetimestamp=timestamp '2018-01-01 12:12:12.123'; + ------------------------------------------------------ + _id datestring datetimestamp + ------------------------------------------------------ + 1 2018-01-01 12:12:12.123 2018-01-01 12:12:12.123 + ------------------------------------------------------ + +Notice that the WHERE clause uses the `2018-01-01 12:12:12.123` format versus the `2018-01-01 20:12:12.123` format. + +## Time Zone Limitation + +Drill does not support conversion of a date, time, or timestamp from one time zone to another. Queries of data associated with a time zone can return inconsistent results or an error. For more information, see the ["Understanding Drill's Timestamp and Timezone"](http://www.openkb.info/2015/05/understanding-drills-timestamp-and.html#.VUzhotpVhHw) blog. 
The Drill time zone is based on the operating system time zone unless you override it. To work around the limitation, configure Drill to u [...] 1. Take a look at the Drill time zone configuration by running the TIMEOFDAY function or by querying the system.options table. This TIMEOFDAY function returns the local date and time with time zone information. @@ -941,7 +981,9 @@ You can use the 'z' option to identify the time zone in TO_TIMESTAMP to make +------------------------+-----------+ | 2015-03-30 20:49:00.0 | UTC | +------------------------+-----------+ - 1 row selected (0.097 seconds) + 1 row selected (0.097 seconds) + + <!-- DRILL-448 Support timestamp with time zone -->
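
The timestamp shift shown in the "Enabling Time Zone Offset" examples added by this commit can be reproduced outside Drill. The following is a minimal sketch, not Drill code: it assumes a fixed UTC-8 local offset (inferred from the eight-hour difference between the sample rows; the commit does not name the time zone), and `utc_to_local` is a hypothetical helper name used only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the local zone is a fixed UTC-8 offset, inferred from the
# sample rows in the diff (20:12:12.123 UTC vs. 12:12:12.123 local).
LOCAL_OFFSET = timezone(timedelta(hours=-8))

FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def utc_to_local(utc_text: str) -> str:
    """Hypothetical helper: shift a UTC timestamp string to the assumed local offset."""
    # Parse the text and mark it explicitly as UTC.
    utc_value = datetime.strptime(utc_text, FORMAT).replace(tzinfo=timezone.utc)
    # Convert to the assumed local offset.
    local_value = utc_value.astimezone(LOCAL_OFFSET)
    # strftime emits six fractional digits; trim to milliseconds.
    return local_value.strftime(FORMAT)[:-3]


if __name__ == "__main__":
    # The first sample row from the diff, as stored in UTC.
    print(utc_to_local("2018-01-01 20:12:12.123"))
```

Under the stated offset assumption, the value a filter must match changes from the UTC form to the shifted form, which is why the WHERE clause in the diff uses `2018-01-01 12:12:12.123`. The second sample row (year 10000 in UTC) is outside the range Python's `datetime` can represent, so it is not covered by this sketch.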