[GitHub] [iceberg] rdblue commented on a change in pull request #1490: Hive read docs

GitBox Fri, 25 Sep 2020 12:02:42 -0700


rdblue commented on a change in pull request #1490:
URL: https://github.com/apache/iceberg/pull/1490#discussion_r495177531




##########
File path: site/docs/hive.md
##########
@@ -0,0 +1,62 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Hive
+
+## Hive read support
+Iceberg tables can be read from 
[Hive](https://hive.apache.org) by using a 
[StorageHandler](https://cwiki.apache.org/confluence/display/Hive/StorageHandlers).
+
+### Table creation
+This section explains the steps needed to overlay a Hive table on top of an 
existing Iceberg table.
+
+#### Create an Iceberg table
+The first step is to create an Iceberg table using the Spark/Java/Python API. 
For the purposes of this documentation we will assume that the table is called 
`table_a` and that the base location of the table is 
`s3://some_bucket/some_path/table_a`.
+
+#### Add the Iceberg Hive Runtime jar file to the Hive classpath
+The `HiveIcebergStorageHandler` and supporting classes need to be made 
available on Hive's classpath. For example, if using Hive 2.x and the Hive 
shell, this can be achieved by issuing a statement like so:
+```sql
+add jar /path/to/iceberg-hive-runtime.jar;
+```
+There are many other ways to achieve this, including adding the jar file to 
Hive's auxiliary classpath (so it is available by default). Please refer to 
Hive's documentation for more information.
+
+#### Create a Hive table
+Now overlay a Hive table on top of this Iceberg table by issuing Hive DDL like 
so:
+```sql
+CREATE EXTERNAL TABLE table_a 
+STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
+LOCATION 's3://some_bucket/some_path/table_a';
+```
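+
+As a rough check (a sketch, not one of the required steps), Hive's standard 
`DESCRIBE FORMATTED` statement can be used to inspect the new table's metadata. 
The storage handler class would be expected to appear in its output, though the 
exact layout depends on the Hive version:
+```sql
+-- Inspect the metadata Hive recorded for the overlay table.
+DESCRIBE FORMATTED table_a;
+```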
+
+#### Query the Iceberg table via Hive
+You should now be able to issue Hive SQL `SELECT` queries using the above 
table and see the results returned from the underlying Iceberg table. Both the 
MapReduce and Tez query execution engines are supported.
+```sql
+SELECT * FROM table_a;
+```
+
+### Features
+
+#### Predicate pushdown
+Pushdown of the Hive SQL `WHERE` clause has been implemented so that these 
filters are used at the Iceberg TableScan level as well as by the Parquet and 
ORC Readers.
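+
+As an illustrative sketch, a query of the following shape is the kind whose 
`WHERE` clause would be handed down to Iceberg. The column `id` is hypothetical 
and is not defined anywhere in this document's example table:
+```sql
+-- `id` is an assumed column name, used only for illustration.
+SELECT * FROM table_a WHERE id > 100;
+```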
+
+#### Column selection
+Projection of columns from the Hive SQL `SELECT` clause down to the Iceberg 
readers, to reduce the number of columns read, is currently being worked on.
+
+### Time travel and system tables
+Accessing Iceberg's time travel feature and system tables isn't currently 
supported, but there is a plan to add this in the future.

Review comment:
       Yeah, that's the table I was thinking of. It's easier to look at a table 
than to find a section of documentation that says something isn't supported. 
I'm also fine with dropping the sections. We can always add a table later.



