rdblue commented on a change in pull request #1490:
URL: https://github.com/apache/iceberg/pull/1490#discussion_r495177531
##########
File path: site/docs/hive.md
##########
@@ -0,0 +1,62 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Hive
+
+## Hive read support
+Iceberg supports the reading of Iceberg tables from [Hive](https://hive.apache.org) by using a [StorageHandler](https://cwiki.apache.org/confluence/display/Hive/StorageHandlers).
+
+### Table creation
+This section explains the various steps needed in order to overlay a Hive table "on top of" an existing Iceberg table.
+
+#### Create an Iceberg table
+The first step is to create an Iceberg table using the Spark/Java/Python API. For the purposes of this documentation we will assume that the table is called `table_a` and that the base location of the table is `s3://some_bucket/some_path/table_a`.
+
+#### Add the Iceberg Hive Runtime jar file to the Hive classpath
+The `HiveIcebergStorageHandler` and supporting classes need to be made available on Hive's classpath. For example, if using Hive 2.x and the Hive shell, this can be achieved by issuing a statement like so:
+```sql
+add jar /path/to/iceberg-hive-runtime.jar;
+```
+There are many other ways to achieve this including adding the jar file to Hive's auxiliary classpath (so it is available by default) - please refer to Hive's documentation for more information.
+
+#### Create a Hive table
+Now overlay a Hive table on top of this Iceberg table by issuing Hive DDL like so:
+```sql
+CREATE EXTERNAL TABLE table_a
+STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
+LOCATION 's3://some_bucket/some_path/table_a';
+```
+
+#### Query the Iceberg table via Hive
+You should now be able to issue Hive SQL `SELECT` queries using the above table and see the results returned from the underlying Iceberg table. Both the Map Reduce and Tez query execution engines are supported.
+```sql
+SELECT * from table_a;
+```
+
+### Features
+
+#### Predicate pushdown
+Pushdown of the Hive SQL `WHERE` clause has been implemented so that these filters are used at the Iceberg TableScan level as well as by the Parquet and ORC Readers.
+
+#### Column selection
+The projection of columns from the HiveSQL `SELECT` clause down to the Iceberg readers to reduce the number of columns read is currently being worked on.
+
+### Time travel and system tables
+Support for accessing Iceberg's time travel feature and other system tables isn't currently supported but there is a plan to add this in the future.

Review comment:
Yeah, that's the table I was thinking of. It's easier to look at a table than to find a section of documentation that says something isn't supported.

I'm also fine with dropping the sections. We can always add a table later.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.