marton-bod commented on a change in pull request #2544: URL: https://github.com/apache/iceberg/pull/2544#discussion_r627425991
##########
File path: site/docs/hive.md
##########
@@ -17,117 +17,324 @@
 # Hive
-## Hive read support
-Iceberg supports the reading of Iceberg tables from [Hive](https://hive.apache.org) by using a [StorageHandler](https://cwiki.apache.org/confluence/display/Hive/StorageHandlers). Please note that only Hive 2.x versions are currently supported.
+Iceberg supports reading and writing Iceberg tables through [Hive](https://hive.apache.org) by using a [StorageHandler](https://cwiki.apache.org/confluence/display/Hive/StorageHandlers).
+Here is the current compatibility matrix for Iceberg Hive support:
-### Table creation
-This section explains the various steps needed in order to overlay a Hive table "on top of" an existing Iceberg table. Iceberg tables are created using either a [`Catalog`](./javadoc/master/index.html?org/apache/iceberg/catalog/Catalog.html) or an implementation of the [`Tables`](./javadoc/master/index.html?org/apache/iceberg/Tables.html) interface and Hive needs to be configured accordingly to read data from these different types of table.
+| Feature | Hive 2.x | Hive 3.1.2 |
+| ------------------------ | ---------------------- | ---------------------- |
+| CREATE EXTERNAL TABLE | ✔️ | ✔️ |
+| CREATE TABLE | ✔️ | ✔️ |
+| DROP TABLE | ✔️ | ✔️ |
+| SELECT | ✔️ (MapReduce and Tez) | ✔️ (MapReduce and Tez) |
+| INSERT INTO | ✔️ (MapReduce only) | ✔️ (MapReduce only) |
-#### Add the Iceberg Hive Runtime jar file to the Hive classpath
-Regardless of the table type, the `HiveIcebergStorageHandler` and supporting classes need to be made available on Hive's classpath. These are provided by the `iceberg-hive-runtime` jar file. For example, if using the Hive shell, this can be achieved by issuing a statement like so:
-```sql
+## Enabling Iceberg support in Hive
+
+### Loading runtime jar
+
+To enable Iceberg support in Hive, the `HiveIcebergStorageHandler` and supporting classes need to be made available on Hive's classpath.
+These are provided by the `iceberg-hive-runtime` jar file.
+For example, if using the Hive shell, this can be achieved by issuing a statement like so:
+
+```
 add jar /path/to/iceberg-hive-runtime.jar;
 ```
-There are many others ways to achieve this including adding the jar file to Hive's auxiliary classpath (so it is available by default) - please refer to Hive's documentation for more information.
-#### Using Hadoop Tables
-Iceberg tables created using `HadoopTables` are stored entirely in a directory in a filesystem like HDFS.
+There are many others ways to achieve this including adding the jar file to Hive's auxiliary classpath so it is available by default.
+Please refer to Hive's documentation for more information.
-##### Create an Iceberg table
-The first step is to create an Iceberg table using the Spark/Java/Python API and `HadoopTables`. For the purposes of this documentation we will assume that the table is called `table_a` and that the table location is `hdfs://some_path/table_a`.
+### Enabling support
-##### Create a Hive table
-Now overlay a Hive table on top of this Iceberg table by issuing Hive DDL like so:
-```sql
-CREATE EXTERNAL TABLE table_a
-STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
-LOCATION 'hdfs://some_bucket/some_path/table_a';
+If the Iceberg storage handler is not in Hive's classpath, then Hive cannot load or update the metadata for an Iceberg table when the storage handler is set.
+To avoid the appearance of broken tables in Hive, Iceberg will not add the storage handler to a table unless Hive support is enabled.
+The storage handler is kept in sync (added or removed) every time a table is updated.

Review comment:
nit: just as a quick clarification, the storage handler is added/removed only if the `engine.hive.enabled` table property changes from true to false (or vice versa), not just for any table updates.
So maybe we can reword it a bit:
> The storage handler is kept in sync (added or removed) every time Hive engine support for the table is updated, i.e. turned on or off in the table properties.
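
To make the distinction concrete, here is a minimal sketch of the rule described in this comment: the handler entry changes only when `engine.hive.enabled` flips, and any other property update leaves it alone. This is a toy model, not Iceberg's implementation. The function names, the `storage_handler` parameter key, and the default of `false` when the property is unset are all assumptions made for illustration.

```python
# Toy model of the sync rule from the review comment; NOT Iceberg's actual code.
# HANDLER_KEY and the "false" default are assumptions made for this sketch.
STORAGE_HANDLER = "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler"
HIVE_ENABLED_PROP = "engine.hive.enabled"
HANDLER_KEY = "storage_handler"  # hypothetical metastore parameter key


def hive_enabled(table_props):
    """Return True if the table properties turn Hive engine support on."""
    return table_props.get(HIVE_ENABLED_PROP, "false").lower() == "true"


def sync_storage_handler(hms_params, old_props, new_props):
    """Return a copy of hms_params with the handler added or removed,
    but only when engine.hive.enabled changed between old and new props."""
    params = dict(hms_params)
    was_enabled = hive_enabled(old_props)
    now_enabled = hive_enabled(new_props)
    if was_enabled == now_enabled:
        # Unrelated table update: the handler entry is left untouched.
        return params
    if now_enabled:
        params[HANDLER_KEY] = STORAGE_HANDLER
    else:
        params.pop(HANDLER_KEY, None)
    return params


if __name__ == "__main__":
    enabled = {HIVE_ENABLED_PROP: "true"}
    params = sync_storage_handler({}, {}, enabled)
    print(params)  # handler added because the flag flipped to true
    params = sync_storage_handler(params, enabled, {**enabled, "comment": "x"})
    print(params)  # unchanged: the flag did not change
```

In this model, an update that only touches other properties returns the parameters unchanged, which is the case the original sentence in the docs gets wrong.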