massdosage commented on a change in pull request #1748:
URL: https://github.com/apache/iceberg/pull/1748#discussion_r520528074
##########
File path: site/docs/hive.md
##########
@@ -50,6 +50,19 @@ You should now be able to issue Hive SQL `SELECT` queries
using the above table
SELECT * from table_a;
```
+#### Using Hive Catalog
+Iceberg tables created using `HiveCatalog` are automatically registered with Hive.
+
+##### Create an Iceberg table
+The first step is to create an Iceberg table using the Spark/Java/Python API and `HiveCatalog`. For the purposes of this documentation we will assume that the table is called `table_b` and that the table location is `s3://some_path/table_b`.
+TODO: what do we need to set up when we create this table programmatically for everything to be registered correctly for read usage in Hive?
Review comment:
@pvary What does one need to do to get the table set up properly for the Hive read path in this case? What I have tried so far is this: first, create an Iceberg table using the `HiveCatalog` like so:
```
import static org.apache.iceberg.types.Types.NestedField.optional;

import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hive.HiveCatalog;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.SparkSession;

PartitionSpec spec = PartitionSpec.unpartitioned();
Schema schema = new Schema(
    optional(1, "id", Types.LongType.get()),
    optional(2, "name", Types.StringType.get()));
SparkSession spark = SparkSession.builder().appName("IcebergTest").getOrCreate();
Catalog catalog = new HiveCatalog(hadoopConfiguration);
TableIdentifier tableId = TableIdentifier.of("test", "iceberg_table_from_hive_catalog");
catalog.createTable(tableId, schema, spec);
```
The table created in Hive by the above has DDL like so:
```
CREATE EXTERNAL TABLE `iceberg_table_from_hive_catalog`(
`id` bigint COMMENT '',
`name` string COMMENT '')
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.FileInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.mapred.FileOutputFormat'
LOCATION
's3://REDACTED/iceberg_table_from_hive_catalog'
TBLPROPERTIES (
'metadata_location'='s3://REDACTED/iceberg_table_from_hive_catalog/metadata/00000-7addbbf2-1836-4973-86af-0511ae7577fb.metadata.json',
'table_type'='ICEBERG',
'transient_lastDdlTime'='1605007216')
```
This is obviously incorrect, as the StorageHandler hasn't been set, etc. I know you worked on a PR that sets this all up properly as long as some config/setup is performed at table creation time. Can you please let me know what I need to do? I'll then document it accordingly once I've tested that it works.
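For comparison, my assumption (untested) of what a Hive-readable definition would need to look like is something along these lines, with the Iceberg storage handler in place of the default SerDe and input/output formats:
```
CREATE EXTERNAL TABLE `iceberg_table_from_hive_catalog`(
  `id` bigint COMMENT '',
  `name` string COMMENT '')
STORED BY
  'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION
  's3://REDACTED/iceberg_table_from_hive_catalog'
TBLPROPERTIES (
  'metadata_location'='s3://REDACTED/iceberg_table_from_hive_catalog/metadata/00000-7addbbf2-1836-4973-86af-0511ae7577fb.metadata.json',
  'table_type'='ICEBERG')
```
I'm guessing there is a flag or table property that tells `HiveCatalog` to register the storage handler at creation time, but I don't know which one, hence the question.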
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]