Hi,
My team has been using the custom catalog along with atomic metadata
updates, but we never migrated existing Iceberg tables onto it. We also
haven't turned on integration with the Hive catalog, so I'm not sure how
easy it is to plug in there (I think there was some recent work on
that?). DynamoDB provides a local mock, which you could combine with
s3mock (check the Iceberg tests) to try it out:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBLocal.html
The only odd things we've run into with DynamoDB are:
1. It seems like we get rate limited by DynamoDB pretty hard when first
writing to a new table, until the limits are adjusted (potentially AWS
dynamically adjusting DynamoDB's internal partitions?).
2. Make sure to page scans if you have a lot of values when doing lists
(we haven't enabled catalog listing yet, but we've run into this before).
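On point 2: a DynamoDB Scan response is capped at 1 MB, and when more data
remains it carries a LastEvaluatedKey that has to be fed back as
ExclusiveStartKey until it's absent. A minimal sketch of that loop in Python
(the client here is any object with a boto3-style scan method; the fake below
just stands in for DynamoDB so the loop itself is visible):

```python
def scan_all(client, table_name):
    """Collect every item from a DynamoDB table, following pagination.

    DynamoDB caps each Scan response at 1 MB; while more data remains,
    the response carries a LastEvaluatedKey that must be passed back
    as ExclusiveStartKey on the next call.
    """
    items = []
    kwargs = {"TableName": table_name}
    while True:
        resp = client.scan(**kwargs)
        items.extend(resp.get("Items", []))
        last_key = resp.get("LastEvaluatedKey")
        if last_key is None:
            return items
        kwargs["ExclusiveStartKey"] = last_key


class FakePagedClient:
    """Stand-in for a boto3 DynamoDB client that serves fixed pages."""

    def __init__(self, pages):
        self.pages = pages

    def scan(self, TableName, ExclusiveStartKey=None):
        start = ExclusiveStartKey or 0
        resp = {"Items": self.pages[start]}
        if start + 1 < len(self.pages):
            resp["LastEvaluatedKey"] = start + 1
        return resp


client = FakePagedClient([[{"t": "a"}], [{"t": "b"}], [{"t": "c"}]])
print(scan_all(client, "iceberg_catalog"))  # items from all three pages
```

Forgetting the loop silently truncates listings once the catalog table
crosses the 1 MB page boundary, which is exactly the failure mode above.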
We chose DynamoDB because we were already using it for other use cases.
I'm not sure if it's the best AWS-provided option for atomic changes.
John
On 11/19/20 10:07 AM, Marko Babic wrote:
Hi everyone,
At my org we’ve spun up a few Iceberg tables on top of S3 without a
metastore (conscious of the consequences) and we’ve arrived at the
point that we need to support concurrent writes. :) I was hoping to
get some advice as to what the best way to integrate an existing
Iceberg table into a Hive Metastore or an alternative might be. We’re
still relatively early in our adoption of Iceberg and have no real
prior experience with Hive so I don’t know what I don’t know.
Some options we’re weighing:
- Existing tables aren’t so big that the moral equivalent of "CREATE
TABLE hive.db.table … AS SELECT * FROM hadoop.table" is out of the
question, but we’d prefer to not have to read + rewrite everything. We
also have stateful readers (tracking which snapshots they have
previously read) and preserving table history would make life easier.
- Doing something along the lines of the following and importing the
tables into Hive as external tables looks like it should work given my
understanding of how Iceberg is using HMS, but I don’t know if it’s
encouraged and I haven’t done diligence to understand potential
consequences:
```
hive> CREATE EXTERNAL TABLE `existing_table` (...)
LOCATION
  's3://existing-table/'
-- serde, input/output formats omitted
TBLPROPERTIES (
  -- Assuming the latest metadata file for the Hadoop table is
  -- v99.metadata.json, rename it to 00099-uuid.metadata.json so that
  -- BaseMetastoreTableOperations can correctly parse the version number.
  'metadata_location'='s3://existing-table/metadata/00099-uuid.metadata.json',
  'table_type'='ICEBERG'
)
```
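For what it's worth, the rename in the comment matters because the
metastore-style table operations derive the version from the zero-padded
integer at the start of the metadata file name (everything before the first
"-"). A rough illustration of that naming convention, in Python rather than
Iceberg's Java (parse_version is my name for the helper, not Iceberg's):

```python
import os
import re


def parse_version(metadata_location):
    """Extract the version from a metastore-style metadata file name.

    Metastore-tracked Iceberg metadata files are named like
    00099-<uuid>.metadata.json; the version is the leading integer.
    Returns -1 when the name doesn't match, treating Hadoop-style
    names like v99.metadata.json as having no parseable version.
    """
    file_name = os.path.basename(metadata_location)
    match = re.match(r"(\d+)-", file_name)
    return int(match.group(1)) if match else -1


print(parse_version("s3://existing-table/metadata/00099-uuid.metadata.json"))  # 99
print(parse_version("s3://existing-table/metadata/v99.metadata.json"))  # -1
```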
- Others seem to have had success implementing + maintaining a
custom catalog (https://iceberg.apache.org/custom-catalog/) backed by
e.g. DynamoDB
for atomic metadata updates, which could appeal to us. Seems like
migration in this case consists of implementing the catalog and
plopping the latest metadata into the backing store. Are custom
catalogs more of an escape hatch when HMS can’t be used, or would that
maybe be a reasonable way forward if we find we don’t want to maintain
+ operate on top of HMS?
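The atomicity a DynamoDB-backed catalog needs boils down to a
compare-and-swap on the metadata pointer: a commit succeeds only if the
current pointer still equals the one the writer read before producing its new
metadata file. In DynamoDB that would be a PutItem/UpdateItem with a
ConditionExpression; the sketch below shows just the protocol against an
in-memory dict (all names are mine for illustration, not from any Iceberg
catalog implementation):

```python
import threading


class InMemoryCatalog:
    """Toy catalog mapping table name -> current metadata file location.

    Stands in for a DynamoDB table; there, the same check would be a
    conditional write on the metadata_location attribute.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._pointers = {}

    def commit(self, table, expected_location, new_location):
        """Atomically swap the metadata pointer.

        Fails if another writer committed after we read
        expected_location, so the loser can refresh and retry.
        """
        with self._lock:
            current = self._pointers.get(table)
            if current != expected_location:
                raise RuntimeError(
                    f"commit conflict: expected {expected_location}, found {current}"
                )
            self._pointers[table] = new_location


catalog = InMemoryCatalog()
catalog.commit("db.events", None, "s3://t/metadata/00001-a.metadata.json")
catalog.commit("db.events", "s3://t/metadata/00001-a.metadata.json",
               "s3://t/metadata/00002-b.metadata.json")
```

A third commit that still expects 00001-a would raise here, which is the
conflict a concurrent writer is supposed to see before retrying.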
Apologies if this was discussed or documented somewhere else and I’ve
missed it.
Thanks!
Marko