Awesome, this is working for us, although we had to modify our code to also use 
the NameMapping when grabbing parquet file metrics. Thanks!

From: Ryan Blue <[email protected]>
Reply-To: "[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>
Date: Friday, October 30, 2020 at 5:55 PM
To: "[email protected]" <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: Migrating plain parquet tables to iceberg


For existing tables that use name-based column resolution, you can add a 
name-to-id mapping that is applied when reading files with no field IDs. There 
is a utility to generate the name mapping from an existing schema (using the 
current names); then you just need to store it in a table property.

NameMapping mapping = MappingUtil.create(table.schema());

table.updateProperties()
    .set("schema.name-mapping.default", NameMappingParser.toJson(mapping))
    .commit();
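For reference, the JSON that NameMappingParser produces (and that gets stored in the property) is a list of name-to-field-id entries; nested fields carry a `fields` list. A rough sketch for a simple two-column schema (the field IDs and column names here are illustrative, not from any real table):

```json
[
  {"field-id": 1, "names": ["id"]},
  {"field-id": 2, "names": ["data"]}
]
```

Readers use this mapping to assign field IDs by name when a Parquet footer has none, which is what makes the imported files resolvable under Iceberg's id-based projection.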

I think there is also an issue to add a name mapping by default when importing 
data.

On Fri, Oct 30, 2020 at 3:46 PM Kruger, Scott <[email protected]> 
wrote:
I’m looking to migrate a partitioned parquet table to use iceberg. The issue 
I’ve run into is that the column order for the data varies wildly, which isn’t 
a problem for us normally (we just set mergeSchemas=true when reading), but 
presents a problem with iceberg because the iceberg.schema field isn’t set in 
the parquet footer. Is there any way to migrate this data over without 
rewriting the entire dataset?


--
Ryan Blue
Software Engineer
Netflix
