Re: Migrating plain parquet tables to iceberg

Ryan Blue Fri, 30 Oct 2020 15:56:38 -0700

For existing tables that use name-based column resolution, you can add a
name-to-id mapping that is applied when reading files with no field IDs.
There is a utility to generate the name mapping from an existing schema
(using the current names) and then you just need to store that in a table
property.


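import org.apache.iceberg.mapping.MappingUtil;
import org.apache.iceberg.mapping.NameMapping;
import org.apache.iceberg.mapping.NameMappingParser;

NameMapping mapping = MappingUtil.create(table.schema());
table.updateProperties()
    .set("schema.name-mapping.default", NameMappingParser.toJson(mapping))
    .commit();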
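
To show why this helps when column order varies between files, here is a rough, self-contained sketch of name-based resolution. It is a simplified model, not Iceberg's implementation: the class NameMappingSketch, the helpers createMapping and resolveFileColumns, and the column names are all made up for illustration, and the IDs are assigned sequentially here, whereas in Iceberg they come from the table schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of name-to-ID resolution; NOT Iceberg's implementation.
public class NameMappingSketch {

    // Hypothetical helper: assign an ID to each of the table's current
    // column names (sequential here; assumption made for illustration).
    static Map<String, Integer> createMapping(List<String> tableColumns) {
        Map<String, Integer> mapping = new LinkedHashMap<>();
        int nextId = 1;
        for (String name : tableColumns) {
            mapping.put(name, nextId++);
        }
        return mapping;
    }

    // Hypothetical helper: resolve a data file's columns, in file order,
    // to field IDs by name. Names missing from the mapping resolve to null.
    static List<Integer> resolveFileColumns(List<String> fileColumns,
                                            Map<String, Integer> mapping) {
        List<Integer> ids = new ArrayList<>();
        for (String name : fileColumns) {
            ids.add(mapping.get(name));
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Integer> mapping = createMapping(List.of("id", "ts", "payload"));

        // Same columns in two different physical orders.
        System.out.println(resolveFileColumns(List.of("id", "ts", "payload"), mapping)); // [1, 2, 3]
        System.out.println(resolveFileColumns(List.of("payload", "id", "ts"), mapping)); // [3, 1, 2]
    }
}
```

Because each column is looked up by name, the position of a column within a given file does not matter; both files resolve to the same set of field IDs.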

I think there is also an issue to add a name mapping by default when
importing data.

On Fri, Oct 30, 2020 at 3:46 PM Kruger, Scott <[email protected]>
wrote:

> I’m looking to migrate a partitioned parquet table to use iceberg. The
> issue I’ve run into is that the column order for the data varies wildly,
> which isn’t a problem for us normally (we just set mergeSchemas=true when
> reading), but presents a problem with iceberg because the iceberg.schema
> field isn’t set in the parquet footer. Is there any way to migrate this
> data over without rewriting the entire dataset?
>


-- 
Ryan Blue
Software Engineer
Netflix
