For existing tables that use name-based column resolution, you can add a
name-to-id mapping that is applied when reading files with no field IDs.
There is a utility that generates the name mapping from an existing schema
(using the current names); you then store the result as JSON in the
schema.name-mapping.default table property:
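    import org.apache.iceberg.mapping.MappingUtil;
    import org.apache.iceberg.mapping.NameMapping;
    import org.apache.iceberg.mapping.NameMappingParser;
    // Build a mapping from the current schema's names, then store it as JSON.
    NameMapping mapping = MappingUtil.create(table.schema());
    table.updateProperties()
        .set("schema.name-mapping.default", NameMappingParser.toJson(mapping))
        .commit();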
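To illustrate why a name mapping makes column order irrelevant, here is a
rough, self-contained sketch of name-based resolution. It is not Iceberg's
implementation: the class NameMappingSketch, its methods, the example column
names, and the sequential ID assignment are all hypothetical and exist only
to show the idea that each file column is looked up by name and resolved to
a stable field ID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only; NOT Iceberg's implementation.
public class NameMappingSketch {

    // Hypothetical stand-in for a name mapping: column name -> field ID.
    // Assumption: IDs are assigned sequentially in table-schema order.
    public static Map<String, Integer> mappingFor(List<String> schemaColumns) {
        Map<String, Integer> mapping = new LinkedHashMap<>();
        int nextId = 1;
        for (String name : schemaColumns) {
            mapping.put(name, nextId++);
        }
        return mapping;
    }

    // Resolve a file's columns, in whatever order the file stores them,
    // to field IDs. A column with no entry in the mapping resolves to null.
    public static List<Integer> resolve(Map<String, Integer> mapping,
                                        List<String> fileColumns) {
        List<Integer> ids = new ArrayList<>();
        for (String name : fileColumns) {
            ids.add(mapping.get(name));
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Integer> mapping =
            mappingFor(Arrays.asList("id", "ts", "payload"));

        // Two files holding the same columns in different physical orders.
        System.out.println(resolve(mapping, Arrays.asList("id", "ts", "payload")));
        System.out.println(resolve(mapping, Arrays.asList("payload", "id", "ts")));
    }
}
```

Both files resolve to the same set of field IDs, only in a different order,
so a reader that projects by ID sees the same logical columns either way.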
I think there is also an issue to add a name mapping by default when
importing data.
On Fri, Oct 30, 2020 at 3:46 PM Kruger, Scott wrote:
> I’m looking to migrate a partitioned Parquet table to Iceberg. The
> issue I’ve run into is that the column order for the data varies wildly,
> which isn’t normally a problem for us (we just set mergeSchema=true when
> reading), but it is a problem with Iceberg because the iceberg.schema
> field isn’t set in the Parquet footer. Is there any way to migrate this
> data over without rewriting the entire dataset?
>
--
Ryan Blue
Software Engineer
Netflix