I thought we had already updated the metrics code to use a name
mapping; sorry, I was mistaken. Could you post a PR with your fix?

Glad it's working!
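For anyone reading this thread later: the "schema.name-mapping.default" property set in the quoted snippet below stores a JSON list pairing column names with Iceberg field IDs. The sketch here hand-builds that shape purely for illustration; the "id" and "ts" columns and their IDs are invented, and real code should generate the mapping with MappingUtil.create(table.schema()) and serialize it with NameMappingParser.toJson(mapping) as shown in the quoted message.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class NameMappingSketch {
    // Render a name -> field-id map as a name-mapping-style JSON list,
    // e.g. [{"field-id": 1, "names": ["id"]}]. Illustration only; real
    // mappings are produced by MappingUtil/NameMappingParser.
    static String toMappingJson(Map<String, Integer> fields) {
        StringJoiner entries = new StringJoiner(", ", "[", "]");
        fields.forEach((name, id) ->
            entries.add("{\"field-id\": " + id + ", \"names\": [\"" + name + "\"]}"));
        return entries.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> fields = new LinkedHashMap<>();
        fields.put("id", 1);   // hypothetical column
        fields.put("ts", 2);   // hypothetical column
        // Prints: [{"field-id": 1, "names": ["id"]}, {"field-id": 2, "names": ["ts"]}]
        System.out.println(toMappingJson(fields));
    }
}
```

Because the mapping is keyed by name, files written with different column orders (as in Scott's dataset) can still be resolved to the same field IDs at read time.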

On Tue, Nov 3, 2020 at 11:51 AM Kruger, Scott <[email protected]> wrote:

> Awesome, this is working for us, although we had to modify our code to
> also use the NameMapping when grabbing parquet file metrics. Thanks!
>
>
>
> *From: *Ryan Blue <[email protected]>
> *Reply-To: *"[email protected]" <[email protected]>, "
> [email protected]" <[email protected]>
> *Date: *Friday, October 30, 2020 at 5:55 PM
> *To: *"[email protected]" <[email protected]>
> *Cc: *"[email protected]" <[email protected]>
> *Subject: *Re: Migrating plain parquet tables to iceberg
>
>
> For existing tables that use name-based column resolution, you can add a
> name-to-id mapping that is applied when reading files with no field IDs.
> There is a utility to generate the name mapping from an existing schema
> (using the current names) and then you just need to store that in a table
> property.
>
> NameMapping mapping = MappingUtil.create(table.schema());
>
> table.updateProperties()
>     .set("schema.name-mapping.default", NameMappingParser.toJson(mapping))
>     .commit();
>
> I think there is also an open issue to add a name mapping by default when
> importing data.
>
>
>
> On Fri, Oct 30, 2020 at 3:46 PM Kruger, Scott <[email protected]>
> wrote:
>
> I’m looking to migrate a partitioned parquet table to use iceberg. The
> issue I’ve run into is that the column order for the data varies wildly,
> which isn’t a problem for us normally (we just set mergeSchemas=true when
> reading), but presents a problem with iceberg because the iceberg.schema
> field isn’t set in the parquet footer. Is there any way to migrate this
> data over without rewriting the entire dataset?
>
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix


-- 
Ryan Blue
Software Engineer
Netflix
