I thought that we had already updated the metrics code to use a name mapping. Sorry, I was mistaken. Could you post a PR with your fix?
Glad it's working!

On Tue, Nov 3, 2020 at 11:51 AM Kruger, Scott <[email protected]> wrote:

> Awesome, this is working for us, although we had to modify our code to
> also use the NameMapping when grabbing parquet file metrics. Thanks!
>
> From: Ryan Blue <[email protected]>
> Date: Friday, October 30, 2020 at 5:55 PM
> Subject: Re: Migrating plain parquet tables to iceberg
>
> For existing tables that use name-based column resolution, you can add a
> name-to-id mapping that is applied when reading files with no field IDs.
> There is a utility to generate the name mapping from an existing schema
> (using the current names); then you just need to store that in a table
> property:
>
>     NameMapping mapping = MappingUtil.create(table.schema());
>
>     table.updateProperties()
>         .set("schema.name-mapping.default", NameMappingParser.toJson(mapping))
>         .commit();
>
> I think there is also an issue to add a name mapping by default when
> importing data.
>
> On Fri, Oct 30, 2020 at 3:46 PM Kruger, Scott <[email protected]> wrote:
>
> > I'm looking to migrate a partitioned parquet table to use iceberg. The
> > issue I've run into is that the column order for the data varies wildly,
> > which isn't a problem for us normally (we just set mergeSchemas=true when
> > reading), but presents a problem with iceberg because the iceberg.schema
> > field isn't set in the parquet footer. Is there any way to migrate this
> > data over without rewriting the entire dataset?

--
Ryan Blue
Software Engineer
Netflix
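For context on what gets stored: NameMappingParser.toJson serializes the mapping as a JSON list pairing each Iceberg field ID with the column names it can be resolved from. A minimal sketch of the value stored under the "schema.name-mapping.default" table property, for a hypothetical table with columns id and data, might look like:

```json
[
  { "field-id": 1, "names": ["id"] },
  { "field-id": 2, "names": ["data"] }
]
```

The column names here are illustrative, not from the thread; nested struct fields carry their own "fields" entries in the same shape. Readers use this mapping to assign field IDs to Parquet columns whose footers lack them, which is why metrics collection must consult the same mapping.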
