xr-chen commented on code in PR #766:
URL: https://github.com/apache/incubator-xtable/pull/766#discussion_r2639025635
##########
xtable-core/src/main/java/org/apache/xtable/iceberg/IcebergConversionTarget.java:
##########
@@ -161,12 +169,42 @@ private void initializeTableIfRequired(InternalTable
internalTable) {
}
}
+ private MappedFields updateNameMapping(MappedFields mapping, Map<Integer,
String> updates) {
+ if (mapping == null) {
+ return null;
+ }
+ List<MappedField> fieldResults = new ArrayList<>();
+ for (MappedField field : mapping.fields()) {
+ Set<String> fieldNames = new HashSet<>(field.names());
+ if (updates.containsKey(field.id())) {
+ fieldNames.add(updates.get(field.id()));
+ }
+ MappedFields nestedMapping = updateNameMapping(field.nestedMapping(),
updates);
+ fieldResults.add(MappedField.of(field.id(), fieldNames, nestedMapping));
+ }
+ return MappedFields.of(fieldResults);
+ }
+
@Override
public void syncSchema(InternalSchema schema) {
Schema latestSchema = schemaExtractor.toIceberg(schema);
+ String mappingJson =
transaction.table().properties().get(TableProperties.DEFAULT_NAME_MAPPING);
+ boolean hasFieldIds =
+ schema.getAllFields().stream().anyMatch(field -> field.getFieldId() !=
null);
+ // Recreate name mapping when field IDs were provided in the source schema
to ensure every
+ // field in the mapping was assigned the same ID as what is in the source
schema
+ NameMapping mapping =
+ mappingJson == null || hasFieldIds
+ ? MappingUtil.create(latestSchema)
+ : NameMappingParser.fromJson(mappingJson);
Review Comment:
Just tested, Iceberg will automatically update the name mapping as well when
updating the schema with `UpdateSchema`, for example, when a new column `city`
is added, the corresponding entry can also be found in the name mapping, schema
```
{
"id" : 19,
"name" : "street",
"required" : false,
"type" : "string"
}, {
"id" : 29,
"name" : "city",
"required" : false,
"type" : "string"
}
```
name mapping
```
{
"field-id" : 19,
"names" : [ "street" ]
}, {
"field-id" : 29,
"names" : [ "city" ]
}
```
It's unnecessary to recreate the name mapping based on the latest schema in
this case, and recreating might result in the name mapping not aligning with
the table schema. For example, the field Id of `city` is supposed to be 20 in
the `latestSchema`, if no field Ids are provided in the source table, the name
mapping created from `latestSchema` will also map 20 to `city`, but in the
table schema, 20 is already assigned to one of the nested fields which is of
course not the field `city`.
And when the schema is updated with the operations API, the name mapping
will be recreated based on the latest schema to reflect the schema change.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]