xr-chen commented on code in PR #766:
URL: https://github.com/apache/incubator-xtable/pull/766#discussion_r2638323810
##########
xtable-core/src/main/java/org/apache/xtable/iceberg/IcebergConversionTarget.java:
##########
@@ -161,12 +169,42 @@ private void initializeTableIfRequired(InternalTable
internalTable) {
}
}
+ private MappedFields updateNameMapping(MappedFields mapping, Map<Integer,
String> updates) {
+ if (mapping == null) {
+ return null;
+ }
+ List<MappedField> fieldResults = new ArrayList<>();
+ for (MappedField field : mapping.fields()) {
+ Set<String> fieldNames = new HashSet<>(field.names());
+ if (updates.containsKey(field.id())) {
+ fieldNames.add(updates.get(field.id()));
+ }
+ MappedFields nestedMapping = updateNameMapping(field.nestedMapping(),
updates);
+ fieldResults.add(MappedField.of(field.id(), fieldNames, nestedMapping));
+ }
+ return MappedFields.of(fieldResults);
+ }
+
@Override
public void syncSchema(InternalSchema schema) {
Schema latestSchema = schemaExtractor.toIceberg(schema);
+ String mappingJson =
transaction.table().properties().get(TableProperties.DEFAULT_NAME_MAPPING);
+ boolean hasFieldIds =
+ schema.getAllFields().stream().anyMatch(field -> field.getFieldId() !=
null);
+ // Recreate name mapping when field IDs were provided in the source schema
to ensure every
+ // field in the mapping was assigned the same ID as what is in the source
schema
+ NameMapping mapping =
+ mappingJson == null || hasFieldIds
+ ? MappingUtil.create(latestSchema)
+ : NameMappingParser.fromJson(mappingJson);
Review Comment:
I think it depends on whether the source table has field IDs in the schema
and how we update the schema. For a source table that doesn't have field IDs in
its schema, if I understand the code correctly, the current implementation will
make updates using the schema update API. For example, a table without field
IDs that has three columns of int, array, and string type, the generated
Iceberg schema will look like
```
1 int
2 array
4 - elements
3 string
```
If we add a double column to this table, with the schema update API, the
next available integer will be used for the new column
```
1 int
2 array
4 - elements
3 string
5 double
```
But if we recreate mapping from the converted schema with auto-assigned IDs
in this case, the structure of the mapping will be:
```
1 int
2 array
5 - elements
3 string
4 double
```
which doesn't align with the schema anymore, so I think we should only
create the mapping using the latest schema, only if the source table uses field
IDs to identify columns in the data file
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]