Re: [PR] Fix Delta to Iceberg not working on column mapping enabled Delta source table [incubator-xtable]

via GitHub Sun, 21 Dec 2025 17:31:47 -0800


xr-chen commented on code in PR #766:
URL: https://github.com/apache/incubator-xtable/pull/766#discussion_r2638323810



##########
xtable-core/src/main/java/org/apache/xtable/iceberg/IcebergConversionTarget.java:
##########
@@ -161,12 +169,42 @@ private void initializeTableIfRequired(InternalTable 
internalTable) {
     }
   }
 
+  private MappedFields updateNameMapping(MappedFields mapping, Map<Integer, 
String> updates) {
+    if (mapping == null) {
+      return null;
+    }
+    List<MappedField> fieldResults = new ArrayList<>();
+    for (MappedField field : mapping.fields()) {
+      Set<String> fieldNames = new HashSet<>(field.names());
+      if (updates.containsKey(field.id())) {
+        fieldNames.add(updates.get(field.id()));
+      }
+      MappedFields nestedMapping = updateNameMapping(field.nestedMapping(), 
updates);
+      fieldResults.add(MappedField.of(field.id(), fieldNames, nestedMapping));
+    }
+    return MappedFields.of(fieldResults);
+  }
+
   @Override
   public void syncSchema(InternalSchema schema) {
     Schema latestSchema = schemaExtractor.toIceberg(schema);
+    String mappingJson = 
transaction.table().properties().get(TableProperties.DEFAULT_NAME_MAPPING);
+    boolean hasFieldIds =
+        schema.getAllFields().stream().anyMatch(field -> field.getFieldId() != 
null);
+    // Recreate name mapping when field IDs were provided in the source schema 
to ensure every
+    // field in the mapping was assigned the same ID as what is in the source 
schema
+    NameMapping mapping =
+        mappingJson == null || hasFieldIds
+            ? MappingUtil.create(latestSchema)
+            : NameMappingParser.fromJson(mappingJson);

Review Comment:
   I think it depends on whether the source table has field IDs in the schema 
and how we update the schema. For a source table that doesn't have field IDs in 
its schema, if I understand the code correctly, the current implementation will 
make updates using the schema update API. For example, a table without field 
IDs that has three columns of int, array, and string type, the generated 
Iceberg schema will look like
   ```
   1 int
   2 array
     4 - elements
   3 string
   ```
   If we add a double column to this table, with the schema update API, the 
next available integer will be used for the new column
   ```
   1 int
   2 array
     4 - elements
   3 string
   5 double
   ```
   But if we recreate mapping from the converted schema with auto-assigned IDs 
in this case, the structure of the mapping will be:
   ```
   1 int
   2 array
    5 - elements
   3 string
   4 double
   ```
   which doesn't align with the schema anymore, so I think we should only 
create the mapping using the latest schema, only if the source table uses field 
IDs to identify columns in the data file



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Fix Delta to Iceberg not working on column mapping enabled Delta source table [incubator-xtable]

Reply via email to