Re: [PR] [bug](iceberg) fix can't get migrated Iceberg tables format type [doris]

via GitHub Thu, 04 Jun 2026 23:20:59 -0700


github-actions[bot] commented on code in PR #64134:
URL: https://github.com/apache/doris/pull/64134#discussion_r3360776696



##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/IcebergUtils.java:
##########
@@ -1098,6 +1095,44 @@ public static FileFormat getFileFormat(Table icebergTable) {
         return fileFormat;
     }
 
+    private static String resolveFileFormatName(Table icebergTable, Map<String, String> properties) {
+        // 1. Check "write-format" (nickname in Flink and Spark)
+        if (properties.containsKey(WRITE_FORMAT)) {
+            return properties.get(WRITE_FORMAT);
+        }
+        // 2. Check "write.format.default" (standard Iceberg property)
+        if (properties.containsKey(TableProperties.DEFAULT_FILE_FORMAT)) {
+            return properties.get(TableProperties.DEFAULT_FILE_FORMAT);
+        }
+        // 3. Check "format" property (e.g., "iceberg/parquet", "iceberg/orc")
+        //    This is commonly set on migrated Iceberg tables.
+        if (properties.containsKey(FORMAT)) {
+            return properties.get(FORMAT);
+        }
+        // 4. Last resort: infer from the actual data files in the current snapshot.
+        //    This handles migrated tables where none of the above properties are set.
+        return inferFileFormatFromDataFiles(icebergTable);
+    }
+
+    private static String inferFileFormatFromDataFiles(Table icebergTable) {
+        if (icebergTable.currentSnapshot() == null) {
+            LOG.info("Iceberg table {} has no snapshot, defaulting to {}", icebergTable.name(), PARQUET_NAME);
+            return PARQUET_NAME;
+        }
+        try (CloseableIterable<FileScanTask> files = icebergTable.newScan().planFiles()) {
+            java.util.Iterator<FileScanTask> it = files.iterator();
+            if (it.hasNext()) {
+                String format = it.next().file().format().name().toLowerCase();
+                LOG.info("Iceberg table {} inferred file format {} from data files", icebergTable.name(), format);
+                return format;
+            }
+        } catch (Exception e) {
+            LOG.warn("Failed to infer file format from data files for table {}, defaulting to {}",
+                    icebergTable.name(), PARQUET_NAME, e);
+        }
+        return PARQUET_NAME;
+    }
+
 

Review Comment:
   This catch-all fallback reintroduces the same wrong-format behavior when 
inference fails. For a migrated ORC table that lacks 
`write-format`/`write.format.default`/`format`, any `planFiles()` failure here 
(for example, manifest IO, auth, or catalog errors) is logged and converted to 
`PARQUET_NAME`. Scans can then still plan ORC files as Parquet, and the 
write/delete/merge paths that call `getFileFormat()` can choose the wrong 
file format. Per Doris error-handling rules, this should fail with table 
context instead of silently defaulting; only the no-snapshot/no-files case 
should use the explicit default.
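
   One way to read the suggestion above is the following minimal sketch. It is 
not the PR's code and does not use the real Iceberg API: `TableStub` and its 
`planFiles` callable are hypothetical stand-ins for `Table` and 
`newScan().planFiles()`, and `IllegalStateException` is only an assumed 
exception type, since the comment does not say which exception Doris would 
expect here. The point is the control flow: planning failures propagate with 
the table name attached, while the no-snapshot and no-files cases keep the 
explicit Parquet default.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;

class FormatInferenceSketch {
    static final String PARQUET_NAME = "parquet";

    /** Hypothetical stand-in for an Iceberg table; NOT org.apache.iceberg.Table. */
    static final class TableStub {
        final String name;
        final boolean hasSnapshot;
        // Stand-in for newScan().planFiles(): yields the format name of each data file.
        final Callable<List<String>> planFiles;

        TableStub(String name, boolean hasSnapshot, Callable<List<String>> planFiles) {
            this.name = name;
            this.hasSnapshot = hasSnapshot;
            this.planFiles = planFiles;
        }
    }

    static String inferFileFormat(TableStub table) {
        // No snapshot: nothing to inspect, so the explicit default applies.
        if (!table.hasSnapshot) {
            return PARQUET_NAME;
        }
        List<String> formats;
        try {
            formats = table.planFiles.call();
        } catch (Exception e) {
            // Planning failed: surface the error with table context instead of defaulting.
            // (Exception type is an assumption made for this sketch.)
            throw new IllegalStateException(
                    "Failed to infer file format from data files for Iceberg table " + table.name, e);
        }
        // Snapshot exists but has no data files: the explicit default applies again.
        Iterator<String> it = formats.iterator();
        return it.hasNext() ? it.next().toLowerCase(Locale.ROOT) : PARQUET_NAME;
    }

    public static void main(String[] args) {
        TableStub orc = new TableStub("db.migrated_orc", true, () -> Arrays.asList("ORC"));
        TableStub empty = new TableStub("db.empty", true, () -> Collections.<String>emptyList());
        System.out.println(inferFileFormat(orc));
        System.out.println(inferFileFormat(empty));
    }
}
```

   With this shape, a migrated ORC table whose manifests cannot be read 
produces an error naming the table rather than a silent switch to Parquet.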



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

