[GitHub] [iceberg] xiacongling commented on a diff in pull request #3852: Core, Spark: Fallback when snapshot does not have schema id

GitBox Fri, 12 Aug 2022 00:47:54 -0700


xiacongling commented on code in PR #3852:
URL: https://github.com/apache/iceberg/pull/3852#discussion_r944201015



##########
core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java:
##########
@@ -295,7 +298,25 @@ public static Schema schemaFor(Table table, long snapshotId) {
       return schema;
     }
 
-    // TODO: recover the schema by reading previous metadata files
+    // Otherwise, read each of the previous metadata files until we find one whose current
+    // snapshot id is the snapshot id
+    if (table instanceof BaseTable) {
+      Schema schema = null;
+      TableMetadata current = ((BaseTable) table).operations().current();
+      for (TableMetadata.MetadataLogEntry logEntry : current.previousFiles()) {
+        String metadataFile = logEntry.file();
+        TableMetadata metadata = TableMetadataParser.read(table.io(), metadataFile);
+        if (metadata.currentSnapshot() != null &&
+            metadata.currentSnapshot().snapshotId() == snapshotId) {
+          schema = metadata.schema();
+          break;
+        }
+      }
+      Preconditions.checkArgument(schema != null,
+          "Cannot find a metadata file corresponding to the snapshot %s", snapshotId);

Review Comment:
   Two questions:
   - Iterating over all previous metadata files can be time consuming, since each entry is read through `TableMetadataParser.read`. That may slow down queries for users. Can we add a switch to skip the recovery?
   - The `Preconditions.checkArgument` at the end may throw an exception, which would cause the caller to fail. Can we keep the original behavior and return the current schema instead?
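
   To make the two suggestions concrete, here is a minimal, self-contained sketch of the behavior I have in mind. It does not use the Iceberg classes: `SchemaFallbackSketch`, `MetadataEntry`, and the `recoverFromMetadataLog` flag are hypothetical names for illustration only, and schemas are modeled as plain strings. The flag stands in for the proposed switch, and the final `return` stands in for falling back to the current schema instead of failing the precondition.

```java
import java.util.List;

// Hypothetical sketch, not Iceberg code: models the proposed switch and fallback.
public class SchemaFallbackSketch {

  /** Stand-in for one previous metadata file: its current snapshot id (may be null) and its schema. */
  public static final class MetadataEntry {
    final Long currentSnapshotId;
    final String schema;

    public MetadataEntry(Long currentSnapshotId, String schema) {
      this.currentSnapshotId = currentSnapshotId;
      this.schema = schema;
    }
  }

  /**
   * Returns the schema recorded by the metadata entry whose current snapshot is
   * snapshotId. If recovery is disabled, or no entry matches, returns the current
   * schema rather than throwing.
   */
  public static String schemaFor(
      String currentSchema,
      List<MetadataEntry> previousFiles,
      long snapshotId,
      boolean recoverFromMetadataLog) {
    if (!recoverFromMetadataLog) {
      // Switch is off: skip the potentially expensive scan entirely.
      return currentSchema;
    }

    for (MetadataEntry entry : previousFiles) {
      if (entry.currentSnapshotId != null && entry.currentSnapshotId == snapshotId) {
        return entry.schema;
      }
    }

    // Nothing matched: keep the original behavior and return the current schema.
    return currentSchema;
  }
}
```

   With the flag off, or with a snapshot id that no entry matches, the call returns the current schema, so existing callers would not start failing.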



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.