[GitHub] [iceberg] rdblue commented on a diff in pull request #6811: Allow lazy loading of snapshots in TableMetadata

via GitHub Sun, 12 Feb 2023 10:45:31 -0800


rdblue commented on code in PR #6811:
URL: https://github.com/apache/iceberg/pull/6811#discussion_r1103856648



##########
core/src/main/java/org/apache/iceberg/TableMetadata.java:
##########
@@ -885,7 +921,7 @@ private Builder(TableMetadata base) {
       this.sortOrders = Lists.newArrayList(base.sortOrders);
       this.properties = Maps.newHashMap(base.properties);
       this.currentSnapshotId = base.currentSnapshotId;
-      this.snapshots = Lists.newArrayList(base.snapshots);
+      this.snapshots = Lists.newArrayList(base.snapshots());

Review Comment:
   Is this needed?
   
   It will always load snapshots when building a new table metadata and that 
doesn't seem right. I think it would be correct to do it that way, but you 
don't really need to load them as long as the `snapshotsSupplier` is passed 
here as well. If all of the snapshots are needed to write the new metadata to a 
file, the metadata parser will call `snapshots()` at which point they will be 
loaded.
   
   The only problem I can think of is that we're replacing the snapshot list 
completely in `ensureSnapshotsLoaded` and we would need to update that to 
basically union the initially loaded snapshots with the new ones that are 
before the sequence number. That's a bit more complex and we could do that 
later if we wanted to.
   
   I think the argument for lazily loading snapshots is that this is in the 
critical path during a commit, so hitting a REST endpoint is going to increase 
the chances of conflicts. On the other hand, this is a simpler way to make 
snapshot loading lazy for all read cases.
   
   What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] rdblue commented on a diff in pull request #6811: Allow lazy loading of snapshots in TableMetadata

Reply via email to