rdblue commented on code in PR #6811:
URL: https://github.com/apache/iceberg/pull/6811#discussion_r1103856648
##########
core/src/main/java/org/apache/iceberg/TableMetadata.java:
##########
@@ -885,7 +921,7 @@ private Builder(TableMetadata base) {
this.sortOrders = Lists.newArrayList(base.sortOrders);
this.properties = Maps.newHashMap(base.properties);
this.currentSnapshotId = base.currentSnapshotId;
- this.snapshots = Lists.newArrayList(base.snapshots);
+ this.snapshots = Lists.newArrayList(base.snapshots());
Review Comment:
Is this needed?
It will always load snapshots when building a new table metadata and that
doesn't seem right. I think it would be correct to do it that way, but you
don't really need to load them as long as the `snapshotsSupplier` is passed
here as well. If all of the snapshots are needed to write the new metadata to a
file, the metadata parser will call `snapshots()` at which point they will be
loaded.
The only problem I can think of is that we're replacing the snapshot list
completely in `ensureSnapshotsLoaded` and we would need to update that to
basically union the initially loaded snapshots with the new ones that are
before the sequence number. That's a bit more complex and we could do that
later if we wanted to.
I think the argument for lazily loading snapshots is that this is in the
critical path during a commit, so hitting a REST endpoint is going to increase
the chances of conflicts. On the other hand, this is a simpler way to make
snapshot loading lazy for all read cases.
What do you think?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]