amogh-jahagirdar commented on code in PR #5364:
URL: https://github.com/apache/iceberg/pull/5364#discussion_r930621186
##########
core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java:
##########
@@ -322,6 +323,39 @@ public static long snapshotIdAsOfTime(Table table, long timestampMillis) {
     return snapshotId;
   }
+  /**
+   * Returns the ID of the most recent snapshot on the given branch
+   * as of the given time in milliseconds
+   *
+   * @param table a {@link Table}
+   * @param branch a {@link String}
+   * @param timestampMillis the timestamp in millis since the Unix epoch
+   * @return the snapshot ID
+   * @throws IllegalArgumentException when no snapshot is found in the table, on the given branch
+   * older than the timestamp
+   */
+  public static long snapshotIdAsOfTime(Table table, String branch, long timestampMillis) {
+    SnapshotRef ref = table.refs().get(branch);
+    Preconditions.checkArgument(ref != null, "Branch %s does not exist", branch);
+    Preconditions.checkArgument(ref.isBranch(), "Ref %s is a tag, not a branch", branch);
+    Long snapshotId = null;
+    long minimumTimeDifference = Long.MAX_VALUE;
+    for (Snapshot snapshot : ancestorsOf(ref.snapshotId(), table::snapshot)) {
+      if (snapshot.timestampMillis() <= timestampMillis) {
+        if (timestampMillis - snapshot.timestampMillis() <= minimumTimeDifference) {
+          minimumTimeDifference = timestampMillis - snapshot.timestampMillis();
+          snapshotId = snapshot.snapshotId();
+        }
+      }
Review Comment:
This does not seem right in the presence of WAP (write-audit-publish). If a
commit is staged on a branch, it should not be included in time travel, but
traversing `ancestorsOf` would pick it up. Time travel relies on the snapshot
log because the log is the source of truth for how the table (main) state
evolved over time. Walking the ancestors and comparing timestamps would count
staged commits, which we don't want: we should only count snapshots that are
part of that branch's state.

Fixing this may require maintaining some extra metadata. Instead of a single
snapshot log for the main table state, there would be a
`map<string, list<historyEntry>>` where the key is the ref name and the value
is that ref's history log. When a snapshot is produced on a branch and is not
staged, we would update this metadata.
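
A rough sketch of the per-ref log idea, to make the shape concrete. Every name
here (`RefHistorySketch`, `HistoryEntry`, `recordCommit`) is made up for
illustration and is not an Iceberg API; it also assumes entries are appended in
commit order and that staged snapshots are simply never recorded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: these types are not part of Iceberg.
final class RefHistorySketch {

  // One log entry: the snapshot a ref pointed to starting at the given time.
  static final class HistoryEntry {
    final long timestampMillis;
    final long snapshotId;

    HistoryEntry(long timestampMillis, long snapshotId) {
      this.timestampMillis = timestampMillis;
      this.snapshotId = snapshotId;
    }
  }

  // Key: ref name. Value: that ref's history log, in commit order.
  private final Map<String, List<HistoryEntry>> logsByRef = new HashMap<>();

  // Assumption: called only when a snapshot becomes the ref's current state.
  // Staged (WAP) snapshots never reach this method, so they never enter the log.
  void recordCommit(String ref, long timestampMillis, long snapshotId) {
    logsByRef
        .computeIfAbsent(ref, name -> new ArrayList<>())
        .add(new HistoryEntry(timestampMillis, snapshotId));
  }

  // Returns the snapshot the ref pointed to as of the given time, using only
  // the ref's own log rather than the ancestor chain.
  long snapshotIdAsOfTime(String ref, long timestampMillis) {
    List<HistoryEntry> log = logsByRef.get(ref);
    if (log == null) {
      throw new IllegalArgumentException("Ref " + ref + " has no history");
    }

    Long snapshotId = null;
    for (HistoryEntry entry : log) {
      // Later entries overwrite earlier ones, so the last match wins.
      if (entry.timestampMillis <= timestampMillis) {
        snapshotId = entry.snapshotId;
      }
    }

    if (snapshotId == null) {
      throw new IllegalArgumentException(
          "No history entry on ref " + ref + " at or before " + timestampMillis);
    }
    return snapshotId;
  }
}
```

With this shape, a snapshot that is staged on a branch but never published has
no entry in that branch's log, so a time-travel lookup resolves to the last
published entry instead.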
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]