amogh-jahagirdar commented on code in PR #5364:
URL: https://github.com/apache/iceberg/pull/5364#discussion_r930657809
##########
core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java:
##########
@@ -322,6 +323,39 @@ public static long snapshotIdAsOfTime(Table table, long timestampMillis) {
return snapshotId;
}
+ /**
+ * Returns the ID of the most recent snapshot on the given branch
+ * as of the given time in milliseconds
+ *
+ * @param table a {@link Table}
+ * @param branch a {@link String}
+ * @param timestampMillis the timestamp in millis since the Unix epoch
+ * @return the snapshot ID
+ * @throws IllegalArgumentException when no snapshot is found in the table, on the given branch
+ * older than the timestamp
+ */
+ public static long snapshotIdAsOfTime(Table table, String branch, long timestampMillis) {
+ SnapshotRef ref = table.refs().get(branch);
+ Preconditions.checkArgument(ref != null, "Branch %s does not exist", branch);
+ Preconditions.checkArgument(ref.isBranch(), "Ref %s is a tag, not a branch", branch);
+ Long snapshotId = null;
+ long minimumTimeDifference = Long.MAX_VALUE;
+ for (Snapshot snapshot : ancestorsOf(ref.snapshotId(), table::snapshot)) {
+ if (snapshot.timestampMillis() <= timestampMillis) {
+ if (timestampMillis - snapshot.timestampMillis() <= minimumTimeDifference) {
+ minimumTimeDifference = timestampMillis - snapshot.timestampMillis();
+ snapshotId = snapshot.snapshotId();
+ }
+ }
Review Comment:
I think I was overthinking this. Let's take a simple case, starting from the
rule that staged snapshots should not be considered when doing time travel on
a branch.
```
   S4 (staged at ts-400)
  /
S1 (committed at ts-100) - S2 (committed at ts-200) - S3 (committed at ts-300)
```
Now we do the following: createBranch("newBranch", S4)
The graph looks like:
```
   S4 (staged at ts-400, tip of newBranch)
  /
S1 (committed at ts-100) - S2 (committed at ts-200) - S3 (committed at ts-300)
```
Originally I was thinking that snapshotIdAsOfTime(table, newBranch, 500)
should return S1 because S4 is staged. But S4 is part of the branch state: it
is the tip of newBranch. So we should consider it when doing time travel, and
the call should return S4.
Now, let's take the case where S4 is actually committed on newBranch.
```
   S4 (committed at ts-400, tip of newBranch)
  /
S1 (committed at ts-100) - S2 (committed at ts-200) - S3 (committed at ts-300)
```
Now we want to stage a snapshot S5 onto newBranch.
```
   S4 (committed at ts-400, tip of newBranch) - S5 (staged at ts-500)
  /
S1 (committed at ts-100) - S2 (committed at ts-200) - S3 (committed at ts-300)
```
When we produce the staged snapshot on a branch, we will make sure the
ancestor is set correctly for S5, but the ref pointer should not be advanced
to the staged commit. In this case snapshotIdAsOfTime(table, newBranch, 500)
will return S4 and not S5, because the lookup only walks ancestors of the
branch tip. So we are good here; we'll need to ensure this in our snapshot
producer changes.
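To make the last scenario concrete, here is a minimal, self-contained sketch of the lookup. It does not use the Iceberg API: `BranchTimeTravelSketch`, `Snap`, `addSnapshot`, `setBranchHead`, and `example` are hypothetical stand-ins, and the snapshot IDs 1 to 5 stand for S1 to S5. It only assumes what is described above: the lookup walks the ancestors of the branch tip, and staging S5 sets its parent to S4 without advancing the ref.

```java
import java.util.HashMap;
import java.util.Map;

public class BranchTimeTravelSketch {
  /** Hypothetical stand-in for a snapshot: ID, parent ID (null for the root), timestamp. */
  static final class Snap {
    final long id;
    final Long parentId;
    final long timestampMillis;

    Snap(long id, Long parentId, long timestampMillis) {
      this.id = id;
      this.parentId = parentId;
      this.timestampMillis = timestampMillis;
    }
  }

  private final Map<Long, Snap> snapshots = new HashMap<>();
  private final Map<String, Long> branchHeads = new HashMap<>();

  public void addSnapshot(long id, Long parentId, long timestampMillis) {
    snapshots.put(id, new Snap(id, parentId, timestampMillis));
  }

  public void setBranchHead(String branch, long snapshotId) {
    branchHeads.put(branch, snapshotId);
  }

  /** Walks only the ancestors of the branch tip and keeps the latest one at or before the timestamp. */
  public long snapshotIdAsOfTime(String branch, long timestampMillis) {
    Long current = branchHeads.get(branch);
    if (current == null) {
      throw new IllegalArgumentException("Branch " + branch + " does not exist");
    }
    Snap best = null;
    while (current != null) {
      Snap snap = snapshots.get(current);
      if (snap.timestampMillis <= timestampMillis
          && (best == null || snap.timestampMillis > best.timestampMillis)) {
        best = snap;
      }
      current = snap.parentId;
    }
    if (best == null) {
      throw new IllegalArgumentException("No snapshot on " + branch + " at or before " + timestampMillis);
    }
    return best.id;
  }

  /** Builds the last graph from the comment: S5 is staged on top of S4, but no ref points to it. */
  public static BranchTimeTravelSketch example() {
    BranchTimeTravelSketch table = new BranchTimeTravelSketch();
    table.addSnapshot(1, null, 100);
    table.addSnapshot(2, 1L, 200);
    table.addSnapshot(3, 2L, 300);
    table.addSnapshot(4, 1L, 400);
    table.addSnapshot(5, 4L, 500); // staged: parent is set, ref is not advanced
    table.setBranchHead("main", 3);
    table.setBranchHead("newBranch", 4);
    return table;
  }

  public static void main(String[] args) {
    BranchTimeTravelSketch table = example();
    System.out.println(table.snapshotIdAsOfTime("newBranch", 500)); // 4 (S4, not the staged S5)
    System.out.println(table.snapshotIdAsOfTime("newBranch", 350)); // 1 (S4 is too new, so S1)
    System.out.println(table.snapshotIdAsOfTime("main", 350));      // 3
  }
}
```

Because S5 is never reachable from the `newBranch` ref, it cannot be returned even though its timestamp matches the query exactly; this is the invariant the snapshot producer changes would need to preserve.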
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]