szehon-ho commented on code in PR #6350:
URL: https://github.com/apache/iceberg/pull/6350#discussion_r1046518537
##########
core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java:
##########
@@ -149,8 +149,7 @@ public static Iterable<Snapshot> ancestorsOf(long
snapshotId, Function<Long, Sna
}
/**
- * Traverses the history of the table's current snapshot and finds the first
snapshot committed
- * after the given time.
+ * Finds the oldest snapshot that was committed either at or after a given
time.
Review Comment:
Nit: is it necessary to change whole sentence? How about just replacing
'after' to 'at or after'?
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -213,13 +214,26 @@ public Scan build() {
SparkReadOptions.END_SNAPSHOT_ID,
SparkReadOptions.START_SNAPSHOT_ID);
+ checkInvalidOptions();
+
if (startSnapshotId != null) {
return buildIncrementalAppendScan(startSnapshotId, endSnapshotId);
} else {
return buildBatchScan(snapshotId, asOfTimestamp, branch, tag);
}
}
+ private void checkInvalidOptions() {
+ Long startTimestamp = readConf.startTimestamp();
Review Comment:
Nit: maybe just inline this if its only used in one place (unless I miss
something)? Doesnt make sense to make a helper method called
'checkInvalidOption' but only add one specific check.
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -308,6 +339,17 @@ public Scan buildChangelogScan() {
return new SparkChangelogScan(spark, table, scan, readConf,
expectedSchema, filterExpressions);
}
+ private Long getStartSnapshotId(Long startTimestamp) {
+ Snapshot oldestSnapshotAfter = SnapshotUtil.oldestAncestorAfter(table,
startTimestamp);
+ Preconditions.checkArgument(
Review Comment:
Having an exception to distinguish is a fair point. Initially I was
thinking it would be a pain for user to have to deal with exception if they
wanted to do all changes from a random time, but they can find the earliest
snapshot and make sure the timestamp is appropriate to avoid exception. So
still no real preference, ok either way.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]