flyrain commented on code in PR #11612:
URL: https://github.com/apache/iceberg/pull/11612#discussion_r1901376613
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -560,20 +560,15 @@ public Scan buildChangelogScan() {
}
boolean emptyScan = false;
- if (startTimestamp != null) {
- if (table.currentSnapshot() != null
- && table.currentSnapshot().timestampMillis() < startTimestamp) {
- emptyScan = true;
+ if (startTimestamp != null || endTimestamp != null) {
+ if (startTimestamp != null) {
+ startSnapshotId = getStartSnapshotId(startTimestamp);
}
- startSnapshotId = getStartSnapshotId(startTimestamp);
- if (startSnapshotId == null && endTimestamp == null) {
- emptyScan = true;
+ if (endTimestamp != null) {
+ endSnapshotId = getEndSnapshotId(startTimestamp, endTimestamp);
}
- }
-
- if (endTimestamp != null) {
- endSnapshotId = getEndSnapshotId(endTimestamp);
if ((startSnapshotId == null && endSnapshotId == null)
+ || (endSnapshotId == null && endTimestamp != null)
|| (startSnapshotId != null &&
startSnapshotId.equals(endSnapshotId))) {
emptyScan = true;
}
Review Comment:
I believe this part of logic should be out side of line 563 ` if
(startTimestamp != null || endTimestamp != null) `, as we are not only check
the when timestamp is one of the inputs, we also need to check when inputs are
start snapshot id and end snapshot id.
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -615,10 +609,14 @@ private Long getStartSnapshotId(Long startTimestamp) {
}
}
- private Long getEndSnapshotId(Long endTimestamp) {
+ private Long getEndSnapshotId(Long startTimestamp, Long endTimestamp) {
Long endSnapshotId = null;
for (Snapshot snapshot : SnapshotUtil.currentAncestors(table)) {
- if (snapshot.timestampMillis() <= endTimestamp) {
+ long ts = snapshot.timestampMillis();
+ if (startTimestamp != null && ts <= startTimestamp) {
+ break;
+ }
Review Comment:
I found a bit more confusing by moving the logic here. We are checking if
start timestamp is larger than the current snapshot's one. Checking within an
ancestor loop yields the same result, but seems unnecessary. Plus, it has to
introduces the line 571, which is another burden for reader to understand.
```
|| (endSnapshotId == null && endTimestamp != null)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]