stevenzwu commented on code in PR #4329:
URL: https://github.com/apache/iceberg/pull/4329#discussion_r844261439


##########
flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/source/ScanContext.java:
##########
@@ -114,77 +128,104 @@ private ScanContext(boolean caseSensitive, Long 
snapshotId, Long startSnapshotId
     this.includeColumnStats = includeColumnStats;
     this.exposeLocality = exposeLocality;
     this.planParallelism = planParallelism;
+
+    validate();
+  }
+
+  private void validate() {
+    if (isStreaming) {
+      if (startingStrategy == 
StreamingStartingStrategy.SPECIFIC_START_SNAPSHOT_ID) {

Review Comment:
   Will the `startSnapshotId` or `startSnapshotTimestamp`, there is no regular 
table scan. Say they map to some snapshot id X. With the latest agreement on 
inclusive behavior for the starting snapshot, we will start the incremental 
scan btw (X's parent snapshot id, Y]. Previous/current implementation has the 
start snapshot as exclusive, which is the same behavior as the current source 
impl.
   
   Only `TABLE_SCAN_THEN_INCREMENTAL` does a regular table scan and then start 
the incremental discovery process.
   
   right now, I didn't add extensive/strict validations. We can't do some of 
the validations. E.g., `endSnapshotId` will be added in the copied 
`ScanContext`. Validation on `endSnapshotId` will fail. I guess we can fix the 
copy method and unset the streaming related fields like `isStreaming`.
   ```
   ScanContext incrementalScan = scanContext
             .copyWithAppendsBetween(lastPosition.endSnapshotId(), 
currentSnapshot.snapshotId());
   ```
   
   Regarding the end condition, it might be useful for a future 
`endSnapshotTimestamp`. A past `endSnapshotTimestamp` probably doesn't make 
sense, since we can just do a regular batch scan with filters. 
   
   A `endSnapshotId` in the future also doesn't make sense, since we won't know 
it when the streaming read starts. A  `endSnapshotId` in the past should be 
just a regular batch scan and hence is not applicable to streaming mode.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to