RjLi13 commented on code in PR #15670:
URL: https://github.com/apache/iceberg/pull/15670#discussion_r3016830641
##########
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/AsyncSparkMicroBatchPlanner.java:
##########
@@ -90,10 +89,13 @@ class AsyncSparkMicroBatchPlanner extends BaseSparkMicroBatchPlanner implements
this.minQueuedRows = readConf().maxRecordsPerMicroBatch();
this.lastOffsetForTriggerAvailableNow = lastOffsetForTriggerAvailableNow;
this.planFilesCache =
Caffeine.newBuilder().maximumSize(PLAN_FILES_CACHE_MAX_SIZE).build();
- this.queue = new LinkedBlockingQueue<>();
+ this.queue = new LinkedBlockingDeque<>();
table().refresh();
- // Synchronously add data to the queue to meet our initial constraints
+
+ // Synchronously add data to the queue to meet our initial constraints.
+ // For Trigger.AvailableNow, constructor-time preload is normally initialized from
+ // latestOffset(...) with no explicit end offset, so bounded preload must stop at the cap.
Review Comment:
The cap here refers to the Trigger.AvailableNow limit, which, as I understand
it, prevents reading beyond the data that was available when the query started,
even if more data arrives later. I would check that the initial preload in the
constructor does not cross that limit.
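
To make the concern concrete, here is a minimal sketch of a preload loop that
stops at a cap. It is not the planner's code: `BoundedPreloadSketch`, `Batch`,
`preload`, `position`, and `capPosition` are hypothetical names, and the sketch
assumes planned work can be ordered by a monotonically increasing position.

```java
import java.util.List;
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical sketch; none of these names come from the Iceberg code base.
public class BoundedPreloadSketch {

  /** A unit of planned work at an (assumed) monotonically increasing position. */
  public static final class Batch {
    public final long position;
    public final long rows;

    public Batch(long position, long rows) {
      this.position = position;
      this.rows = rows;
    }
  }

  /**
   * Enqueues batches in order until at least minQueuedRows rows are queued,
   * but never enqueues a batch positioned past capPosition.
   * Returns the number of rows queued.
   */
  public static long preload(
      List<Batch> ordered,
      long capPosition,
      long minQueuedRows,
      LinkedBlockingDeque<Batch> queue) {
    long queuedRows = 0;
    for (Batch batch : ordered) {
      if (batch.position > capPosition) {
        break; // the cap wins even if the row target has not been met
      }
      if (queuedRows >= minQueuedRows) {
        break; // initial constraint satisfied, stop preloading
      }
      queue.addLast(batch);
      queuedRows += batch.rows;
    }
    return queuedRows;
  }
}
```

The point of the sketch is the order of the two checks: the cap is tested before
the row target, so a large `minQueuedRows` cannot pull in data past the cap.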
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]