Re: [PR] Spark 4.1: Fix async microbatch plan bugs [iceberg]

via GitHub Tue, 31 Mar 2026 09:13:20 -0700


RjLi13 commented on code in PR #15670:
URL: https://github.com/apache/iceberg/pull/15670#discussion_r3016830641



##########
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/AsyncSparkMicroBatchPlanner.java:
##########
@@ -90,10 +89,13 @@ class AsyncSparkMicroBatchPlanner extends BaseSparkMicroBatchPlanner implements
     this.minQueuedRows = readConf().maxRecordsPerMicroBatch();
     this.lastOffsetForTriggerAvailableNow = lastOffsetForTriggerAvailableNow;
     this.planFilesCache = Caffeine.newBuilder().maximumSize(PLAN_FILES_CACHE_MAX_SIZE).build();
-    this.queue = new LinkedBlockingQueue<>();
+    this.queue = new LinkedBlockingDeque<>();
 
     table().refresh();
-    // Synchronously add data to the queue to meet our initial constraints
+
+    // Synchronously add data to the queue to meet our initial constraints.
+    // For Trigger.AvailableNow, constructor-time preload is normally initialized from
+    // latestOffset(...) with no explicit end offset, so bounded preload must stop at the cap.

Review Comment:
   The cap here is the Trigger.AvailableNow limit, which, as I understand it, 
prevents reading beyond what is available now, even if more data arrives later. 
I would check that the initial preload does not cross that limit.
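
   To make the check concrete, here is a minimal, self-contained sketch of a 
bounded preload loop. It is not the code in this PR: `BoundedPreloadSketch`, 
`preload`, and the use of plain `long` values as ordered positions are all 
hypothetical stand-ins (real offsets are not simple longs), and `cap` stands in 
for `lastOffsetForTriggerAvailableNow`. It only illustrates the invariant that 
nothing past the cap should reach the queue.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical sketch: positions are modeled as increasing longs,
// which is an assumption made purely for illustration.
class BoundedPreloadSketch {

  // Fills a deque from `available` until either `minQueued` entries are
  // queued or the next position would pass `cap`. A null cap means unbounded.
  static BlockingDeque<Long> preload(List<Long> available, Long cap, int minQueued) {
    BlockingDeque<Long> queue = new LinkedBlockingDeque<>();
    for (Long next : available) {
      if (queue.size() >= minQueued) {
        break; // initial constraint met
      }
      if (cap != null && next > cap) {
        break; // never queue anything beyond the AvailableNow cap
      }
      queue.addLast(next);
    }
    return queue;
  }

  public static void main(String[] args) {
    List<Long> available = Arrays.asList(1L, 2L, 3L, 4L, 5L);
    // The cap (3) is hit before the size target (10), so preload stops early.
    BlockingDeque<Long> queue = preload(available, 3L, 10);
    System.out.println(queue);
  }
}
```

   A test along these lines (preload with a cap smaller than the available 
data, then assert on the last queued element) would cover the case I am asking 
about.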



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


