addu390 opened a new pull request, #2776:
URL: https://github.com/apache/fluss/pull/2776
### Purpose

Linked issue: close #2550

Add rate limit support for Spark streaming reads to control the number of offsets processed per micro-batch trigger.

### Brief change log

- Added `scan.max.offsets.per.trigger`, `scan.min.offsets.per.trigger`, and `scan.max.trigger.delay` config options in `SparkFlussConf`
- Overrode `getDefaultReadLimit` in `FlussMicroBatchStream` to return the appropriate `ReadLimit` based on the configured options
- Note: offset capping uses a proportional fair-share distribution across buckets. A simpler, more typical approach (`maxOffsets / numBuckets`) can be used instead, if that's preferred.

### Tests

- `SparkStreamingTest#read: log table with maxOffsetsPerTrigger rate limit`

### API and Format

New user-facing config options for the Spark DataFrameReader:

- `scan.max.offsets.per.trigger`
- `scan.min.offsets.per.trigger`
- `scan.max.trigger.delay`

### Documentation

N/A; the documentation update will be tracked separately.
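To illustrate the proportional fair-share idea mentioned above: the global `maxOffsets` cap is split across buckets in proportion to each bucket's available backlog, rather than evenly. The sketch below is a standalone illustration, not the PR's actual implementation; the class name `OffsetCap` and method `proportionalCap` are hypothetical, and the real code in `FlussMicroBatchStream` may differ in rounding and remainder handling.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OffsetCap {

    // Hypothetical sketch: distribute a global maxOffsets cap across buckets
    // proportionally to each bucket's available offsets (endOffset - startOffset).
    static Map<Integer, Long> proportionalCap(Map<Integer, Long> available, long maxOffsets) {
        long total = available.values().stream().mapToLong(Long::longValue).sum();
        Map<Integer, Long> caps = new LinkedHashMap<>();
        if (total <= maxOffsets) {
            // Backlog fits under the cap: every bucket can be read fully.
            caps.putAll(available);
            return caps;
        }
        for (Map.Entry<Integer, Long> e : available.entrySet()) {
            // Each bucket's share is proportional to its fraction of the total backlog;
            // truncation means the sum of shares may be slightly under maxOffsets.
            long share = (long) (maxOffsets * ((double) e.getValue() / total));
            caps.put(e.getKey(), Math.min(share, e.getValue()));
        }
        return caps;
    }

    public static void main(String[] args) {
        Map<Integer, Long> avail = new LinkedHashMap<>();
        avail.put(0, 100L); // bucket 0 has 100 offsets pending
        avail.put(1, 300L); // bucket 1 has 300 offsets pending
        // With a cap of 200, bucket 1 gets 3x bucket 0's share.
        System.out.println(proportionalCap(avail, 200L)); // prints {0=50, 1=150}
    }
}
```

Compared with a flat `maxOffsets / numBuckets` split, the proportional scheme avoids starving hot buckets while leaving cold buckets' unused quota on the table.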
