github-actions[bot] commented on code in PR #61535:
URL: https://github.com/apache/doris/pull/61535#discussion_r3021487433
##########
gensrc/thrift/PaloInternalService.thrift:
##########
@@ -470,6 +470,16 @@ struct TQueryOptions {
// session variable `spill_repartition_max_depth` in FE. Default is 8.
209: optional i32 spill_repartition_max_depth = 8
+ // Adaptive batch size: target output block size in bytes. 0 means disabled (fixed row count only).
+ // Default 8MB. Sent by FE session variable preferred_block_size_bytes.
+ 210: optional i64 preferred_block_size_bytes = 8388608
+
+ // Per-column byte limit for adaptive batch size. 0 means no per-column limit. Default 1MB.
+ 211: optional i64 preferred_max_column_in_block_size_bytes = 1048576
+
+ // Adaptive batch size: target output block row count limit. Default 65535.
+ // Sent by FE session variable preferred_block_size_rows.
+ 212: optional i64 preferred_block_size_rows = 65535
210: optional double max_scan_mem_ratio = 0.3;
211: optional bool enable_adaptive_scan = false;
Review Comment:
`TQueryOptions` already uses field ids `210` and `211` immediately below for
`max_scan_mem_ratio` / `enable_adaptive_scan`, so reusing `210`/`211`/`212`
here breaks thrift compatibility instead of extending it.
Concrete failure mode: FE now writes `preferred_block_size_bytes` at id 210,
but an older BE will deserialize id 210 as `max_scan_mem_ratio`; similarly id
211 will be read as `enable_adaptive_scan`. Even on same-version builds,
duplicate field ids in one thrift struct are unsafe and codegen-dependent.
Please move the new adaptive-batch fields to fresh ids greater than the
current maximum and keep the existing ids stable.
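As a sketch of the suggested fix (assuming `211: enable_adaptive_scan` is currently the highest field id in `TQueryOptions`), the adaptive-batch fields would take the next free ids while the existing fields keep theirs:

```thrift
// Sketch only: fresh ids above the current maximum; existing 210/211 unchanged.
210: optional double max_scan_mem_ratio = 0.3;
211: optional bool enable_adaptive_scan = false;

// Adaptive batch size: target output block size in bytes. 0 means disabled.
212: optional i64 preferred_block_size_bytes = 8388608
// Per-column byte limit for adaptive batch size. 0 means no per-column limit.
213: optional i64 preferred_max_column_in_block_size_bytes = 1048576
// Adaptive batch size: target output block row count limit.
214: optional i64 preferred_block_size_rows = 65535
```

Because thrift identifies fields on the wire by id rather than by name, old BEs simply ignore the unknown ids 212-214, which is the compatible extension path.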
##########
be/src/storage/iterator/vcollect_iterator.cpp:
##########
@@ -913,6 +914,25 @@ Status VCollectIterator::Level1Iterator::_merge_next(Block* block) {
continuous_row_in_block = 0;
pre_row_ref = cur_row;
}
+
+ // Byte-budget check: _merge_next() has already advanced _ref to the next unread row,
+ // so it is safe to stop here without duplicating any data.
+ if (config::enable_adaptive_batch_size && _reader->preferred_block_size_bytes() > 0 &&
+     Block::columns_byte_size(target_columns) >= _reader->preferred_block_size_bytes()) {
+ if (continuous_row_in_block > 0) {
Review Comment:
This byte-budget check runs **before** the pending `continuous_row_in_block`
slice is flushed into `target_columns`, so on the merge path it measures only a
partial block.
Concrete scenario: if a child block contributes a long contiguous run (for
example a wide string column from overlapping rowsets), `target_columns` can
still be below the threshold here while `continuous_row_in_block` already holds
many unread bytes. The code then flushes that run only after the check and can
return a block far above `preferred_block_size_bytes`.
Because the PR description promises that `VCollectIterator` also bounds the
final output block size, I think the budget should be checked **after**
accounting for the buffered run (or the buffered bytes should be included in
the estimate) before deciding whether to stop. Otherwise overlapping-rowset
reads keep the old oversized-block behavior on this path.
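A minimal, self-contained sketch of the suggested ordering (not Doris code; `FakeColumn`, `columns_byte_size`, and `should_stop` are hypothetical stand-ins for the real `Block`/column types): the stop decision adds the bytes still buffered in the contiguous run to the bytes already flushed into the target columns, instead of measuring the flushed part alone.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a materialized output column.
struct FakeColumn {
    std::vector<std::string> vals;
    size_t byte_size() const {
        size_t n = 0;
        for (const auto& v : vals) n += v.size();
        return n;
    }
};

// Bytes already flushed into the output columns (what the current check measures).
size_t columns_byte_size(const std::vector<FakeColumn>& cols) {
    size_t n = 0;
    for (const auto& c : cols) n += c.byte_size();
    return n;
}

// The review's point: decide whether to stop only AFTER accounting for the
// pending contiguous run, so a large buffered slice cannot slip past the budget.
bool should_stop(const std::vector<FakeColumn>& target_columns,
                 size_t buffered_run_bytes,
                 size_t preferred_block_size_bytes) {
    if (preferred_block_size_bytes == 0) return false;  // 0 disables the byte budget
    return columns_byte_size(target_columns) + buffered_run_bytes >=
           preferred_block_size_bytes;
}
```

With this shape, the flushed-only estimate and the run-inclusive estimate can disagree exactly in the overlapping-rowset scenario the review describes: a small flushed prefix plus a large buffered run stays under budget by the old check but trips the limit once the run is counted.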
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]