github-actions[bot] commented on code in PR #63296:
URL: https://github.com/apache/doris/pull/63296#discussion_r3247197780


##########
be/src/exec/scan/scanner_scheduler.cpp:
##########
@@ -272,9 +274,30 @@ void 
ScannerScheduler::_scanner_scan(std::shared_ptr<ScannerContext> ctx,
                 // Projection will truncate useless columns, makes block size 
change.
                 auto free_block_bytes = free_block->allocated_bytes();
                 raw_bytes_read += free_block_bytes;
-                if (!scan_task->cached_blocks.empty() &&
-                    scan_task->cached_blocks.back().first->rows() + 
free_block->rows() <=
-                            ctx->batch_size()) {
+                const auto can_merge_to_last_block = [&]() {
+                    if (scan_task->cached_blocks.empty()) {
+                        return false;
+                    }
+
+                    const auto* last_block = 
scan_task->cached_blocks.back().first.get();
+                    if (last_block->rows() == 0 || free_block->rows() == 0) {
+                        return true;
+                    }
+
+                    const bool within_row_budget =
+                            last_block->rows() + free_block->rows() <= 
ctx->batch_size();
+                    if (!within_row_budget) {
+                        return false;
+                    }
+
+                    const auto free_block_data_bytes = free_block->bytes();
+                    const bool within_byte_budget = preferred_block_size_bytes 
== 0 ||
+                                                    last_block->bytes() + 
free_block_data_bytes <=

Review Comment:
   This treats `preferred_block_size_bytes == 0` as the disabled case, but 
`RuntimeState::preferred_block_size_bytes()` never returns `0` when adaptive 
batch sizing is disabled; it returns `kMax` (512MB) and the accessor comment 
says callers that need the active/disabled state must check 
`config::enable_adaptive_batch_size` explicitly. As a result, with 
`enable_adaptive_batch_size=false`, this PR still stops merging once the 
combined data bytes exceed 512MB, whereas the old behavior was row-budget-only. 
That is a behavior regression for disabled adaptive mode and can produce extra 
cached blocks / earlier scan-task yielding for wide rows. Please gate the 
byte-budget check on `config::enable_adaptive_batch_size` (or otherwise pass a 
disabled byte budget) instead of relying on `0` here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to