mrhhsg opened a new pull request, #63296:
URL: https://github.com/apache/doris/pull/63296

   ## Summary
   
   Fix scanner scheduler block merging so the adaptive batch size byte budget 
is respected when multiple scanned blocks are stitched into a cached block.
   
   ## Root Cause
   
   The scheduler merge path only checked the row count against `batch_size()`. 
When adaptive batch size produced multiple blocks that each fit the byte 
budget on their own, the scheduler could still merge them into a much larger 
block because it ignored `preferred_block_size_bytes()`.
   
   ## Changes
   
   - Capture `preferred_block_size_bytes()` for the scan task.
   - Merge into the last cached block only when both the row budget and byte 
budget are satisfied.
   - Keep empty-block merge behavior unchanged so eos/filtered-empty blocks are 
not emitted separately.
   - Preserve `allocated_bytes()` for memory accounting while using `bytes()` 
for the adaptive data-size budget.
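
The merge rule described in the list above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in `scanner_scheduler.cpp`: `BlockStats` and `can_merge` are hypothetical names, and the two budget parameters stand in for the values returned by `batch_size()` and `preferred_block_size_bytes()`.

```cpp
#include <cstddef>

// Hypothetical stand-in for a scanned block; NOT the Doris Block class.
struct BlockStats {
    std::size_t rows;   // number of rows in the block
    std::size_t bytes;  // data size, analogous to bytes() (not allocated_bytes())
};

// Returns true when `incoming` may be appended to `last_cached`.
// row_budget  ~ batch_size()                  (assumed mapping)
// byte_budget ~ preferred_block_size_bytes()  (assumed mapping)
inline bool can_merge(const BlockStats& last_cached, const BlockStats& incoming,
                      std::size_t row_budget, std::size_t byte_budget) {
    // Empty blocks (eos or fully filtered) are always folded into the
    // cached block so they are never emitted on their own.
    if (incoming.rows == 0) {
        return true;
    }
    const bool rows_ok = last_cached.rows + incoming.rows <= row_budget;
    const bool bytes_ok = last_cached.bytes + incoming.bytes <= byte_budget;
    // Both budgets must hold; checking rows alone is the bug described above.
    return rows_ok && bytes_ok;
}
```

With a row budget of 4096 and a byte budget of 2000, for example, merging a 50-row, 500-byte block into a 100-row, 1800-byte cached block is rejected even though the combined row count is far below the row budget.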
   
   ## Validation
   
   - `git diff --check -- be/src/exec/scan/scanner_scheduler.cpp`
   - `ninja -C be/ut_build_ASAN 
src/exec/CMakeFiles/Exec.dir/scan/scanner_scheduler.cpp.o`
   
   Note: `./run-be-ut.sh --run --filter=ScannerContextTest.*` was started 
earlier but stopped after it triggered a broad ASAN UT build; the changed 
object had already compiled successfully.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

