mrhhsg opened a new pull request, #63296: URL: https://github.com/apache/doris/pull/63296
## Summary

Fix scanner scheduler block merging so the adaptive batch size byte budget is respected when multiple scanned blocks are stitched into a cached block.

## Root Cause

The scheduler merge path only checked the row count against `batch_size()`. When adaptive batch size produced multiple blocks that were individually acceptable, the scheduler could still merge them into a much larger block because it ignored `preferred_block_size_bytes()`.

## Changes

- Capture `preferred_block_size_bytes()` for the scan task.
- Merge into the last cached block only when both the row budget and byte budget are satisfied.
- Keep empty-block merge behavior unchanged so eos/filtered-empty blocks are not emitted separately.
- Preserve `allocated_bytes()` for memory accounting while using `bytes()` for the adaptive data-size budget.

## Validation

- `git diff --check -- be/src/exec/scan/scanner_scheduler.cpp`
- `ninja -C be/ut_build_ASAN src/exec/CMakeFiles/Exec.dir/scan/scanner_scheduler.cpp.o`

Note: `./run-be-ut.sh --run --filter=ScannerContextTest.*` was started earlier but stopped after it triggered a broad ASAN UT build; the changed object had already compiled successfully.
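
For reviewers, the merge rule described under Changes can be illustrated with a small standalone sketch. This is not the code in `scanner_scheduler.cpp`: `BlockStats`, `can_merge`, and `append_block` are hypothetical names, and the struct only tracks the two quantities the rule depends on (row count and data size in the sense of `bytes()`). The sketch assumes the budgets are compared against the combined size of the last cached block plus the incoming block.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scanned block; not the Doris Block class.
struct BlockStats {
    std::size_t rows;   // number of rows in the block
    std::size_t bytes;  // data size, analogous to bytes() (not allocated_bytes())
};

// Returns true when `incoming` may be appended to `last`.
// Assumption: both budgets apply to the combined block.
inline bool can_merge(const BlockStats& last, const BlockStats& incoming,
                      std::size_t batch_size_rows,
                      std::size_t preferred_block_size_bytes) {
    // Empty blocks (eos or fully filtered) are always folded into the last block.
    if (incoming.rows == 0) {
        return true;
    }
    const bool rows_ok = last.rows + incoming.rows <= batch_size_rows;
    const bool bytes_ok = last.bytes + incoming.bytes <= preferred_block_size_bytes;
    return rows_ok && bytes_ok;
}

// Merges `incoming` into the last cached block when allowed,
// otherwise stores it as a new cached block.
inline void append_block(std::vector<BlockStats>& cached, const BlockStats& incoming,
                         std::size_t batch_size_rows,
                         std::size_t preferred_block_size_bytes) {
    if (!cached.empty() &&
        can_merge(cached.back(), incoming, batch_size_rows, preferred_block_size_bytes)) {
        cached.back().rows += incoming.rows;
        cached.back().bytes += incoming.bytes;
    } else {
        cached.push_back(incoming);
    }
}
```

With a row budget of 1000 and a byte budget of 8192, two blocks of 100 rows and 4000 bytes each merge into one cached block, while a third such block starts a new cached block because the byte budget would be exceeded even though the row budget is not. A row-only check would have merged all three, which is the behavior the Root Cause section describes.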
