Mryange opened a new pull request, #61858:
URL: https://github.com/apache/doris/pull/61858

   
   ### What problem does this PR solve?
   
   `predicate_creator.h` was a header-only file containing two heavy template 
functions: `create_in_list_predicate<PT>` and 
`create_comparison_predicate<PT>`. Every `.cpp` file that included this header 
had to independently instantiate massive template class hierarchies:
   
   - **InListPredicateBase<TYPE, PT, N>**: 23 types × 2 PT × 9 N = **414 class 
instantiations**, each with ~34 member functions (676 LOC class)
   - **ComparisonPredicateBase<TYPE, PT>**: 23 types × 6 PT = **132 class 
instantiations**, each with ~40 member functions (720 LOC class)
   
   This resulted in ~19,000 function instantiations **per consumer file**. The 
two slowest files in the entire BE codebase (`scan_operator.cpp` at 143.5s 
and `delete_handler.cpp` at 141.0s) both included this header, as confirmed by 
`-ftime-trace` profiling.
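
As a rough cross-check of the ~19,000 figure, the sketch below multiplies the class counts by the approximate member-function counts quoted above. The constant and function names are made up for illustration, and the per-class member counts are the approximations from this description, not measured values.

```cpp
#include <cassert>
#include <cstdint>

// Figures quoted in the description above; the member counts are approximate.
constexpr std::int64_t kInListClasses = 23 * 2 * 9;  // InListPredicateBase<TYPE, PT, N>
constexpr std::int64_t kInListMembers = 34;          // ~34 member functions each
constexpr std::int64_t kComparisonClasses = 132;     // ComparisonPredicateBase, as stated above
constexpr std::int64_t kComparisonMembers = 40;      // ~40 member functions each

// Estimated member-function instantiations per translation unit that
// includes the header-only version of predicate_creator.h.
constexpr std::int64_t estimated_function_instantiations() {
    return kInListClasses * kInListMembers +
           kComparisonClasses * kComparisonMembers;
}

static_assert(kInListClasses == 414, "23 x 2 x 9");
static_assert(estimated_function_instantiations() == 19356, "roughly 19,000");
```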
   
   ### What this PR does
   
   1. **Move template definitions from header to `.cpp`**: Replace the full 
template function bodies in `predicate_creator.h` with declarations only. Add 
explicit template instantiations in the `.cpp` files so that each template is 
compiled once and resolved at link time.
   
   2. **Prune heavy includes from the header**: Remove `in_list_predicate.h` 
(676 LOC), `comparison_predicate.h` (720 LOC), `bloom_filter_predicate.h`, 
`null_predicate.h`, and other transitive includes that are no longer needed in 
the header. Add forward declarations for `BloomFilterFuncBase` and 
`BitmapFilterFuncBase`.
   
   3. **Split into 4 `.cpp` files for parallel compilation**: Concentrating all 
template instantiations in a single `.cpp` would create a new 224s bottleneck, 
so they are split by template family to enable parallel builds:
      - `predicate_creator.cpp` — bloom_filter + bitmap_filter (lightweight, 
~28s)
      - `predicate_creator_in_list_in.cpp` — 
`create_in_list_predicate<IN_LIST>` (~61s)
      - `predicate_creator_in_list_not_in.cpp` — 
`create_in_list_predicate<NOT_IN_LIST>` (~60s)
      - `predicate_creator_comparison.cpp` — 
`create_comparison_predicate<EQ/NE/LT/GT/LE/GE>` (~46s)
   
   4. **Fix broken transitive includes**: Add `#include 
"storage/predicate/null_predicate.h"` to `delete_handler.cpp`, which previously 
got it transitively through the old header.
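
The pattern behind steps 1 to 3 can be sketched in a single file as follows. This is a minimal illustration, not the code in this PR: `PredicateType`, `Predicate`, `FilterFuncBase`, `ComparisonPredicate`, `create_comparison`, and `create_filter_predicate` are hypothetical stand-ins, and the comments mark which part would live in the header and which in a `.cpp` file.

```cpp
#include <cassert>
#include <memory>

// ---- Would live in the header: declarations only ----
enum class PredicateType { EQ, NE };  // hypothetical stand-in for the real enum

class FilterFuncBase;  // forward declaration replaces a heavy #include

struct Predicate {  // hypothetical minimal interface
    virtual ~Predicate() = default;
    virtual bool evaluate(int value) const = 0;
};

// A pointer to an incomplete type is enough for a declaration.
std::unique_ptr<Predicate> create_filter_predicate(const FilterFuncBase* func);

// Declared but not defined here, so consumers cannot instantiate it themselves.
template <PredicateType PT>
std::unique_ptr<Predicate> create_comparison(int operand);

// ---- Would live in one .cpp file: heavy class, definition, instantiations ----
template <PredicateType PT>
struct ComparisonPredicate final : Predicate {
    explicit ComparisonPredicate(int operand) : operand_(operand) {}
    bool evaluate(int value) const override {
        if constexpr (PT == PredicateType::EQ) {
            return value == operand_;
        } else {
            return value != operand_;
        }
    }
    int operand_;
};

template <PredicateType PT>
std::unique_ptr<Predicate> create_comparison(int operand) {
    return std::make_unique<ComparisonPredicate<PT>>(operand);
}

// Explicit instantiation definitions: compiled once here, found by the linker
// elsewhere. Each line could move to its own .cpp file to build in parallel.
template std::unique_ptr<Predicate> create_comparison<PredicateType::EQ>(int);
template std::unique_ptr<Predicate> create_comparison<PredicateType::NE>(int);
```

A consumer that includes only the header part can call `create_comparison<PredicateType::EQ>(5)` without ever seeing `ComparisonPredicate`. The trade-off is that any specialization not explicitly instantiated in some `.cpp` file fails at link time rather than at compile time.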
   
   ### Compilation time results (ASAN, single-threaded measurement)
   
   | File | Before (s) | After (s) | Change |
   |------|--------:|--------:|--------|
   | scan_operator.cpp | 143.50 | 45.96 | **-68.0%** |
   | delete_handler.cpp | 141.03 | 29.62 | **-79.0%** |
   | predicate_creator.cpp | 33.05 | 28.13 | -14.9% |
   | predicate_creator_in_list_in.cpp | — | 60.85 | new |
   | predicate_creator_in_list_not_in.cpp | — | 59.84 | new |
   | predicate_creator_comparison.cpp | — | 46.23 | new |
   
   **Parallel build critical path: 143.5s → 60.9s (-57.6%)**
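
The critical-path figure can be recomputed from the table with the sketch below. It assumes that, given enough cores, the build is bounded by the slowest single translation unit, and it considers only the files listed above. The function and variable names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// With enough parallelism, wall-clock time cannot drop below the slowest file.
double critical_path_seconds(const std::vector<double>& per_file_seconds) {
    assert(!per_file_seconds.empty());
    return *std::max_element(per_file_seconds.begin(), per_file_seconds.end());
}

// Relative change in percent; negative values mean the build got faster.
double percent_change(double before, double after) {
    return (after - before) / before * 100.0;
}

// Per-file timings in seconds, copied from the table above.
const std::vector<double> kBefore = {143.50, 141.03, 33.05};
const std::vector<double> kAfter = {45.96, 29.62, 28.13, 60.85, 59.84, 46.23};
```

Calling `percent_change(critical_path_seconds(kBefore), critical_path_seconds(kAfter))` gives the relative change in the critical path for these files.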
   
   ### Release note
   
   None
   
   ### Check List (For Author)
   
   - Test <!-- At least one of them must be included. -->
       - [ ] Regression test
       - [ ] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason <!-- Add your reason?  -->
   
   - Behavior changed:
       - [ ] No.
       - [ ] Yes. <!-- Explain the behavior change -->
   
   - Does this need documentation?
       - [ ] No.
       - [ ] Yes. <!-- Add document PR link here. eg: 
https://github.com/apache/doris-website/pull/1214 -->
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label <!-- Add branch pick label that this PR should 
merge into -->
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

