airborne12 opened a new pull request, #60787:
URL: https://github.com/apache/doris/pull/60787
### What problem does this PR solve?
Problem Summary:
The `search()` function bypasses `InvertedIndexSearcherCache` and opens a
Lucene `IndexReader` directly for each field in each segment. This causes
redundant file I/O and forgoes the caching that other index readers already
use. In addition, identical DSL queries against the same segment re-execute
Lucene matching from scratch.
This PR adds two caching layers:
**Phase 1 - Searcher Cache Reuse**: `FieldReaderResolver::resolve()` now
looks up `InvertedIndexSearcherCache` before opening index files. On a cache
hit, it extracts the `IndexReader*` from the cached `IndexSearcher` via
`getReader()`. On a miss, it builds the searcher, inserts it into the cache,
then extracts the reader. Cache handles are held in the
`_searcher_cache_handles` vector to keep entries pinned during query execution.
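
The lookup-then-build flow of Phase 1 can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `SearcherCache`, `FieldReaderResolverSketch`, the toy `IndexReader`/`IndexSearcher` structs, and the `opens` counter are hypothetical stand-ins, and a `std::shared_ptr` plays the role of a cache handle that pins an entry. Only the names `resolve()`, `getReader()`, and `_searcher_cache_handles` come from the description above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real reader/searcher types.
struct IndexReader {
    std::string field;
};

class IndexSearcher {
public:
    explicit IndexSearcher(std::string field) : _reader{std::move(field)} {}
    IndexReader* getReader() { return &_reader; }

private:
    IndexReader _reader;
};

// Toy cache: holding a Handle keeps the cached searcher alive ("pinned").
class SearcherCache {
public:
    using Handle = std::shared_ptr<IndexSearcher>;

    Handle lookup(const std::string& key) const {
        auto it = _entries.find(key);
        return it == _entries.end() ? nullptr : it->second;
    }

    Handle insert(const std::string& key, Handle searcher) {
        _entries[key] = searcher;
        return searcher;
    }

private:
    std::unordered_map<std::string, Handle> _entries;
};

class FieldReaderResolverSketch {
public:
    explicit FieldReaderResolverSketch(SearcherCache& cache) : _cache(cache) {}

    // Returns a reader for the field, opening the index only on a cache miss.
    IndexReader* resolve(const std::string& cache_key, const std::string& field) {
        SearcherCache::Handle handle = _cache.lookup(cache_key);
        if (!handle) {
            ++opens;  // stands in for the expensive index-file open
            handle = _cache.insert(cache_key, std::make_shared<IndexSearcher>(field));
        }
        assert(handle != nullptr);
        // Keep the handle for the lifetime of the resolver so the raw
        // IndexReader* handed out below stays valid during the query.
        _searcher_cache_handles.push_back(handle);
        return handle->getReader();
    }

    int opens = 0;  // number of simulated index opens, for illustration only

private:
    SearcherCache& _cache;
    std::vector<SearcherCache::Handle> _searcher_cache_handles;
};
```

The point of storing the handles is lifetime: the resolver returns a raw pointer into a cached object, so the entry must not be evicted while the query still uses it.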
**Phase 2 - DSL Result Cache**: A new `SearchFunctionQueryCache` (LRU, keyed
by `segment_prefix + "#" + dsl_signature`) caches the final `roaring::Roaring`
bitmap result. Repeated identical DSL queries against the same segment skip
Lucene execution entirely. Controlled by:
- `enable_search_function_query_cache` session variable (default: `true`)
- `search_function_query_cache_limit` BE config (default: `"10%"`)
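
As an illustration of the Phase 2 idea, a minimal LRU result cache keyed by `segment_prefix + "#" + dsl_signature` might look like the sketch below. It is not the `SearchFunctionQueryCache` implementation: the class name `SearchResultCacheSketch` is hypothetical, `RowBitmap` is a plain `std::vector<uint32_t>` standing in for `roaring::Roaring`, and capacity is counted in entries, whereas the real limit is a memory budget (`search_function_query_cache_limit`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for roaring::Roaring so the sketch has no external dependency.
using RowBitmap = std::vector<uint32_t>;

class SearchResultCacheSketch {
public:
    explicit SearchResultCacheSketch(std::size_t capacity) : _capacity(capacity) {
        assert(capacity > 0);
    }

    // Key layout taken from the PR description: segment prefix, '#', DSL signature.
    static std::string make_key(const std::string& segment_prefix,
                                const std::string& dsl_signature) {
        return segment_prefix + "#" + dsl_signature;
    }

    // Returns the cached bitmap, or nullptr on a miss. A hit refreshes recency.
    std::shared_ptr<const RowBitmap> lookup(const std::string& key) {
        auto it = _index.find(key);
        if (it == _index.end()) {
            return nullptr;
        }
        _lru.splice(_lru.begin(), _lru, it->second);
        return it->second->second;
    }

    // Inserts or overwrites an entry, evicting the least recently used one if full.
    void insert(const std::string& key, RowBitmap rows) {
        auto value = std::make_shared<const RowBitmap>(std::move(rows));
        auto it = _index.find(key);
        if (it != _index.end()) {
            it->second->second = std::move(value);
            _lru.splice(_lru.begin(), _lru, it->second);
            return;
        }
        if (_lru.size() == _capacity) {
            _index.erase(_lru.back().first);
            _lru.pop_back();
        }
        _lru.emplace_front(key, std::move(value));
        _index[key] = _lru.begin();
    }

    std::size_t size() const { return _lru.size(); }

private:
    using Entry = std::pair<std::string, std::shared_ptr<const RowBitmap>>;
    std::size_t _capacity;
    std::list<Entry> _lru;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> _index;
};
```

A caller would compute the key once per segment, try `lookup()`, and only run the Lucene query and call `insert()` on a miss. Including the segment prefix in the key keeps results from different segments apart even when the DSL text is identical.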
### Release note
Add searcher cache reuse and DSL-level result cache for the search()
function to reduce redundant Lucene I/O and improve query performance.
### Check List (For Author)
- Test
    - [x] Regression test
    - [x] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason
- Behavior changed:
    - [ ] No.
    - [x] Yes. The `search()` function now caches `IndexSearcher` handles and
      DSL query results. The DSL result cache can be disabled via
      `set enable_search_function_query_cache = false`.
- Does this need documentation?
    - [ ] No.
    - [x] Yes. New session variable `enable_search_function_query_cache` and
      BE config `search_function_query_cache_limit`.
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label