freemandealer opened a new pull request, #61518:
URL: https://github.com/apache/doris/pull/61518
MOTIVATION
Internal table/partition information can be inferred from the tabletId, but for
external table scans, file cache entries cannot be traced back to the original
table/partition in a consistent way. This change propagates file cache context
from FE to BE, persists it in RocksDB using a deduplicated dictionary design,
and exposes it through FILE_CACHE_INFO.
FE DESIGN
- Extend thrift scan descriptors with optional file cache context fields:
  - TFileRangeDesc.table_name / partition_name
  - TPaloScanRange.partition_name
  - TOlapScanNode.partition_name
- Centralize external table context filling in FileScanNode:
  - use table.getNameWithFullQualifiers() as the stable table identifier
  - build partition_name from partition path key/value pairs in a shared helper
  - reuse the helper from FileQueryScanNode, FileGroupInfo, and NereidsFileGroupInfo
- Propagate internal table context from OlapScanNode:
  - set table_name on both scan node thrift and per-tablet scan ranges
  - set partition_name when the scan targets a single partition
  - keep partition_name empty for multi-partition scans
- Extend the FILE_CACHE_INFO schema with TABLE_NAME and PARTITION_NAME columns
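The two FE rules above (building partition_name from path key/value pairs, and reporting a partition only for single-partition OLAP scans) can be summarized in a short sketch. It is written in C++ to match the BE side, although the actual FE code is Java. Both function names are hypothetical, and the Hive-style `key=value` pairs joined by `/` is an assumed format for partition_name, not one confirmed by this PR.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: joins ordered partition key/value pairs (as parsed from
// a path such as .../dt=2024-01-01/region=us/...) into "dt=2024-01-01/region=us".
// The "key=value" joined by '/' format is an assumption.
std::string BuildPartitionName(
        const std::vector<std::pair<std::string, std::string>>& kvs) {
    std::string out;
    for (const auto& [key, value] : kvs) {
        assert(!key.empty());  // assumed precondition: partition keys are named
        if (!out.empty()) {
            out += '/';
        }
        out += key;
        out += '=';
        out += value;
    }
    return out;  // empty for unpartitioned tables
}

// Hypothetical rule mirroring the OlapScanNode behavior described above:
// a partition name is reported only when the scan targets exactly one
// partition; multi-partition (or empty) scans leave it empty.
std::string PartitionNameForScan(const std::vector<std::string>& selected_partitions) {
    if (selected_partitions.size() == 1) {
        return selected_partitions.front();
    }
    return std::string();
}
```

For example, `BuildPartitionName({{"dt", "2024-01-01"}, {"region", "us"}})` yields `dt=2024-01-01/region=us`, while `PartitionNameForScan({"p1", "p2"})` yields an empty string.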
BE DESIGN
- Extend io::IOContext and file cache context objects with table_name and partition_name, so the information is available on all file-cache-aware read paths
- Propagate context through:
  - FileScanner for external table scans
  - OlapScanner / TabletReader / RowsetReaderContext for internal table scans
  - the parallel internal scan path, so context is preserved even when scanners are split by tablet or segment
- Make FileScanner derive partition_name from columns_from_path when FE does not send an explicit partition string, and clear stale range context when a later range omits table metadata
- Persist block metadata in RocksDB by storing only a context_id in CacheBlockMetaPb instead of duplicating table/partition strings per block
- Add a dedicated RocksDB column family as a dictionary:
  - (table_name, partition_name) -> context_id
  - context_id -> (table_name, partition_name)
- Resolve context_id back to strings only when serving FILE_CACHE_INFO, so the normal data read path stays lightweight
- Update non-scan IOContext aggregate initializers to explicitly keep empty table/partition context, preserving existing behavior while keeping the build clean under strict initializer warnings
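The deduplicated dictionary design can be illustrated with a minimal in-memory sketch. It assumes two ordered maps as a stand-in for the two key spaces of the dedicated RocksDB column family (no RocksDB API is used), and it assumes id 0 is reserved for "no context" so that non-scan I/O needs no dictionary entry. The class `FileCacheContextDict`, the struct `BlockMeta`, and their members are hypothetical names, not the PR's actual types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// In-memory stand-in for the dedicated RocksDB column family. Keeping both
// directions lets writers deduplicate contexts and lets FILE_CACHE_INFO turn
// ids back into strings.
class FileCacheContextDict {
public:
    using Context = std::pair<std::string, std::string>;  // (table_name, partition_name)
    static constexpr uint64_t kNoContext = 0;  // assumption: 0 means "no context"

    // Returns the existing id for (table, partition), or assigns a new one.
    uint64_t GetOrAssignId(const std::string& table, const std::string& partition) {
        if (table.empty() && partition.empty()) {
            return kNoContext;  // empty context gets no dictionary entry
        }
        Context key{table, partition};
        auto it = forward_.find(key);
        if (it != forward_.end()) {
            return it->second;  // already known: reuse the id
        }
        const uint64_t id = next_id_++;
        forward_.emplace(key, id);
        reverse_.emplace(id, std::move(key));
        assert(forward_.size() == reverse_.size());
        return id;
    }

    // Maps an id back to strings; only needed when serving FILE_CACHE_INFO.
    std::optional<Context> Resolve(uint64_t id) const {
        if (id == kNoContext) {
            return Context{};
        }
        auto it = reverse_.find(id);
        if (it == reverse_.end()) {
            return std::nullopt;  // unknown id
        }
        return it->second;
    }

    std::size_t size() const { return forward_.size(); }

private:
    std::map<Context, uint64_t> forward_;  // (table, partition) -> id
    std::map<uint64_t, Context> reverse_;  // id -> (table, partition)
    uint64_t next_id_ = kNoContext + 1;
};

// Hypothetical per-block metadata: like CacheBlockMetaPb in the PR, it carries
// only the id, not the table/partition strings.
struct BlockMeta {
    uint64_t offset = 0;
    uint64_t size = 0;
    uint64_t context_id = FileCacheContextDict::kNoContext;
};
```

Because every block belonging to the same (table_name, partition_name) pair stores the same small integer, each distinct pair of strings is written once rather than once per block, and the read path carries only the integer until FILE_CACHE_INFO asks for the names.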
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
### Release note
None
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
  - [ ] Regression test
  - [ ] Unit Test
  - [ ] Manual test (add detailed scripts or steps below)
  - [ ] No need to test or manual test. Explain why:
    - [ ] This is a refactor/code format and no logic has been changed.
    - [ ] Previous test can cover this change.
    - [ ] No code files have been changed.
    - [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
  - [ ] No.
  - [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
  - [ ] No.
  - [ ] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->