[
https://issues.apache.org/jira/browse/KYLIN-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930189#comment-17930189
]
Guoliang Sun commented on KYLIN-6055:
-------------------------------------
h1. Dev Design
Scope Clarification: With multi-level partitioning enabled,
`dimension_range_info_map` is supported only at the first-level time partition
column (used for query filtering via min/max data). True multi-level partition
column support is out of scope.
Limitation: Cases involving deleted partitions are not handled.
h2. Original Design
1. First-Level Time Partition Column Calculation Logic:
  - Build: Calculate min/max directly from flat table data (see
    `SegmentExec#calDimRange`).
  - Merge: See `MergeStage#mergeDimRange`.
    - If flat table data exists, calculate min/max directly.
    - If no flat table data exists, check `dimension_range_info_map` in the
      Segments being merged:
      - If any of these Segments lacks `dimension_range_info_map`, skip the
        calculation (without min/max info, queries take longer but no data
        is lost).
      - If all of them have `dimension_range_info_map`, extract min/max for
        each column and apply data type-specific logic.
  - Metadata Update: Old Segments are replaced with new ones during updates, so
    no extra `dimension_range_info_map` update is required (see
    `AfterMergeOrRefreshResourceMerger#merge`).
    - For Segment Partition updates, aggregate related data (e.g.,
      `column_source_bytes`) and set the updated values (see
      `MetadataMerger#upsertSegmentPartition`). Multi-level partition column
      scenarios were unsupported, so `dimension_range_info_map` was not
      updated.
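The merge branch without flat table data can be sketched as follows. This is a minimal illustration, not the code in `MergeStage#mergeDimRange`: the class `DimRangeMergeSketch`, its `Range` type, the `merge` and `compare` methods, and the set of type names are all hypothetical. Keeping only columns present in every Segment is also an assumption of this sketch.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; names are illustrative and not Kylin's API.
class DimRangeMergeSketch {

    /** Min/max pair for one column, kept as strings as they would be in JSON. */
    static final class Range {
        final String min;
        final String max;

        Range(String min, String max) {
            this.min = min;
            this.max = max;
        }
    }

    /** Numeric columns compare by value; all other types compare lexicographically. */
    static int compare(String a, String b, String dataType) {
        switch (dataType) {
            case "int":
            case "bigint":
            case "double":
            case "decimal":
                return new BigDecimal(a).compareTo(new BigDecimal(b));
            default:
                return a.compareTo(b);
        }
    }

    /**
     * Merges per-Segment range maps. Returns Optional.empty() when any Segment
     * has no range info, so the merged Segment carries no min/max at all.
     */
    static Optional<Map<String, Range>> merge(List<Map<String, Range>> segmentMaps,
                                              Map<String, String> columnTypes) {
        if (segmentMaps.isEmpty()) {
            return Optional.empty();
        }
        for (Map<String, Range> m : segmentMaps) {
            if (m == null || m.isEmpty()) {
                return Optional.empty(); // skip calculation entirely
            }
        }
        Map<String, Range> merged = new HashMap<>(segmentMaps.get(0));
        // Assumption: columns with no known data type are dropped.
        merged.keySet().retainAll(columnTypes.keySet());
        for (Map<String, Range> m : segmentMaps.subList(1, segmentMaps.size())) {
            // Assumption: a column missing from any Segment has no safe range.
            merged.keySet().retainAll(m.keySet());
            for (Map.Entry<String, Range> e : merged.entrySet()) {
                String type = columnTypes.get(e.getKey());
                Range cur = e.getValue();
                Range next = m.get(e.getKey());
                String min = compare(next.min, cur.min, type) < 0 ? next.min : cur.min;
                String max = compare(next.max, cur.max, type) > 0 ? next.max : cur.max;
                e.setValue(new Range(min, max));
            }
        }
        return Optional.of(merged);
    }
}
```

The type-specific comparison matters because values are stored as text: compared as strings, "10" sorts before "9", which would produce a wrong minimum for a numeric column.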
h2. New Design
1. New Parameter: `kylin.build.multi-partition-filter-enabled` (default:
   `false`, feature disabled).
2. Build:
  - Retain the original logic but remove the
    `Objects.isNull(segment.getModel.getMultiPartitionDesc)` check, so that
    min/max is calculated even with multi-level partitions.
  - Subclasses reuse this logic, then perform the first merge with the
    `dimension_range_info_map` stored in the Segment. The merge keeps the
    smallest min and largest max, compared according to each column's data
    type. Columns must be defined in the model; otherwise, the comparison is
    skipped to avoid incorrect filtering. A missing min/max may reduce
    filtering but won’t cause data loss.
3. Merge:
  - There is no concept of partition merging; only Segment-level merging
    exists. Reuse the Segment merge logic to calculate min/max for the
    `dimension_range_info_map` columns in the merged Segment and write the
    result to the JSON file. No additional logic is needed in
    `PartitionMergeStage`.
4. Metadata Update:
  - `dimension_range_info_map` must now be set and updated explicitly.
    Because multiple subtasks (e.g., multiple partitions) may modify the
    `dimension_range_info_map` in the same Segment JSON file, retrieve the
    previously stored `dimension_range_info_map` and perform a second merge.
    This merge locks the Segment JSON file path, preventing other tasks from
    reading or writing it until the update is complete.
  - Once `dimension_range_info_map` is written to the Segment’s JSON file,
    queries can filter on column min/max info as before; no special changes
    are needed.
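The second merge during the metadata update is a read-merge-write step under a lock keyed by the Segment path. A minimal sketch, assuming an in-memory map in place of the Segment JSON files and `long` min/max values for the time partition column, is shown here. `SegmentRangeStoreSketch`, `upsert`, and `read` are hypothetical names; the real locking happens in Kylin's metadata layer and is not reproduced.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: an in-memory stand-in for Segment JSON files; not Kylin's API.
class SegmentRangeStoreSketch {
    // segment path -> (column -> {min, max})
    private final Map<String, Map<String, long[]>> files = new ConcurrentHashMap<>();
    // one lock per segment path, so readers and writers of that path wait
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String segmentPath) {
        return locks.computeIfAbsent(segmentPath, p -> new ReentrantLock());
    }

    /** Called by each partition subtask: merge its ranges into the stored ones. */
    void upsert(String segmentPath, Map<String, long[]> partitionRanges) {
        ReentrantLock lock = lockFor(segmentPath);
        lock.lock();
        try {
            // Retrieve what earlier subtasks stored, then widen it.
            Map<String, long[]> merged =
                new HashMap<>(files.getOrDefault(segmentPath, new HashMap<>()));
            for (Map.Entry<String, long[]> e : partitionRanges.entrySet()) {
                merged.merge(e.getKey(), e.getValue().clone(),
                    (a, b) -> new long[] {Math.min(a[0], b[0]), Math.max(a[1], b[1])});
            }
            files.put(segmentPath, merged);
        } finally {
            lock.unlock();
        }
    }

    /** Reads also take the lock, so no task sees a half-finished update. */
    long[] read(String segmentPath, String column) {
        ReentrantLock lock = lockFor(segmentPath);
        lock.lock();
        try {
            Map<String, long[]> stored = files.get(segmentPath);
            long[] range = stored == null ? null : stored.get(column);
            return range == null ? null : range.clone();
        } finally {
            lock.unlock();
        }
    }
}
```

Without the lock, two subtasks could both read the same stored map and the later write would discard the other's range, leaving a min/max that is too narrow and could filter out valid data.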
> Empty `dimension_range_info_map` After Building Model with Secondary
> Partition Causes Query to Fail Filtering by Dimension
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: KYLIN-6055
> URL: https://issues.apache.org/jira/browse/KYLIN-6055
> Project: Kylin
> Issue Type: Bug
> Affects Versions: 5.0.0
> Reporter: Guoliang Sun
> Assignee: Guoliang Sun
> Priority: Major
> Fix For: 5.0.2
>
> Attachments: image-2025-02-25-16-56-39-917.png,
> image-2025-02-25-16-57-04-319.png
>
>
> For the same model without secondary partitioning, the dimension min/max in
> `dimension_range_info_map` is generated correctly after building.
> !image-2025-02-25-16-56-39-917.png|width=427,height=263!
> For the same model with secondary partitioning, the dimension min/max in
> `dimension_range_info_map` is not generated correctly after building.
> !image-2025-02-25-16-57-04-319.png|width=446,height=190!
> Impact: For queries that filter only on regular dimensions and not on
> partition columns, the query scans all data after the model is built, so
> query performance cannot be guaranteed.