shauryachats opened a new pull request, #18401:
URL: https://github.com/apache/pinot/pull/18401
## Problem
For multi-stream realtime tables, `getPartitionMetadataFromTableConfig` was
storing `numPartitionGroups` (total across all streams) in
`ColumnPartitionMetadata.numPartitions`.
This is incorrect: the broker's partition pruning compares that value
against the per-stream partition count from the partition function. Storing
the total inflates the count by a factor of `numStreams`, so pruning
silently skips segments it should have matched.
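A simplified model of the failure mode follows. It assumes a plain modulo partition function and a pruner that recomputes a filter value's partition from the `numPartitions` stored in segment metadata. `PruningMismatchDemo`, `partitionOf`, and `mayContain` are hypothetical names for illustration only. Pinot's actual pruner and partition functions (for example Murmur) differ in detail.

```java
import java.util.Set;

// Hypothetical illustration (not Pinot code): a modulo partition function and a
// pruning check that recomputes a value's partition using the numPartitions
// stored in segment metadata.
class PruningMismatchDemo {

  // Partition id of a value under a simple modulo partition function.
  static int partitionOf(int value, int numPartitions) {
    return Math.floorMod(value, numPartitions);
  }

  // True if the segment may hold rows with this value, i.e. the recomputed
  // partition is one of the partitions recorded for the segment.
  static boolean mayContain(int value, int storedNumPartitions, Set<Integer> segmentPartitions) {
    return segmentPartitions.contains(partitionOf(value, storedNumPartitions));
  }

  public static void main(String[] args) {
    int numStreams = 2;
    int perStreamNumPartitions = 4;
    int numPartitionGroups = numStreams * perStreamNumPartitions; // 8

    int value = 6;
    // The row was written to stream partition 6 mod 4 = 2.
    Set<Integer> segmentPartitions = Set.of(partitionOf(value, perStreamNumPartitions));

    // Buggy metadata stores the total (8): 6 mod 8 = 6, not in {2}, so the segment is pruned.
    System.out.println(mayContain(value, numPartitionGroups, segmentPartitions)); // prints false
    // Fixed metadata stores the per-stream count (4): 6 mod 4 = 2, so the segment is kept.
    System.out.println(mayContain(value, perStreamNumPartitions, segmentPartitions)); // prints true
  }
}
```

With two streams of four partitions each, recomputing with the stored total gives a partition id that no segment of that stream ever records, so a segment holding matching rows is dropped without any error.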
## Fix
- Compute `perStreamNumPartitions = numPartitionGroups / numStreams` and
use it in `ColumnPartitionMetadata`, consistent with what the broker's
partition function expects.
- Return `null` early (skip persisting partition metadata) when
`numPartitionGroups` is not evenly divisible by `numStreams`, logging a
warning. This avoids storing metadata that would produce incorrect pruning
results; with no partition metadata, the segment is never pruned and is
always included.
- Single-stream tables are unaffected (`perStreamNumPartitions =
numPartitionGroups`).
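A minimal sketch of the partition-count arithmetic described above, assuming the inputs are plain `int` counts. `PerStreamPartitions` and `perStreamNumPartitions` are hypothetical names; the actual change lives inside `getPartitionMetadataFromTableConfig` and builds a `ColumnPartitionMetadata`, which this sketch does not model. The `numStreams <= 0` guard is an assumption added here to avoid division by zero, not something the PR states.

```java
import java.util.logging.Logger;

// Hypothetical sketch of the fix's arithmetic, not the PR's actual code.
class PerStreamPartitions {
  private static final Logger LOGGER = Logger.getLogger(PerStreamPartitions.class.getName());

  // Returns the per-stream partition count to store as numPartitions, or null
  // when it cannot be derived safely (no partition metadata is persisted, so
  // the segment is never pruned).
  static Integer perStreamNumPartitions(int numPartitionGroups, int numStreams) {
    // Guard against a non-positive stream count (assumption of this sketch).
    if (numStreams <= 0 || numPartitionGroups % numStreams != 0) {
      LOGGER.warning(String.format(
          "numPartitionGroups %d is not evenly divisible by numStreams %d; skipping partition metadata",
          numPartitionGroups, numStreams));
      return null;
    }
    // Single-stream tables: numStreams == 1, so this equals numPartitionGroups.
    return numPartitionGroups / numStreams;
  }

  public static void main(String[] args) {
    System.out.println(perStreamNumPartitions(4, 1)); // prints 4 (single stream, unchanged)
    System.out.println(perStreamNumPartitions(8, 2)); // prints 4 (even multi-stream split)
    System.out.println(perStreamNumPartitions(7, 2)); // prints null after logging a warning
  }
}
```

Returning `null` instead of a rounded value keeps uneven layouts safe: the segment carries no partition metadata and is always included, trading pruning efficiency for correctness.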
## Tests
Added `testGetPartitionMetadataFromTableConfig` covering:
- No `SegmentPartitionConfig` → `null`
- Single-stream: partition count equals total partition groups (no change)
- Multi-stream, even distribution: partition count equals
`numPartitionGroups / numStreams`
- Multi-stream, uneven distribution: returns `null`
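The four test cases can be expressed as a table-driven check. This is a dependency-free model, not the PR's `testGetPartitionMetadataFromTableConfig`, which is not reproduced here. `PartitionMetadataCases`, `numPartitionsToStore`, and the sample counts are hypothetical; a boolean flag stands in for the presence of a `SegmentPartitionConfig`.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical model of the test matrix; not Pinot's test code.
class PartitionMetadataCases {

  // Stand-in for the numPartitions value the fixed method would store,
  // or null when no partition metadata is persisted.
  static Integer numPartitionsToStore(boolean hasSegmentPartitionConfig,
      int numPartitionGroups, int numStreams) {
    if (!hasSegmentPartitionConfig) {
      return null; // no SegmentPartitionConfig -> null
    }
    if (numStreams <= 0 || numPartitionGroups % numStreams != 0) {
      return null; // uneven multi-stream split -> null
    }
    return numPartitionGroups / numStreams;
  }

  // One row of the test table: inputs plus the expected stored value.
  static final class Case {
    final String name;
    final boolean hasConfig;
    final int numPartitionGroups;
    final int numStreams;
    final Integer expected;

    Case(String name, boolean hasConfig, int numPartitionGroups, int numStreams, Integer expected) {
      this.name = name;
      this.hasConfig = hasConfig;
      this.numPartitionGroups = numPartitionGroups;
      this.numStreams = numStreams;
      this.expected = expected;
    }
  }

  static final List<Case> CASES = List.of(
      new Case("no SegmentPartitionConfig", false, 4, 1, null),
      new Case("single stream", true, 4, 1, 4),
      new Case("multi-stream, even", true, 8, 2, 4),
      new Case("multi-stream, uneven", true, 7, 2, null));

  // Runs every case, prints PASS/FAIL per case, and returns the failure count.
  static int countFailures() {
    int failures = 0;
    for (Case c : CASES) {
      Integer actual = numPartitionsToStore(c.hasConfig, c.numPartitionGroups, c.numStreams);
      boolean ok = Objects.equals(actual, c.expected);
      System.out.println((ok ? "PASS " : "FAIL ") + c.name + ": " + actual);
      if (!ok) {
        failures++;
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    System.out.println("failures: " + countFailures()); // prints failures: 0
  }
}
```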