peter-toth opened a new pull request, #55519: URL: https://github.com/apache/spark/pull/55519

### What changes were proposed in this pull request?

When a `KeyedPartitioning` passes through a `PartitioningPreservingUnaryExecNode` (e.g. `ProjectExec`), the previous implementation projected the partitioning as a whole expression via `multiTransformDown`. If any expression position could not be mapped to an output attribute, the entire `KeyedPartitioning` was silently dropped, resulting in `UnknownPartitioning`.

This PR replaces that approach with a per-position projection algorithm implemented in two new private helpers (`projectKeyedPartitionings` and `projectOtherPartitionings`), with the main `outputPartitioning` reduced to a simple split, project, and combine:

1. For each expression position (0..N-1), collect the unique expressions at that position across all input `KeyedPartitioning`s (using `ExpressionSet` to deduplicate semantically equal expressions), then project each through the output aliases via `projectExpression`.
2. Positions with at least one projected alternative are *projectable*; they define the maximum achievable granularity. Positions that cannot be expressed in the output are dropped (narrowing).
3. The shared `partitionKeys` are projected to the subset of projectable positions via `KeyedPartitioning.projectKeys`.
4. The final `KeyedPartitioning`s are the cross-product of per-position alternatives, computed lazily via `MultiTransform.generateCartesianProduct`, deduplicated, and bounded by a single outer `take(aliasCandidateLimit)`.

All resulting `KeyedPartitioning`s at the same granularity share the same `partitionKeys` object, preserving the invariant required by `GroupPartitionsExec`.

### Why are the changes needed?

Without narrowing, a `ProjectExec` that drops any one of a multi-column partition key causes the entire `KeyedPartitioning` to be lost. This breaks storage-partitioned join optimisations that rely on the partitioning surviving projection.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit tests in `ProjectedOrderingAndPartitioningSuite` covering:

- Full-granularity alias substitution (existing behaviour, unchanged)
- 2->1 narrowing without aliases
- 2->1 narrowing with alias, verifying shared `partitionKeys` object identity
- 3->2 narrowing with alias
- `PartitioningCollection` where one KP can be fully projected and another cannot

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6
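
### Illustrative sketch of the per-position projection

The per-position projection described in this PR can be modelled outside Spark with a small toy. The sketch below is not the Spark implementation and uses none of its APIs. Plain strings stand in for Catalyst expressions, and the `aliases` dict is an assumed stand-in for the output alias map. The function names `project_expression` and `project_keyed_partitionings` and the `limit` parameter are hypothetical. It only illustrates the narrowing idea: positions with no output alternative are dropped, the keys are narrowed to match, and the surviving alternatives are combined lazily under a single outer limit.

```python
from itertools import islice, product


def project_expression(expr, aliases):
    """Return the output names that can stand in for `expr` (possibly none)."""
    return list(aliases.get(expr, []))


def project_keyed_partitionings(partitionings, partition_keys, aliases, limit):
    """Toy per-position projection (hypothetical helper, not Spark code).

    partitionings:  list of equal-length tuples of expression names
    partition_keys: list of tuples, one value per expression position
    aliases:        dict mapping an input expression to its output names
    limit:          upper bound on the number of partitionings returned
    """
    n = len(partitionings[0])
    alternatives = []
    for pos in range(n):
        # Unique expressions at this position, in first-seen order.
        unique = list(dict.fromkeys(p[pos] for p in partitionings))
        # Project each one through the output aliases, skipping repeats.
        projected = []
        for expr in unique:
            for out in project_expression(expr, aliases):
                if out not in projected:
                    projected.append(out)
        alternatives.append(projected)

    # Positions with at least one alternative survive; the rest are dropped.
    projectable = [pos for pos in range(n) if alternatives[pos]]
    if not projectable:
        return [], []

    # Narrow every key to the surviving positions. The same list object is
    # returned for all resulting partitionings.
    narrowed_keys = [tuple(k[pos] for pos in projectable) for k in partition_keys]

    # Lazy cross-product of per-position alternatives, cut off by one outer
    # limit. Each position's alternatives are already unique, so the product
    # contains no duplicate tuples in this toy.
    combos = product(*(alternatives[pos] for pos in projectable))
    return list(islice(combos, limit)), narrowed_keys


# 2->1 narrowing with an alias: `a` is kept and also aliased as `x`,
# while `b` is not present in the output at all.
result, keys = project_keyed_partitionings(
    partitionings=[("a", "b")],
    partition_keys=[(1, "p"), (2, "q")],
    aliases={"a": ["a", "x"]},
    limit=10,
)
print(result)  # [('a',), ('x',)]
print(keys)    # [(1,), (2,)]
```

In this toy, both resulting partitionings refer to the one `keys` list, which mirrors the shared `partitionKeys` object that the description says `GroupPartitionsExec` requires.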
