Hi all,

I found this JIRA ticket for an issue I ran into recently: https://issues.apache.org/jira/browse/SPARK-28771

My initial idea for a fix is to change the requiredChildDistribution of SortMergeJoinExec (and ShuffledHashJoinExec). At least when all of the conditions below are met, we could require only a subset of the join keys for partitioning:

- The left and right children's output partitionings are both HashPartitioning with the same numPartitions.
- The left and right partition expressions correspond to the same subset (with regard to indices) of their respective join keys.

If that subset of keys is returned by requiredChildDistribution, then EnsureRequirements.ensureDistributionAndOrdering would not add a shuffle stage, and the children's existing partitioning would be reused.

1. Thoughts on this approach?
2. Could someone help explain why the different join types have different output partitionings in SortMergeJoinExec.outputPartitioning <https://github.com/apache/spark/blob/cdcd43cbf2479b258f4c5cfa0f6306f475d25cf2/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala#L85-L96>?

Thanks,
Brett
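
P.S. To make the two conditions above concrete, here is a rough standalone sketch in Python. It is a toy model, not Spark code: the HashPartitioning dataclass only mimics the shape of Spark's class, expressions are modeled as plain strings, and key_indices and reusable_key_subset are hypothetical names made up for illustration. It also assumes each join key appears only once per side.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class HashPartitioning:
    """Toy stand-in for Spark's HashPartitioning (expressions are plain strings here)."""
    expressions: Tuple[str, ...]
    num_partitions: int


def key_indices(partition_exprs: Sequence[str], join_keys: Sequence[str]) -> Optional[List[int]]:
    """Return the position of each partition expression within the join keys,
    or None if some partition expression is not a join key at all."""
    indices = []
    for expr in partition_exprs:
        if expr not in join_keys:
            return None
        indices.append(join_keys.index(expr))
    return indices


def reusable_key_subset(left_keys, right_keys, left_part, right_part) -> Optional[List[int]]:
    """Return the join-key indices that both children are already partitioned on,
    or None if the existing partitionings cannot be reused (hypothetical helper)."""
    # Condition 1: both sides are hash partitioned with the same number of partitions.
    if not isinstance(left_part, HashPartitioning) or not isinstance(right_part, HashPartitioning):
        return None
    if left_part.num_partitions != right_part.num_partitions:
        return None
    # Condition 2: both sides partition on the same join-key positions, in the same order.
    left_idx = key_indices(left_part.expressions, left_keys)
    right_idx = key_indices(right_part.expressions, right_keys)
    if not left_idx or left_idx != right_idx:
        return None
    return left_idx


# Join on (a = x AND b = y); both children are already hash partitioned on the first key only.
left_keys, right_keys = ("a", "b"), ("x", "y")
subset = reusable_key_subset(
    left_keys, right_keys,
    HashPartitioning(("a",), 200),
    HashPartitioning(("x",), 200),
)
print(subset)
```

In this model, a non-None result is the subset of key positions that requiredChildDistribution would ask for instead of the full key list. If the right side were partitioned on "y" instead, or the partition counts differed, the function returns None and the usual full-key requirement (and shuffle) would apply.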