GitHub user viirya commented on the issue: https://github.com/apache/spark/pull/15575

In practice, setting the `outputPartitioning` of a physical plan such as `ExpandExec` to `child.outputPartitioning` doesn't cause any real problem, even though the plan does not preserve its child's row distribution. That is because a physical plan that changes its output has different output attributes, e.g., `col` becomes `col'`, as @tejasapatil pointed out. If its parent plan requires a distribution, say `HashPartitioning`, that requirement is bound to the physical plan's output `col'`, not to its child's `col`. So even if the physical plan reports `child.outputPartitioning`, `EnsureRequirements` will step in and inject an extra shuffle exchange with `HashPartitioning(col')` to satisfy the requirement. That is how it works, as far as I understand.

However, this doesn't mean the physical plan's output partitioning is actually the same as its child's, i.e., `HashPartitioning(col)`, because the plan no longer has the output `col`. This part might be confusing to some people, so I think it would be better to explain it in more detail.

That is my understanding of this; if I am wrong, please kindly point it out.
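
To make the argument concrete, here is a toy Python model of the check described in the comment. It is only a sketch under stated assumptions: the classes borrow Spark's names (`HashPartitioning`, `ClusteredDistribution`, `ShuffleExchange`) but are simplified stand-ins, not Spark's API, and `ensure_requirements` is a hypothetical helper that models only the "add a shuffle when the child's partitioning does not satisfy the requirement" step.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


# Toy stand-ins only: names mirror Spark's, but none of this is Spark's API.
@dataclass(frozen=True)
class ClusteredDistribution:
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class HashPartitioning:
    columns: Tuple[str, ...]

    def satisfies(self, required: ClusteredDistribution) -> bool:
        # Simplified assumption: the partitioning columns must all appear
        # among the required clustering columns.
        return bool(self.columns) and set(self.columns) <= set(required.columns)


@dataclass(frozen=True)
class Plan:
    name: str
    output: Tuple[str, ...]
    output_partitioning: HashPartitioning
    child: Optional["Plan"] = None


def ensure_requirements(child: Plan, required: ClusteredDistribution) -> Plan:
    """Hypothetical helper: wrap `child` in a shuffle if it does not satisfy `required`."""
    if child.output_partitioning.satisfies(required):
        return child
    return Plan("ShuffleExchange", child.output,
                HashPartitioning(required.columns), child)


# The child is hash-partitioned on `col`.
scan = Plan("Scan", ("col",), HashPartitioning(("col",)))

# An Expand-like plan outputs `col'` but still reports the child's partitioning.
expand = Plan("Expand", ("col'",), scan.output_partitioning, scan)

# The parent's requirement is expressed on the Expand output `col'`, not on `col`.
required = ClusteredDistribution(("col'",))

fixed = ensure_requirements(expand, required)
print(fixed.name)  # ShuffleExchange
```

Because `HashPartitioning(col)` refers to an attribute that is not part of the requirement on `col'`, the check fails and a shuffle on `col'` is added, so the stale partitioning never leads to a wrong result in this model. A plan whose partitioning does match the requirement is returned unchanged.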