[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

viirya Sat, 22 Oct 2016 19:40:49 -0700

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/15575
  
    In practice, setting the `outputPartitioning` of a physical plan like 
`ExpandExec` to `child.outputPartitioning` doesn't cause any real problem, even 
though this physical plan doesn't preserve its child's row distribution. That is 
because a physical plan that changes its output has different output 
attributes, e.g., `col` becomes `col'`, as @tejasapatil pointed out.
    
    If its parent plan requires a distribution, say `HashPartition`, this 
distribution will be bound to the physical plan's output `col'` instead of its 
child plan's `col`. So even if the physical plan uses `child.outputPartitioning`, 
`EnsureRequirements` will step in and inject an extra shuffle exchange of 
`HashPartition(col')` to satisfy the requirement.
    
    That is how it works, as far as I understand. However, it doesn't mean the 
physical plan's output partitioning is exactly the same as its child's, i.e., 
`HashPartition(col)`, because the plan doesn't have the output `col`. This part 
might be confusing to some people, so I think it would be better to explain it 
more. This is my understanding; if I am wrong, please kindly point it out.
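
    The mechanism described above can be sketched with a small toy model. 
This is not Spark code: `Plan`, `HashPartition`, and `ensure_requirement` are 
hypothetical stand-ins written for illustration, and the sketch assumes that a 
requirement is satisfied only when the reported partitioning equals it exactly, 
which is a simplification of how a real planner compares partitionings.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class HashPartition:
    """Toy stand-in for a hash partitioning keyed on named columns."""
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Plan:
    """Toy physical plan node: name, output attributes, reported partitioning."""
    name: str
    output: Tuple[str, ...]
    output_partitioning: HashPartition
    child: Optional["Plan"] = None


def ensure_requirement(plan: Plan, required: HashPartition) -> Plan:
    """Hypothetical, simplified analogue of the requirement check.

    Returns `plan` unchanged if its reported partitioning equals `required`;
    otherwise wraps it in a shuffle exchange that produces `required`.
    """
    if plan.output_partitioning == required:
        return plan
    return Plan("ShuffleExchange", plan.output, required, child=plan)


scan = Plan("Scan", ("col",), HashPartition(("col",)))

# The Expand-like node rewrites `col` into a new attribute `col'`,
# yet still reports its child's partitioning, HashPartition(col).
expand = Plan("Expand", ("col'",), scan.output_partitioning, child=scan)

# The parent's requirement is bound to the node's output `col'`, not to `col`,
# so the reported partitioning cannot satisfy it and a shuffle is added.
planned = ensure_requirement(expand, HashPartition(("col'",)))
print(planned.name)  # ShuffleExchange
```

    In this sketch the reused partitioning is harmless only because it names 
`col`, which the node no longer outputs, so no requirement from the parent can 
match it.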


