[GitHub] spark issue #19080: [SPARK-21865][SQL] simplify the distribution semantic of...

yhuai Wed, 30 Aug 2017 16:18:58 -0700

Github user yhuai commented on the issue:

    https://github.com/apache/spark/pull/19080
  
    I have a question after reading the new approach. Say we have a join like
`T1 JOIN T2 on T1.a = T2.a`, where `T1` is hash partitioned by `T1.a` into 10
partitions and `T2` is range partitioned by `T2.a` into 10 partitions. Each side
satisfies the required distribution of the join on its own. However, the two
sides do not place equal values of `a` at the same partition index, so we still
need to add an exchange on at least one side to produce the correct result. How
will this change handle that case?
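
To make the concern concrete, here is a minimal, self-contained Scala sketch (no Spark dependency) of why "each side satisfies the required distribution" does not imply "the sides are co-partitioned". It relies on simplified stand-ins, which are assumptions: hash partitioning is modeled as a non-negative `hashCode` modulo 10 (Spark's `HashPartitioning` actually uses a Murmur3 hash), and range partitioning uses hypothetical fixed bounds over keys 0 to 99 (Spark samples its bounds). `PartitionMismatch`, `hashPartitionId`, and `rangePartitionId` are illustrative names, not Spark APIs.

```scala
object PartitionMismatch {
  val numPartitions = 10

  // Simplified stand-in for hash partitioning: non-negative hashCode modulo n.
  // Assumption: Spark's HashPartitioning uses Murmur3, not Scala's hashCode.
  def hashPartitionId(key: Int, n: Int): Int = {
    val raw = key.hashCode % n
    if (raw < 0) raw + n else raw
  }

  // Simplified stand-in for range partitioning: hypothetical fixed upper bounds
  // splitting keys 0..99 into 10 contiguous ranges (Spark samples bounds instead).
  val rangeBounds: Array[Int] = Array(9, 19, 29, 39, 49, 59, 69, 79, 89)

  def rangePartitionId(key: Int): Int = {
    val idx = rangeBounds.indexWhere(b => key <= b)
    if (idx == -1) rangeBounds.length else idx // keys above the last bound go to the last partition
  }

  def main(args: Array[String]): Unit = {
    val keys = 0 until 100
    // Keys whose partition index differs between the T1 side and the T2 side.
    val mismatched = keys.filter(k => hashPartitionId(k, numPartitions) != rangePartitionId(k))
    println(s"keys landing at different partition indices: ${mismatched.size} of ${keys.size}")
    println(s"key 15 -> T1 partition ${hashPartitionId(15, numPartitions)}, T2 partition ${rangePartitionId(15)}")
  }
}
```

Under these assumptions key 15 goes to partition 5 of `T1` but partition 1 of `T2`, and 90 of the 100 keys land at different partition indices, so zipping partition i of one side with partition i of the other would miss matches.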
    
    Also, regarding
    > For multiple children, Spark only guarantees they have the same number of 
partitions, and it's the operator's responsibility to leverage this guarantee 
to achieve more complicated requirements. 
    
    Can you give a concrete example?
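
One way to frame this question: the quoted guarantee (equal partition counts across children) is weaker than what a join needs (children co-partitioned on the join keys). The hypothetical Scala model below, which does not use Spark's classes or APIs, checks both properties for the `T1`/`T2` example. All names (`CoPartitionCheck`, `HashPartitioned`, `RangePartitioned`, `satisfiesClustered`, `sameNumPartitions`, `coPartitioned`) are illustrative assumptions, and the co-partitioning rule is deliberately conservative.

```scala
object CoPartitionCheck {
  // Hypothetical, simplified model of a child's output partitioning.
  // These are illustrative types, NOT Spark's Partitioning classes.
  sealed trait Partitioning {
    def keys: Seq[String]
    def numPartitions: Int
  }
  final case class HashPartitioned(keys: Seq[String], numPartitions: Int) extends Partitioning
  final case class RangePartitioned(keys: Seq[String], numPartitions: Int) extends Partitioning

  // Per-child requirement: rows with equal join keys end up in the same partition.
  // Simplification: the partitioning keys must equal the join keys exactly.
  def satisfiesClustered(p: Partitioning, joinKeys: Seq[String]): Boolean =
    p.keys == joinKeys

  // The weaker cross-child guarantee quoted above: only partition counts line up.
  def sameNumPartitions(children: Seq[Partitioning]): Boolean =
    children.map(_.numPartitions).distinct.size == 1

  // What a join without an extra exchange needs: partition i of every child holds
  // the same key values. Conservatively modeled as "all children hash partitioned
  // (same hash function assumed) with the same count"; range bounds are not
  // modeled, so range partitioning never qualifies here.
  def coPartitioned(children: Seq[Partitioning]): Boolean =
    children.nonEmpty &&
      children.forall(_.isInstanceOf[HashPartitioned]) &&
      sameNumPartitions(children)

  def main(args: Array[String]): Unit = {
    val t1 = HashPartitioned(Seq("T1.a"), 10)
    val t2 = RangePartitioned(Seq("T2.a"), 10)
    println(s"T1 clustered on T1.a: ${satisfiesClustered(t1, Seq("T1.a"))}")
    println(s"T2 clustered on T2.a: ${satisfiesClustered(t2, Seq("T2.a"))}")
    println(s"same number of partitions: ${sameNumPartitions(Seq(t1, t2))}")
    println(s"co-partitioned: ${coPartitioned(Seq(t1, t2))}")
  }
}
```

Running `main` prints `true`, `true`, `true`, and `false` for the four checks: each child meets its own requirement and the partition counts match, yet an operator relying only on equal counts would still need an exchange on one side.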


