[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2018-01-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18692 > The point of this issue is not performance improvement, but that some (in our case automatically generated) queries do not work at all with Spark, whereas there is no problem with these queries

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2018-01-22 Thread Aklakan
Github user Aklakan commented on the issue: https://github.com/apache/spark/pull/18692 Hi, it's unfortunate to see that this PR got reverted @gatorsmile > After rethinking about it, we might need to revert this PR. Although it converts a CROSS Join to an Inner join,

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-14 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18692 Yeah. That is a wrong case. Let us revisit it if we can find any useful case here. Thank you!

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-14 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18692 You are right. Then I don't know whether there is any valid use case for inferring a join condition from literals...

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-14 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18692 I am not sure we can infer ``a == b`` if ``a in (0, 2, 3, 4)`` and ``b in (0, 2, 3, 4)``. table 'a' ``` a1 a2 1 2 3 3 4 5 ``` table 'b' ```
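The objection above can be illustrated with a minimal sketch in plain Scala collections. It does not use Spark, and the table contents, the object name `InListCounterexample`, and its members are hypothetical values chosen for illustration rather than the data from the original comment. It shows that two sides sharing the same `IN` list can still hold different values, so adding `a == b` would drop valid rows.

```scala
object InListCounterexample {
  // Hypothetical single-column tables (not the data from the thread).
  val left: Seq[Int] = Seq(0, 2, 3)
  val right: Seq[Int] = Seq(2, 4)

  // The IN list shared by both sides: a in (0, 2, 3, 4), b in (0, 2, 3, 4).
  val allowed: Set[Int] = Set(0, 2, 3, 4)

  // Cross join of the two filtered sides: every surviving pair is kept.
  def crossJoin: Seq[(Int, Int)] =
    for {
      a <- left if allowed(a)
      b <- right if allowed(b)
    } yield (a, b)

  // The same join with the unsound inferred condition a == b added.
  def withInferredEquality: Seq[(Int, Int)] =
    crossJoin.filter { case (a, b) => a == b }
}
```

Comparing `InListCounterexample.crossJoin` with `InListCounterexample.withInferredEquality` shows that the second result is a strict subset of the first, which is why the inference is only safe when both sides are pinned to one and the same value.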

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18692 @aokolnychyi Could you reconsider it using cases like `a in (0, 2, 3, 4)` and `b in (0, 2, 3, 4)`, and then infer `a = b`?

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18692 Hi, All. Since the commit is reverted from the master branch, can we update the status of the JIRA issue? - https://issues.apache.org/jira/browse/SPARK-21417

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18692 Done. Reverted.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18692 Will do it. Thanks!

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18692 Yeah, correct. So, we should revert then.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18692 Even if we use `BroadcastHashJoin` or `ShuffledHashJoin`, it does not help because the identical values on the keys just cause unnecessary work in both, right?

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18692 I took a look at ``JoinSelection``. It seems we will not get ``BroadcastHashJoin`` or ``ShuffledHashJoin`` if we revert this rule.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18692 Sure, if you guys think it does not give any performance benefits, then let's revert it. I also had similar concerns but my understanding was that having an inner join with some

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18692 Yeah, I have the same feeling. If the left side has an `a = 1` constraint, and the right side has a `b = 1` constraint, adding an `a = b` join condition does not help as it always evaluates to true.
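A minimal sketch of this argument in plain Scala collections follows. It does not use Spark, and the rows, the object name `ConstantConstraintSketch`, and its members are hypothetical. Once both sides are filtered to the same constant, every remaining pair already satisfies `a == b`, so the inferred condition filters nothing, and a hash table built on the join key would hold a single distinct key.

```scala
object ConstantConstraintSketch {
  // Hypothetical rows of the form (key, payload).
  val left: Seq[(Int, String)] = Seq((1, "l1"), (1, "l2"), (5, "l3"))
  val right: Seq[(Int, String)] = Seq((1, "r1"), (7, "r2"), (1, "r3"))

  // Left side under the constraint a = 1, right side under b = 1.
  def leftFiltered: Seq[(Int, String)] = left.filter(_._1 == 1)
  def rightFiltered: Seq[(Int, String)] = right.filter(_._1 == 1)

  // Cross join of the filtered sides.
  def cross: Seq[((Int, String), (Int, String))] =
    for (l <- leftFiltered; r <- rightFiltered) yield (l, r)

  // Inner join with the inferred condition a == b: it removes no rows.
  def inner: Seq[((Int, String), (Int, String))] =
    cross.filter { case (l, r) => l._1 == r._1 }

  // Number of distinct keys a hash-based join would see on the right side.
  def distinctBuildKeys: Int = rightFiltered.map(_._1).distinct.size
}
```

Here `inner` and `cross` contain exactly the same pairs, which matches the point that the condition is always true for rows that survive the two filters.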

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18692 @aokolnychyi After reconsidering, we might need to revert this PR. Although it converts a CROSS Join to an Inner join, it does not improve the performance. What do you think?

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-30 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18692 LGTM. Thanks for your patience! It looks much better now. I really appreciate your contributions, and you are welcome to make more! Thanks! Merged to master.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84351/ Test PASSed.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Merged build finished. Test PASSed.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18692 **[Test build #84351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84351/testReport)** for PR 18692 at commit

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18692 **[Test build #84351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84351/testReport)** for PR 18692 at commit

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Merged build finished. Test FAILed.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18692 **[Test build #84349 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84349/testReport)** for PR 18692 at commit

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84349/ Test FAILed.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18692 **[Test build #84349 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84349/testReport)** for PR 18692 at commit

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-26 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18692 Also add a test case for non-deterministic cases. For example, given that the left child has `a = rand()` and the right child has `b = rand()`, we should not get `a = b`.
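The non-deterministic case can be sketched in plain Scala as a simplified analogy. It does not use Spark or reproduce the PR's test, and the object name `NonDeterministicSketch`, the `draws` helper, and the seeds are assumptions made for illustration. Each side's `rand()` is modeled as an independent generator, so two expressions that look textually identical still produce unrelated values, and treating them as equal would change the join result.

```scala
import scala.util.Random

object NonDeterministicSketch {
  // Stand-in for a per-row rand() on one side of the join:
  // each side gets its own generator (seeded only to keep runs repeatable).
  def draws(seed: Long, n: Int): Seq[Double] = {
    val rng = new Random(seed)
    Seq.fill(n)(rng.nextDouble())
  }

  // Left column a and right column b, each produced by a separate "rand()".
  val a: Seq[Double] = draws(1L, 3)
  val b: Seq[Double] = draws(2L, 3)

  // Cross join of the two sides.
  def cross: Seq[(Double, Double)] =
    for (x <- a; y <- b) yield (x, y)

  // Join with the wrongly inferred condition a == b.
  def inner: Seq[(Double, Double)] =
    cross.filter { case (x, y) => x == y }
}
```

Because the two sides are evaluated independently, `inner` loses rows that `cross` keeps, so an optimizer rule that derives `a = b` from such constraints would not preserve the query's semantics.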

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Merged build finished. Test PASSed.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84177/ Test PASSed.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18692 **[Test build #84177 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84177/testReport)** for PR 18692 at commit

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Build finished. Test FAILed.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84176/ Test FAILed.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18692 **[Test build #84176 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84176/testReport)** for PR 18692 at commit

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18692 **[Test build #84177 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84177/testReport)** for PR 18692 at commit

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18692 **[Test build #84176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84176/testReport)** for PR 18692 at commit

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-23 Thread Aklakan
Github user Aklakan commented on the issue: https://github.com/apache/spark/pull/18692 Hi @aokolnychyi, a note on @SimonBin's comment (I am his colleague): > The initial solution handled your case but then there was a decision to restrict the proposed rule to cross joins

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-23 Thread SimonBin
Github user SimonBin commented on the issue: https://github.com/apache/spark/pull/18692 @aokolnychyi Thank you for the clarification. I see now.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-21 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18692 @SimonBin The initial solution handled your case but then there was a decision to restrict the proposed rule to cross joins only. You can find the reason in this

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-20 Thread SimonBin
Github user SimonBin commented on the issue: https://github.com/apache/spark/pull/18692 Hi, we are very interested in this patch. I wonder if it could detect this code automatically, without needing to write the explicit join: ```scala package

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82892/ Test PASSed.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Merged build finished. Test PASSed.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-10-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18692 **[Test build #82892 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82892/testReport)** for PR 18692 at commit

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Merged build finished. Test PASSed.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82891/ Test PASSed.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-10-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18692 **[Test build #82891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82891/testReport)** for PR 18692 at commit

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-10-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18692 **[Test build #82892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82892/testReport)** for PR 18692 at commit

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-10-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18692 **[Test build #82891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82891/testReport)** for PR 18692 at commit

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-10-12 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18692 cc @gengliangwang Review this?

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-05 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/18692 @cloud-fan: In the event that the (set of join keys) is a superset of the (child node's partitioning keys), it's possible to avoid a shuffle: https://github.com/apache/spark/pull/19054 ... this can help

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-05 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18692 > After adding the inferred join conditions, it might lead to the child node's partitioning NOT satisfying the JOIN node's requirements which otherwise could have. Isn't it an existing

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81390/ Test PASSed.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Merged build finished. Test PASSed.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18692 **[Test build #81390 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81390/testReport)** for PR 18692 at commit

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18692 **[Test build #81390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81390/testReport)** for PR 18692 at commit

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-01 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18692 In this PR, we should limit it to `cartesian product` for now. In the future, we need to be smarter when extracting equi-join keys.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-01 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/18692 Can we restrict this to cartesian product ONLY? One clear downside of doing this for other joins is that it will potentially add shuffle in case of (bucketing queries) and (subqueries in

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-08-31 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18692 Sorry for the delay. @jiangxb1987 will submit a simple fix for the issue you mentioned. That will not be a perfect fix but it partially resolves the issue. In the future, we need to move the

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-08-31 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18692 @gatorsmile what is our decision here? Shall we wait until SPARK-21652 is resolved? In the meantime, I can add some tests and see how the proposed rule works together with all others.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-08-13 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18692 @aokolnychyi Thanks for finding the non-convergent case! Let me see how to fix it.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-08-08 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18692 @gatorsmile I updated the rule to cover cross join cases. Regarding the case with the redundant condition you mentioned, I opened

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80362/ Test PASSed.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Merged build finished. Test PASSed.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-08-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18692 **[Test build #80362 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80362/testReport)** for PR 18692 at commit

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-08-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18692 **[Test build #80362 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80362/testReport)** for PR 18692 at commit