Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/18692
> The point of this issue is not performance improvement, but that some (in
our case automatically generated) queries do not work at all with Spark,
whereas there is no problem with these queries
Github user Aklakan commented on the issue:
https://github.com/apache/spark/pull/18692
Hi, it's unfortunate to see that this PR was reverted.
@gatorsmile
> After rethinking about it, we might need to revert this PR. Although it
converts a CROSS Join to an Inner join,
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18692
Yeah. That is a wrong case. Let us revisit it if we can find any useful
case here. Thank you!
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/18692
You are right. In that case, I don't know if there is any valid use case for
inferring a join condition from literals...
---
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18692
I am not sure we can infer ``a == b`` if ``a in (0, 2, 3, 4)`` and ``b in
(0, 2, 3, 4)``.
table 'a'
```
a1 a2
1 2
3 3
4 5
```
table 'b'
```
```
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18692
@aokolnychyi Could you reconsider it using cases like `a in (0, 2, 3, 4)`
and `b in (0, 2, 3, 4)`, and then inferring `a = b`?
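A minimal sketch of the case raised here, in plain Python rather than Spark (the column values and the helper name `cross_join` are made up for illustration): both sides satisfy the same `IN` constraint, yet the cross join still contains pairs where `a != b`, so inferring `a = b` would drop rows.

```python
from itertools import product

ALLOWED = {0, 2, 3, 4}


def cross_join(left, right):
    """Return every (a, b) pair, i.e. a cartesian product."""
    return list(product(left, right))


# Hypothetical column values; every value satisfies the IN constraint.
left_a = [0, 2, 3]
right_b = [2, 3, 4]
assert all(a in ALLOWED for a in left_a)
assert all(b in ALLOWED for b in right_b)

cross = cross_join(left_a, right_b)
# The result we would get if `a = b` were (wrongly) inferred and applied.
inner = [(a, b) for a, b in cross if a == b]

print(len(cross), len(inner))
```

The two results differ in size, which is why the `IN` constraints alone cannot justify the extra join condition.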
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18692
Hi, All.
Since the commit is reverted from the master branch, can we update the
status of JIRA issue?
- https://issues.apache.org/jira/browse/SPARK-21417
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18692
Done. Reverted.
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18692
Will do it. Thanks!
---
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18692
Yeah, correct. So, we should revert then.
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18692
Even if we use `BroadcastHashJoin` or `ShuffledHashJoin`, it does not help,
because every row has the same key value, so hashing on the key only adds
unnecessary work in both cases, right?
---
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18692
I took a look at ``JoinSelection``. It seems we will not get
``BroadcastHashJoin`` or ``ShuffledHashJoin`` if we revert this rule.
---
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18692
Sure, if you guys think it does not give any performance benefits, then
let's revert it.
I also had similar concerns but my understanding was that having an inner
join with some
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/18692
Yeah, I have the same feeling. If the left side has an `a = 1` constraint, and
the right side has a `b = 1` constraint, adding an `a = b` join condition does
not help, as it always evaluates to true.
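To make this concrete, here is a small sketch in plain Python (not Spark; the rows and variable names are hypothetical): once each side is filtered to a single literal, the inferred predicate `a = b` holds for every surviving pair, so the inner join returns exactly the cross join.

```python
from itertools import product

# Hypothetical rows: (key, payload).
left = [(1, "x"), (2, "y"), (1, "z")]
right = [(1, "p"), (3, "q"), (1, "r")]

# Constraints from the comment: a = 1 on the left, b = 1 on the right.
left_f = [row for row in left if row[0] == 1]
right_f = [row for row in right if row[0] == 1]

cross = list(product(left_f, right_f))
# Inner join using the inferred condition a = b.
inner = [(lrow, rrow) for lrow, rrow in cross if lrow[0] == rrow[0]]

print(inner == cross)
```

Because the predicate filters nothing out, it can only add evaluation cost in this model.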
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18692
@aokolnychyi After rethinking about it, we might need to revert this PR.
Although it converts a CROSS Join to an Inner join, it does not improve the
performance. What do you think?
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18692
LGTM
Thanks for your patience! It looks much better now. I really appreciate
your contributions, and you are welcome to make more!
Thanks! Merged to master.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18692
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84351/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18692
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18692
**[Test build #84351 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84351/testReport)**
for PR 18692 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18692
**[Test build #84351 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84351/testReport)**
for PR 18692 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18692
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18692
**[Test build #84349 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84349/testReport)**
for PR 18692 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18692
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84349/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18692
**[Test build #84349 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84349/testReport)**
for PR 18692 at commit
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18692
Also, please add a test case for non-deterministic expressions. For example,
if the left child has `a = rand()` and the right child has `b = rand()`, we
should not infer `a = b`.
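The non-deterministic case can be illustrated outside Spark, with Python's `random` module standing in for `rand()` (an assumption made purely for illustration): each side evaluates its own independent draw, so `a = rand()` and `b = rand()` give no grounds for `a = b`.

```python
import random

rng = random.Random(42)  # seeded only to make the sketch repeatable

# Each side evaluates its own call, standing in for a = rand() and b = rand().
a = rng.random()
b = rng.random()

print(a == b)
```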
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18692
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18692
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84177/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18692
**[Test build #84177 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84177/testReport)**
for PR 18692 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18692
Build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18692
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84176/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18692
**[Test build #84176 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84176/testReport)**
for PR 18692 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18692
**[Test build #84177 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84177/testReport)**
for PR 18692 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18692
**[Test build #84176 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84176/testReport)**
for PR 18692 at commit
Github user Aklakan commented on the issue:
https://github.com/apache/spark/pull/18692
Hi @aokolnychyi, a note on @SimonBin's comment (I am his colleague):
> The initial solution handled your case but then there was a decision to
restrict the proposed rule to cross joins
Github user SimonBin commented on the issue:
https://github.com/apache/spark/pull/18692
@aokolnychyi thank you for the clarification, I see now
---
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18692
@SimonBin The initial solution handled your case but then there was a
decision to restrict the proposed rule to cross joins only. You can find the
reason in this
Github user SimonBin commented on the issue:
https://github.com/apache/spark/pull/18692
Hi, we are very interested in this patch. I wonder if it could detect this
code automatically, without needing to write the explicit join:
```scala
package
```
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18692
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82892/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18692
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18692
**[Test build #82892 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82892/testReport)**
for PR 18692 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18692
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18692
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82891/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18692
**[Test build #82891 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82891/testReport)**
for PR 18692 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18692
**[Test build #82892 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82892/testReport)**
for PR 18692 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18692
**[Test build #82891 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82891/testReport)**
for PR 18692 at commit
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18692
cc @gengliangwang Could you review this?
---
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/18692
@cloud-fan: In the event that the (set of join keys) is a superset of (child
node's partitioning keys), it's possible to avoid a shuffle:
https://github.com/apache/spark/pull/19054 ... this can help
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/18692
> After adding the inferred join conditions, it might lead to the child
node's partitioning NOT satisfying the JOIN node's requirements which otherwise
could have.
Isn't it an existing
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18692
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81390/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18692
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18692
**[Test build #81390 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81390/testReport)**
for PR 18692 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18692
**[Test build #81390 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81390/testReport)**
for PR 18692 at commit
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18692
In this PR, we should limit it to `cartesian product` for now. In the future,
we need to be smarter when extracting equi-join keys.
---
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/18692
Can we restrict this to cartesian product ONLY? One clear downside of
doing this for other joins is that it will potentially add shuffle in case of
(bucketing queries) and (subqueries in
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18692
Sorry for the delay. @jiangxb1987 will submit a simple fix for the issue
you mentioned. That will not be a perfect fix, but it partially resolves the
issue. In the future, we need to move the
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18692
@gatorsmile what is our decision here? Shall we wait until SPARK-21652 is
resolved? In the meantime, I can add some tests and see how the proposed rule
works together with all others.
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18692
@aokolnychyi Thanks for finding the non-convergent case! Let me see how to
fix it.
---
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18692
@gatorsmile I updated the rule to cover cross join cases. Regarding the
case with the redundant condition mentioned by you, I opened
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18692
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80362/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18692
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18692
**[Test build #80362 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80362/testReport)**
for PR 18692 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18692
**[Test build #80362 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80362/testReport)**
for PR 18692 at commit