[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 Seems we can't get an agreement on this topic, so I'd close this for now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 > The order is different from the original one that is evaluated in the join conditions. I'm not sure what original order you meant. By pulling out to `Project`, they are evaluated by their order in the tables. If you meant the original order is the one after `Sort`, I don't think it is correct. `Sort` is the implementation detail, we should stick with the order of rows in joining tables. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18652 The order is different from the original one that is evaluated in the join conditions.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18652 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81054/
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18652 Merged build finished. Test PASSed.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18652 **[Test build #81054 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81054/testReport)** for PR 18652 at commit [`793dac4`](https://github.com/apache/spark/commit/793dac4403926fb9f1421f4bbee59a8e9b82d7e8).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652

    Join [t1.a = rand(t2.b), t1.c = rand(t2.d)]
      Sort
        Project [t1.a, t1.c]
          TableScan t1
      Sort
        Project [rand(t2.b) as rand(t2.b), rand(t2.d) as rand(t2.d)]
          TableScan t2

Aren't `rand(t2.b)` and `rand(t2.d)` already evaluated in `Project`? Why would `Sort` change the evaluation order?
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18652 **[Test build #81054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81054/testReport)** for PR 18652 at commit [`793dac4`](https://github.com/apache/spark/commit/793dac4403926fb9f1421f4bbee59a8e9b82d7e8).
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18652 We could add a `Sort` above the `Project` and the orders become different, right?
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 @cloud-fan @gatorsmile Any more thoughts or comments on this change? Thanks.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 Suppose we join two tables and the equi-join keys are non-deterministic, for example `t1.a = rand(t2.b)` and `t1.c = rand(t2.d)`. We pull them out into a downstream project:

    Join [t1.a = rand(t2.b), t1.c = rand(t2.d)]
      Project [t1.a, t1.c]
        TableScan t1
      Project [rand(t2.b) as rand(t2.b), rand(t2.d) as rand(t2.d)]
        TableScan t2

`rand(t2.b)` and `rand(t2.d)` are evaluated in the projection. Why would the join change their order?
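A minimal sketch of the argument above, in plain Python rather than Spark. The helper names, the seeded `random.Random` standing in for `rand()`, and the sample rows are all assumptions made for illustration. It materializes both keys once per `t2` row in table order, the way the pulled-out `Project` would, and then joins on the stored values with two different strategies:

```python
import random

def project_keys(t2_rows, seed=0):
    """Evaluate both non-deterministic keys once per t2 row, in table order."""
    rng = random.Random(seed)  # stand-in for rand(); the seed is an assumption
    return [(rng.randint(0, 2), rng.randint(0, 2), b, d) for b, d in t2_rows]

def hash_style_join(t1_rows, projected):
    """Match on the materialized keys through a hash table."""
    index = {}
    for k1, k2, b, d in projected:
        index.setdefault((k1, k2), []).append((b, d))
    return [(a, c, b, d) for a, c in t1_rows for b, d in index.get((a, c), [])]

def sort_style_join(t1_rows, projected):
    """Sort both sides by key first, then match on the materialized keys."""
    left = sorted(t1_rows)
    right = sorted(projected)
    return [(a, c, b, d)
            for a, c in left
            for k1, k2, b, d in right
            if (a, c) == (k1, k2)]

# t1 holds every possible key pair, so each t2 row finds exactly one match.
t1 = [(a, c) for a in range(3) for c in range(3)]
t2 = [(10, 20), (11, 21), (12, 22), (13, 23)]
projected = project_keys(t2)
same = sorted(hash_style_join(t1, projected)) == sorted(sort_style_join(t1, projected))
```

Because the keys are computed before either strategy touches the rows, sorting or hashing only moves already-evaluated values around, and both joins return the same set of rows in this toy model.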
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18652 I did not get your point. Could you give an example showing why the non-deterministic expressions are always evaluated in the same order, no matter which join type is chosen during physical planning?
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 Once we pull them out into a downstream project, should we still worry about the call order? They are evaluated before any sort or shuffle that is added later.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18652 You are talking about the number of calls. I am worried about the call order. We could add a `Sort`.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 > Why is equi-join free from the issues? Assume the equi-join predicates have the form `t1.a = rand(t2.b) && t1.c = rand(t2.d)`. When we compare the equi-join keys `(t1.a, t1.c)` and `(rand(t2.b), rand(t2.d))`, we compare all of them and won't skip `t1.c = rand(t2.d)` if `t1.a = rand(t2.b)` is false. That means we can pull them out into a downstream project without worrying about changing the number of calls.
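A rough illustration of the call-count argument, in plain Python and not Spark code. `CountingRand`, the seeds, and the sample rows are hypothetical stand-ins. A generic predicate evaluated with short-circuit `and` skips the second non-deterministic call whenever the first comparison fails, whereas building the full key tuple always evaluates both expressions:

```python
import random

class CountingRand:
    """Hypothetical stand-in for a non-deterministic expression such as rand(x)."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self.calls = 0

    def __call__(self, _value):
        self.calls += 1
        return self._rng.randint(0, 1)

def short_circuit_calls(t1_rows, t2_rows):
    """Evaluate `a == r1(b) and c == r2(d)` for every row pair."""
    r1, r2 = CountingRand(1), CountingRand(2)
    for a, c in t1_rows:
        for b, d in t2_rows:
            # r2 is never called when the first comparison is false.
            _ = (a == r1(b)) and (c == r2(d))
    return r1.calls, r2.calls

def key_tuple_calls(t2_rows):
    """Build the whole key tuple (r1(b), r2(d)) once per t2 row."""
    r1, r2 = CountingRand(1), CountingRand(2)
    keys = [(r1(b), r2(d)) for b, d in t2_rows]
    return r1.calls, r2.calls, keys

# The row (5, 5) can never equal a value in {0, 1}, so its first comparison
# always fails and the second expression is always skipped for it.
t1 = [(0, 0), (1, 1), (5, 5)]
t2 = [(10, 20), (11, 21), (12, 22), (13, 23)]
sc_first, sc_second = short_circuit_calls(t1, t2)
kt_first, kt_second, _keys = key_tuple_calls(t2)
```

In the short-circuit version the two expressions are called a different number of times, and that number depends on the data. In the key-tuple version each expression is called exactly once per `t2` row, which is the property the comment relies on.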
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18652 Merged build finished. Test PASSed.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18652 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80376/
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18652 **[Test build #80376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80376/testReport)** for PR 18652 at commit [`abf51f7`](https://github.com/apache/spark/commit/abf51f7c76016737d494ac23d3071b2301f96445).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18652

> As said in previous discussion, we can't avoid few issues regarding non-deterministic non equi join condition. We can simply allow it, but it faces inconsistency due to different join implementations. We can pull out it to downstream project, but it possibly changes the number of calls. EnsureRequirements can change the call order.

> Notice that those issues are for non equi join condition, equi join condition is free from the issues.

Why is equi-join free from the issues?
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18652 **[Test build #80376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80376/testReport)** for PR 18652 at commit [`abf51f7`](https://github.com/apache/spark/commit/abf51f7c76016737d494ac23d3071b2301f96445).
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 @gatorsmile @cloud-fan Do you have more comments or thoughts on this? Thanks.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 @baibaichen If we do so, I think the result will not be the same as Hive's join result. Is it still useful?
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user baibaichen commented on the issue: https://github.com/apache/spark/pull/18652 Can we add a flag, e.g. `ignore-non-deterministic`, so that we can treat non-deterministic expressions as deterministic? I believe this is what Hive does.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 @gatorsmile Ok. No problem. Thanks.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18652 Let me talk with more people to get feedback. I will respond to you later. Thanks!
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 @gatorsmile Actually, it is not rare for us to add a feature step by step in Spark SQL, so this is not a reason to hold back this support. I think this change already helps this kind of workload a lot. As said in the previous discussion, we can't avoid a few issues with non-deterministic non-equi join conditions. We can simply allow them, but then results are inconsistent across different join implementations. We can pull them out into a downstream project, but that possibly changes the number of calls. `EnsureRequirements` can change the call order. Notice that those issues apply to non-equi join conditions; equi-join conditions are free from them.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18652 I think the goal is just to resolve the migration issues for Hive users. If we provide only very limited support, I do not think it can help the workload migration. If we really want to resolve the correctness, we need to resolve many issues (e.g., `EnsureRequirements` could also change the call order of non-deterministic expressions). That would take a lot of effort.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 Yea, for the case with non-deterministic non-equi join conditions, you'd face the issue of changing the number of calls. So I currently plan not to support it here.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18652 Yea, I know that. I'm thinking about whether we need to change it by considering the position.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 No, I don't think that's true. I think we don't consider the position of an equi-join condition.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18652 I mean, `t1.a = t2.b` before a non-deterministic condition is an equi-join condition, but `t1.a = t2.b` after a non-deterministic condition is not. Is this true?
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 `t1.a = t2.b` is an equi join condition. `t1.c > rand()` is not. They will be split and considered individually.
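A toy model of the splitting described above, in plain Python and not the planner's actual code. The `Conjunct` type and its `is_equi_key` flag are assumptions made for the example. Each conjunct of the `&&` chain is classified on its own, so its position in the condition does not affect whether it becomes a join key:

```python
from typing import List, NamedTuple, Tuple

class Conjunct(NamedTuple):
    text: str
    is_equi_key: bool  # True when the conjunct has the form <t1 expr> = <t2 expr>

def split_condition(conjuncts: List[Conjunct]) -> Tuple[List[str], List[str]]:
    """Separate equi-join keys from the remaining predicates, one conjunct at a time."""
    keys = [c.text for c in conjuncts if c.is_equi_key]
    others = [c.text for c in conjuncts if not c.is_equi_key]
    return keys, others

eq = Conjunct("t1.a = t2.b", True)
nd = Conjunct("t1.c > rand()", False)

# `t1.a = t2.b && t1.c > rand()` and `t1.c > rand() && t1.a = t2.b`
# split into the same keys and the same remaining predicates.
forward = split_condition([eq, nd])
backward = split_condition([nd, eq])
```

Under this model both orderings yield `t1.a = t2.b` as the key and `t1.c > rand()` as the leftover predicate.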
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18652 Can we say that `t1.a = t2.b && t1.c > rand()` is an equi-join condition, but `t1.c > rand() && t1.a = t2.b` is not?
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 Btw, I guess that is why we also pull out non-deterministic grouping expressions for Aggregate?
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 If we simply allow it, the evaluation order of non-deterministic join conditions will differ across join implementations, e.g. sort-based and hash-based. Then we will get inconsistent join results.
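A small sketch of why the evaluation order can differ, in plain Python with hypothetical helpers, not Spark's join operators. Think of a stateful generator whose n-th draw goes to whichever row pair reaches the non-deterministic predicate n-th. The two functions record, for each matching pair, which draw it would receive:

```python
def hash_style(t1, t2):
    """Probe t1 in table order against a hash table built on t2."""
    index = {}
    for j, b in enumerate(t2):
        index.setdefault(b, []).append(j)
    draws, n = {}, 0
    for i, a in enumerate(t1):
        for j in index.get(a, []):
            draws[(i, j)] = n  # this pair gets the n-th generated value
            n += 1
    return draws

def sort_style(t1, t2):
    """Sort both sides by key first, then visit matching pairs in key order."""
    left = sorted(enumerate(t1), key=lambda p: p[1])
    right = sorted(enumerate(t2), key=lambda p: p[1])
    draws, n = {}, 0
    for i, a in left:
        for j, b in right:
            if a == b:
                draws[(i, j)] = n
                n += 1
    return draws

t1 = [3, 1, 2]
t2 = [2, 3, 1]
h = hash_style(t1, t2)
s = sort_style(t1, t2)
```

Both strategies find the same matching pairs, but the pair `(0, 1)` gets draw 0 under the hash-style probe and draw 2 under the sort-style scan. With a seeded generator those are different values, so a predicate such as `t1.c > rand()` could keep a pair in one plan and drop it in the other.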
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18652 What if we simply allow non-deterministic join conditions? Since we allow non-deterministic filter conditions, shouldn't we do the same for join conditions?
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 ping @cloud-fan Do you have time to review this? Thanks.