[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Thank you for reviews so far, @gatorsmile , @hvanhovell , @nsyca , @yhuai . I'm closing this PR. I'm looking forward to see @gatorsmile 's PR and to get better master branch soon. :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Never mind. I always appreciate your reviews a lot!
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 Sorry, I gave a wrong answer at the beginning. Next time, I will review it more carefully before leaving the comment. Thank you for your work!
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 :) I thought about this issue again. At this stage, could you make a PR for this? I think you're the best person to do that. You made this optimizer and found the correct fix. This was a nice chance for me to investigate this optimizer and nullability propagation. @gatorsmile , thank you for reviewing this. I'll close this PR soon.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580

One more try:

```Scala
val splitConjunctiveConditions: Seq[Expression] = splitConjunctivePredicates(filter.condition)
val conditions = splitConjunctiveConditions ++ filter.constraints
val leftConditions = conditions.filter(_.references.subsetOf(join.left.outputSet))
val rightConditions = conditions.filter(_.references.subsetOf(join.right.outputSet))
val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull)
val rightHasNonNullPredicate = rightConditions.exists(canFilterOutNull)
```

Does this have a hole?
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580

Another version. : )

```Scala
val splitConjunctiveConditions: Seq[Expression] = splitConjunctivePredicates(filter.condition)
val conditions = splitConjunctiveConditions ++ filter.constraints.filter(_.isInstanceOf[IsNotNull])
val leftConditions = conditions.filter(_.references.subsetOf(join.left.outputSet))
val rightConditions = conditions.filter(_.references.subsetOf(join.right.outputSet))
val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull)
val rightHasNonNullPredicate = rightConditions.exists(canFilterOutNull)
```
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580

How about another version?

```
val leftConditions =
  (splitConjunctiveConditions ++ filter.constraints.filter(_.isInstanceOf[IsNotNull]))
    .filter(_.references.subsetOf(join.left.outputSet))
val rightConditions =
  (splitConjunctiveConditions ++ filter.constraints.filter(_.isInstanceOf[IsNotNull]))
    .filter(_.references.subsetOf(join.right.outputSet))
val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull)
val rightHasNonNullPredicate = rightConditions.exists(canFilterOutNull)
```
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Oh, that would be a perfect fix.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580

How about this fix?

```
val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull) ||
  filter.constraints.filter(_.isInstanceOf[IsNotNull])
    .exists(expr => expr.references.subsetOf(join.left.outputSet) && canFilterOutNull(expr))
val rightHasNonNullPredicate = rightConditions.exists(canFilterOutNull) ||
  filter.constraints.filter(_.isInstanceOf[IsNotNull])
    .exists(expr => expr.references.subsetOf(join.right.outputSet) && canFilterOutNull(expr))
```
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 Another better fix is to use `nullable` in `Expression` for `IsNotNull` constraints. `filter.constraints.filter(_.isInstanceOf[IsNotNull])`
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 `canFilterOutNull` will cover almost all the cases. Sorry, I did not read the plan until you asked me to write a test case. Then I realized that natural/using joins are implemented simply by using `coalesce`. As @hvanhovell and @nsyca said, that is just syntactic sugar.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Please let me think more on this issue.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Yep. I agree. `Expr` could be anything. However, this will reduce the scope of this optimization greatly. Is it okay for you?
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 If that is not applicable, I agree with @gatorsmile .
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 That just resolves a specific case. The expressions could be much more complex. `Coalesce` can be nested very deep inside an expression.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580

What about this if we could exclude those functions?

```scala
 val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull) ||
   filter.constraints.filter(_.isInstanceOf[IsNotNull])
-    .exists(expr => join.left.outputSet.intersect(expr.references).nonEmpty)
+    .exists(expr => !expr.isInstanceOf[Coalesce] &&
+      leftOuterAttributeSet.intersect(expr.references).nonEmpty)
 val rightHasNonNullPredicate = rightConditions.exists(canFilterOutNull) ||
   filter.constraints.filter(_.isInstanceOf[IsNotNull])
-    .exists(expr => join.right.outputSet.intersect(expr.references).nonEmpty)
+    .exists(expr => !expr.isInstanceOf[Coalesce] &&
+      rightOuterAttributeSet.intersect(expr.references).nonEmpty)
```
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580

The right fix is to change the following statements

```Scala
val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull) ||
  filter.constraints.filter(_.isInstanceOf[IsNotNull])
    .exists(expr => join.left.outputSet.intersect(expr.references).nonEmpty)
val rightHasNonNullPredicate = rightConditions.exists(canFilterOutNull) ||
  filter.constraints.filter(_.isInstanceOf[IsNotNull])
    .exists(expr => join.right.outputSet.intersect(expr.references).nonEmpty)
```

to the following ones:

```Scala
val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull)
val rightHasNonNullPredicate = rightConditions.exists(canFilterOutNull)
```
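To make the difference between the two checks concrete, here is a minimal sketch in plain Scala. It is not Catalyst code: the `Expr` classes, `references`, `strictCheck`, and `looseCheck` are hypothetical stand-ins, and `canFilterOutNull` is assumed to behave roughly like "evaluate the predicate with its inputs set to NULL and check whether the result is NULL or false".

```scala
object NullRejectionSketch {
  // Hypothetical miniature expression tree; these are NOT Catalyst classes.
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class Lit(value: Int) extends Expr
  case class Coalesce(children: Seq[Expr]) extends Expr
  case class IsNotNull(child: Expr) extends Expr
  case class GreaterThan(left: Expr, right: Expr) extends Expr

  // Attribute names an expression refers to.
  def references(e: Expr): Set[String] = e match {
    case Attr(n)           => Set(n)
    case Lit(_)            => Set.empty
    case Coalesce(cs)      => cs.flatMap(c => references(c)).toSet
    case IsNotNull(c)      => references(c)
    case GreaterThan(l, r) => references(l) ++ references(r)
  }

  // Evaluate with every attribute bound to NULL (modelled as None).
  def evalAllNull(e: Expr): Option[Any] = e match {
    case Attr(_)      => None
    case Lit(v)       => Some(v)
    case Coalesce(cs) => cs.flatMap(c => evalAllNull(c)).headOption
    case IsNotNull(c) => Some(evalAllNull(c).isDefined)
    case GreaterThan(l, r) =>
      (evalAllNull(l), evalAllNull(r)) match {
        case (Some(a: Int), Some(b: Int)) => Some(a > b)
        case _                            => None // a NULL operand gives NULL
      }
  }

  // Assumed behaviour: a predicate rejects NULL input if it evaluates to NULL or false on it.
  def canFilterOutNull(e: Expr): Boolean = evalAllNull(e) match {
    case None | Some(false) => true
    case _                  => false
  }

  // Check kept by the proposed fix: only predicates that refer to one side alone count.
  def strictCheck(side: Set[String], conds: Seq[Expr]): Boolean =
    conds.filter(c => references(c).subsetOf(side)).exists(c => canFilterOutNull(c))

  // Check removed by the proposed fix: any IsNotNull that merely touches the side counts.
  def looseCheck(side: Set[String], conds: Seq[Expr]): Boolean =
    conds.exists {
      case n: IsNotNull => references(n).intersect(side).nonEmpty
      case _            => false
    }
}
```

With `left = Set("a1", "b")` and the single condition `IsNotNull(Coalesce(Seq(Attr("b"), Attr("c"))))`, `looseCheck` reports a null-rejecting predicate on the left side while `strictCheck` does not, because the condition also refers to `c` from the other side.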
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580

Sorry, my above description is not clear. `isnotnull(coalesce(b#227, c#238))` does not filter out `NULL` values of `b#227` and `c#238`. Only when both `b#227` and `c#238` are `NULL` does `coalesce(b#227, c#238)` return `NULL`. Thus, we are unable to use the following two statements to conclude whether the left or right side has non-null predicates.

```Scala
filter.constraints.filter(_.isInstanceOf[IsNotNull])
  .exists(expr => join.left.outputSet.intersect(expr.references).nonEmpty)
```

and

```Scala
filter.constraints.filter(_.isInstanceOf[IsNotNull])
  .exists(expr => join.right.outputSet.intersect(expr.references).nonEmpty)
```
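The `coalesce` argument can be checked with a few lines of plain Scala. This is only a sketch that models a nullable SQL integer as `Option[Int]`; the object and method names are made up for illustration and are not Spark APIs.

```scala
object CoalesceNullability {
  // SQL-style coalesce over nullable ints: the first non-NULL argument, else NULL (None).
  def coalesce(args: Option[Int]*): Option[Int] = args.flatten.headOption

  // Models the predicate isnotnull(coalesce(b, c)).
  def isNotNullCoalesce(b: Option[Int], c: Option[Int]): Boolean =
    coalesce(b, c).isDefined
}
```

A row with `b = NULL` and `c = 4` passes the predicate, so the predicate alone says nothing about whether `b` is non-null; only the row where both inputs are `NULL` is rejected.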
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14580 Can you explain `isnotnull(coalesce(b#227, c#238)) does not filter out NULL!!!`?
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580

```Scala
val df12 = df1.join(df2, $"df1.a" === $"df2.a", "fullouter")
  .select(coalesce($"df1.b", $"df2.c").as("a"), $"df1.b", $"df2.c")
df12.join(df3, "a").explain(true)
```

This is an example to show that we should not eliminate the outer join, even if `isnotnull(coalesce(b#227, c#238))` contains the attributes that are not in join conditions.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580

None of us is right. : ( ```isnotnull(coalesce(b#227, c#238))``` does not filter out `NULL`!!! Thus, the right fix is to remove the second condition.

```Scala
filter.constraints.filter(_.isInstanceOf[IsNotNull])
  .exists(expr => join.left.outputSet.intersect(expr.references).nonEmpty)
```

and

```Scala
filter.constraints.filter(_.isInstanceOf[IsNotNull])
  .exists(expr => join.right.outputSet.intersect(expr.references).nonEmpty)
```
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 I found the root cause. None of us is right. : (
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Thank you, @nsyca!
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Hmm.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580

Yep. Here is the output.

```scala
scala> val a = Seq((1,2),(2,3)).toDF("a","b").createOrReplaceTempView("A")

scala> val b = Seq((2,5),(3,4)).toDF("a","c").createOrReplaceTempView("B")

scala> sql("select A.A,B.A,A.B,B.C from A full join B on A.A=B.A").show
+----+----+----+----+
|   A|   A|   B|   C|
+----+----+----+----+
|   1|null|   2|null|
|null|   3|null|   4|
|   2|   2|   3|   5|
+----+----+----+----+

scala> sql("select A.A,B.A from A full join B on A.A=B.A where coalesce(A.B,B.C) is not null").show
+---+---+
|  A|  A|
+---+---+
|  2|  2|
+---+---+
```
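For comparison, the two queries above can be modelled in plain Scala without Spark. This is a sketch only: the `Row` type, `fullOuterJoin`, `innerJoin`, and `keepCoalesceNotNull` are hypothetical helpers, with `Option[Int]` standing in for a nullable column.

```scala
object FullJoinModel {
  // One joined row: (A.a, B.a, A.b, B.c), with None standing in for NULL.
  type Row = (Option[Int], Option[Int], Option[Int], Option[Int])

  // Rows where the keys of both inputs match.
  def innerJoin(a: Seq[(Int, Int)], b: Seq[(Int, Int)]): Seq[Row] =
    for ((ka, vb) <- a; (kb, vc) <- b if ka == kb)
      yield (Option(ka), Option(kb), Option(vb), Option(vc))

  // Matched rows plus NULL-padded rows for unmatched keys on either side.
  def fullOuterJoin(a: Seq[(Int, Int)], b: Seq[(Int, Int)]): Seq[Row] = {
    val leftOnly: Seq[Row] = a
      .filterNot { case (ka, _) => b.exists(_._1 == ka) }
      .map { case (ka, vb) => (Option(ka), None, Option(vb), None) }
    val rightOnly: Seq[Row] = b
      .filterNot { case (kb, _) => a.exists(_._1 == kb) }
      .map { case (kb, vc) => (None, Option(kb), None, Option(vc)) }
    innerJoin(a, b) ++ leftOnly ++ rightOnly
  }

  // WHERE coalesce(A.b, B.c) IS NOT NULL
  def keepCoalesceNotNull(rows: Seq[Row]): Seq[Row] =
    rows.filter { case (_, _, b, c) => b.orElse(c).isDefined }
}
```

With `A = Seq((1, 2), (2, 3))` and `B = Seq((2, 5), (3, 4))`, the filter keeps all three rows of the full outer join, because every row has a non-null `b` or `c`. The single-row result shown above matches what this model gives for the inner join instead.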
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14580

@dongjoon-hyun, could you please try this on your PR?

    val a = Seq((1,2),(2,3)).toDF("a","b").createOrReplaceTempView("A")
    val b = Seq((2,5),(3,4)).toDF("a","c").createOrReplaceTempView("B")
    sql("select A.A,B.A,A.B,B.C from A full join B on A.A=B.A").show
    sql("select A.A,B.A from A full join B on A.A=B.A where coalesce(A.B,B.C) is not null").show

How many rows do you get from the last and the second last statements?
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14580

@dongjoon-hyun, my apologies for the terse comment I put in previously. There is nothing wrong with the ```full outer join``` with ```using```. What I tried to explain is that ```using``` is a syntactic sugar form of a regular ```full outer join```. Using your example:

```
val a = Seq((1,2),(2,3)).toDF("a","b")
val b = Seq((2,5),(3,4)).toDF("a","c")
val ab = a.join(b, Seq("a"), "fullouter")

scala> ab.explain(true)
== Parsed Logical Plan ==
'Join UsingJoin(FullOuter,List('a))
:- Project [_1#186 AS a#189, _2#187 AS b#190]
:  +- LocalRelation [_1#186, _2#187]
+- Project [_1#196 AS a#199, _2#197 AS c#200]
   +- LocalRelation [_1#196, _2#197]

== Analyzed Logical Plan ==
a: int, b: int, c: int
Project [coalesce(a#189, a#199) AS a#210, b#190, c#200]
+- Join FullOuter, (a#189 = a#199)
   :- Project [_1#186 AS a#189, _2#187 AS b#190]
   :  +- LocalRelation [_1#186, _2#187]
   +- Project [_1#196 AS a#199, _2#197 AS c#200]
      +- LocalRelation [_1#196, _2#197]
...
```

@gatorsmile, you can see here that the interpretation of the ```UsingJoin(...)``` above is a regular ```full outer join``` with the output of the join column in the SELECT clause converted to the expression ```COALESCE(., .)```. The syntax ```UsingJoin``` is gone after the Analysis phase.

I found that Oracle supports the ```Using``` syntax, but it's not clear to me how Oracle interprets the output column(s) in the USING clause. Here is what I found from [Oracle's website](http://docs.oracle.com/javadb/10.10.1.2/ref/rrefsqljusing.html):

> When a USING clause is specified, an asterisk (*) in the select list of the query will be expanded to the following list of columns (in this order):
>
> All the columns in the USING clause
> All the columns of the first (left) table that are not specified in the USING clause
> All the columns of the second (right) table that are not specified in the USING clause

I am trying to verify whether this PR of @dongjoon-hyun is too restrictive or not. I understand that this PR has fixed the problem reported here, but I want to make sure it is the right fix. I do agree with @dongjoon-hyun that his fix is in the right place. I will post an update on my findings later.
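The rewrite visible in the analyzed plan above (a regular full outer join followed by a projection of `coalesce` over the two key columns) can be sketched in plain Scala. The types and the `projectUsing` helper are hypothetical and only model the shape of the projection, with `Option[Int]` standing in for a nullable column.

```scala
object UsingJoinSugar {
  // One row of the ON-style full outer join: (A.a, A.b, B.a, B.c).
  type OnRow = (Option[Int], Option[Int], Option[Int], Option[Int])
  // The same row after the USING-style projection: (coalesce(A.a, B.a), b, c).
  type UsingRow = (Option[Int], Option[Int], Option[Int])

  // Mirrors Project [coalesce(a, a) AS a, b, c] on top of the join output.
  def projectUsing(rows: Seq[OnRow]): Seq[UsingRow] =
    rows.map { case (ka, vb, kb, vc) => (ka.orElse(kb), vb, vc) }
}
```

Applied to the three rows that the example data produces under the ON form, the projection yields one non-null `a` per row, which is why an `IsNotNull` constraint on the projected column does not imply that either input key is non-null.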
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Could you give us some examples of cases we would incorrectly miss here? It would be better to discuss more concrete examples for the benefit of other readers.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 @dongjoon-hyun The root cause is not in the optimizer rule `EliminateOuterJoin`. Your fix will reduce the chances that we can eliminate the outer join. This is not the right fix. The existing way we handle `using joins` might also break the other Optimizer rules. @nsyca Thank you for your inputs. DB2 does not have such a join type (using + outer), based on my understanding, right?
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580

@hvanhovell The output schema is different. They are not equivalent. Thus, the existing way to handle `using/natural joins` is wrong. We need to introduce new join types. Let me show you the examples.

```Scala
Seq((1, "val_1"), (2, "val_2")).toDF("key", "value").createOrReplaceTempView("A")
Seq((2, "val_2"), (3, "val_3")).toDF("key", "value").createOrReplaceTempView("B")
```

```Scala
sql("select * from a full join b using(key)").show(true)
+---+-----+-----+
|key|value|value|
+---+-----+-----+
|  3| null|val_3|
|  1|val_1| null|
|  2|val_2|val_2|
+---+-----+-----+
```

```Scala
sql("select * from a full join b on a.key = b.key").show(true)
+----+-----+----+-----+
| key|value| key|value|
+----+-----+----+-----+
|null| null|   3|val_3|
|   1|val_1|null| null|
|   2|val_2|   2|val_2|
+----+-----+----+-----+
```
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580

Hi, @nsyca . Thank you for the comment. But that's not a valid reason to avoid using `full outer` in combination with `using`. They can be used together. The root cause of the problem in your example is that `EliminateOuterJoin` optimizes too aggressively. This PR aims to reduce the scope of this optimizer. After this PR, your example will be handled like the following. (This is the result of this PR.)

```
scala> sql("select coalesce(A.A,B.A), A.A, B.A from A full join B on A.A=B.A where coalesce(A.A,B.A) is not null").show
+--------------+----+----+
|coalesce(A, A)|   A|   A|
+--------------+----+----+
|             1|   1|null|
|             3|null|   3|
|             2|   2|   2|
+--------------+----+----+
```

What do you think about this PR?
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14580 Merged build finished. Test PASSed.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14580 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63797/ Test PASSed.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14580 **[Test build #63797 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63797/consoleFull)** for PR 14580 at commit [`af189d6`](https://github.com/apache/spark/commit/af189d6f0ee49f99a32c3db570036c96ae73076b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14580 This problem can be expressed in SQL like this:
```
val a = Seq((1),(2)).toDF("a").createOrReplaceTempView("A")
val b = Seq((2),(3)).toDF("a").createOrReplaceTempView("B")

scala> sql("select coalesce(A.A,B.A), A.A, B.A from A full join B on A.A=B.A").show
+--------------+----+----+
|coalesce(A, A)|   A|   A|
+--------------+----+----+
|             1|   1|null|
|             3|null|   3|
|             2|   2|   2|
+--------------+----+----+

scala> sql("select coalesce(A.A,B.A), A.A, B.A from A full join B on A.A=B.A where coalesce(A.A,B.A) is not null").show
+--------------+---+---+
|coalesce(A, A)|  A|  A|
+--------------+---+---+
|             2|  2|  2|
+--------------+---+---+
```
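The two queries above can be modelled without Spark. The following is a minimal plain-Scala sketch, not Spark's implementation: `FullJoinModel`, `fullOuter`, `inner` and `coalesceNotNull` are hypothetical names, and `None` stands in for SQL `NULL`. It assumes that the single-row output is what the query returns once the full outer join has been treated as an inner join, which is consistent with the `EliminateOuterJoin` discussion in this thread.

```scala
// Plain-Scala model of the two queries above; no Spark required.
object FullJoinModel {
  // One output row: (A.A, B.A). None stands for SQL NULL.
  type Row = (Option[Int], Option[Int])

  val a: Seq[Int] = Seq(1, 2)
  val b: Seq[Int] = Seq(2, 3)

  // INNER JOIN ON A.A = B.A
  def inner(l: Seq[Int], r: Seq[Int]): Seq[Row] =
    for (x <- l; y <- r if x == y) yield (Option(x), Option(y))

  // FULL OUTER JOIN ON A.A = B.A: matched rows plus unmatched rows from each side.
  def fullOuter(l: Seq[Int], r: Seq[Int]): Seq[Row] = {
    val leftOnly: Seq[Row]  = l.filterNot(x => r.contains(x)).map(x => (Option(x), Option.empty[Int]))
    val rightOnly: Seq[Row] = r.filterNot(y => l.contains(y)).map(y => (Option.empty[Int], Option(y)))
    inner(l, r) ++ leftOnly ++ rightOnly
  }

  // WHERE coalesce(A.A, B.A) IS NOT NULL
  def coalesceNotNull(row: Row): Boolean = row._1.orElse(row._2).isDefined

  // What the filtered full outer join should return: all three rows survive.
  val expected: Seq[Row] = fullOuter(a, b).filter(coalesceNotNull)

  // What an inner join returns under the same filter: only the matched row.
  val asInner: Seq[Row] = inner(a, b).filter(coalesceNotNull)
}
```

In this model the filter removes nothing from the full outer join, because every row has a non-null value on at least one side, so replacing the join with an inner join changes the result.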
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14580 @gatorsmile I am trying to understand your comment. Why shouldn't we use `full outer` in combination with `using`? I am under the impression that `using` is just a bit of syntactic sugar. For instance, `select * from a full join b using(id)` is rewritten into `select * from a full join b on a.id = b.id`. What is the difference between the two?
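One difference between the two forms in standard SQL is the shape of the output: the `on` form keeps both `a.id` and `b.id`, while `using(id)` exposes a single `id` column, which for a full outer join is defined as `coalesce(a.id, b.id)`. A minimal plain-Scala sketch of that difference follows; `UsingVsOn`, `OnRow` and `usingId` are hypothetical names for illustration only, and the sketch makes no claim about how Spark's analyzer implements the rewrite.

```scala
// Plain-Scala model of the output columns; no Spark required.
object UsingVsOn {
  // One row of `a full join b on a.id = b.id`: both id columns are kept.
  // None stands for SQL NULL.
  final case class OnRow(aId: Option[Int], bId: Option[Int])

  // `using(id)` exposes one id column, modelled here as coalesce(a.id, b.id).
  def usingId(row: OnRow): Option[Int] = row.aId.orElse(row.bId)

  // Rows of a full outer join between ids {1, 2} and ids {2, 3}.
  val onRows: Seq[OnRow] =
    Seq(OnRow(Some(1), None), OnRow(Some(2), Some(2)), OnRow(None, Some(3)))

  // The merged column is never null for these rows, unlike a.id or b.id alone.
  val usingIds: Seq[Option[Int]] = onRows.map(usingId)
}
```

Because the merged column is a `coalesce` over both sides, a null-rejecting filter on it says nothing about either side being non-null on its own.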
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14580 **[Test build #63797 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63797/consoleFull)** for PR 14580 at commit [`af189d6`](https://github.com/apache/spark/commit/af189d6f0ee49f99a32c3db570036c96ae73076b).
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14580 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63785/ Test PASSed.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14580 Merged build finished. Test PASSed.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14580 **[Test build #63785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63785/consoleFull)** for PR 14580 at commit [`7db83bc`](https://github.com/apache/spark/commit/7db83bc9cb78368e768ef88de4b0733edd6c8d4f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14580 **[Test build #63785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63785/consoleFull)** for PR 14580 at commit [`7db83bc`](https://github.com/apache/spark/commit/7db83bc9cb78368e768ef88de4b0733edd6c8d4f).
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Rebased.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14580 Merged build finished. Test PASSed.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14580 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63733/ Test PASSed.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14580 **[Test build #63733 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63733/consoleFull)** for PR 14580 at commit [`11f2509`](https://github.com/apache/spark/commit/11f250921c6eef0c10915abbf26515f7599abd64). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14580 **[Test build #63733 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63733/consoleFull)** for PR 14580 at commit [`11f2509`](https://github.com/apache/spark/commit/11f250921c6eef0c10915abbf26515f7599abd64).
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14580 Merged build finished. Test PASSed.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14580 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63584/ Test PASSed.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14580 **[Test build #63584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63584/consoleFull)** for PR 14580 at commit [`ddb4ddd`](https://github.com/apache/spark/commit/ddb4dddb1829098ef012cc63ddf059d663b8454b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 Oracle supports it... http://docs.oracle.com/javadb/10.10.1.2/ref/rrefsqljusing.html
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 Which NoSQL platforms support `Using Outer Join`?
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14580 **[Test build #63584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63584/consoleFull)** for PR 14580 at commit [`ddb4ddd`](https://github.com/apache/spark/commit/ddb4dddb1829098ef012cc63ddf059d663b8454b).
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 I think we should think outside the SQL box. We know that Spark is not a subset of a DBMS.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 Not sure which RDBMSs support `Using Outer Join`. The `NULL`s generated by outer joins are removed. This sounds a little bit strange. After all, `NULL` also has a meaning. In the plan (by EXPLAIN), it is not easy to tell whether this is a regular outer join or a using outer join. That is why I think we should introduce a new join type. At least users can then easily see that they are triggering a using outer join.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 Already pinged the previously involved committers. Let us see what feedback they have.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Yep. `EliminateOuterJoin` should be updated properly. Any ideas? If you have a more general idea, you can make a PR to override this. You made this optimizer. :)
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 That is a public API. We are unable to remove it. https://github.com/apache/spark/pull/8600 has a serious bug. It has been fixed in another PR: https://github.com/apache/spark/pull/10353. Now, the issue is how to deal with using/natural outer joins. Maybe we can introduce new join types. Or, in the rule, we can find a hacky way to know whether this outer join is a natural/using join.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Yep. Exactly. That is what I mean. That is not the regular outer join you considered in this optimizer, and now both features are in Spark.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 For the regular outer join, the rule works fine. The issue you hit is caused by "using outer join" + "outer join elimination". Thus, your fix does not resolve the root issue.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 In addition, I'm wondering if you really want to remove that feature, which was merged into the 1.6 branch on Sep. 21, 2015 and has already been released?
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Hi, @gatorsmile. Thank you for the review. BTW, could you give me the reason behind the following? > The fix does not look right to me. What do you think the root cause is? I think I missed your context. To me, the current optimizer definitely works incorrectly (as we see in the reported case) and this PR fixes that. I think this is not a SQL standard issue. If you give some counterexamples, I can grasp your concern here.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 Natural join has the same issue.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 `USING OUTER JOIN` is not a pure `OUTER JOIN`. This could affect the other rules. We need to check all the existing rules. This fix does not resolve the root cause. We might need a different join type here. @marmbrus @yhuai @sameeragarwal @hvanhovell
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 Let me revert the changes by https://github.com/apache/spark/pull/8600
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 The fix does not look right to me.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14580 Merged build finished. Test PASSed.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14580 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63545/ Test PASSed.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14580 **[Test build #63545 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63545/consoleFull)** for PR 14580 at commit [`564b6dc`](https://github.com/apache/spark/commit/564b6dcaa591078dad412ec92c623cc3e46d34e4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14580 **[Test build #63545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63545/consoleFull)** for PR 14580 at commit [`564b6dc`](https://github.com/apache/spark/commit/564b6dcaa591078dad412ec92c623cc3e46d34e4).
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14580 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63523/ Test FAILed.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14580 Merged build finished. Test FAILed.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14580 **[Test build #63523 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63523/consoleFull)** for PR 14580 at commit [`695799b`](https://github.com/apache/spark/commit/695799bb6f60398735270f69df632169939b023d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14580 **[Test build #63523 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63523/consoleFull)** for PR 14580 at commit [`695799b`](https://github.com/apache/spark/commit/695799bb6f60398735270f69df632169939b023d).