Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20333#discussion_r162759088
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1108,15 +1108,19 @@ object CheckCartesianProducts extends Rule[LogicalPlan] with PredicateHelper {
        */
       def isCartesianProduct(join: Join): Boolean = {
         val conditions = 
join.condition.map(splitConjunctivePredicates).getOrElse(Nil)
    -    !conditions.map(_.references).exists(refs => 
refs.exists(join.left.outputSet.contains)
    -        && refs.exists(join.right.outputSet.contains))
    +
    +    conditions match {
    +      case Seq(Literal.FalseLiteral) | Seq(Literal(null, BooleanType)) => false
    +      case _ => !conditions.map(_.references).exists(refs =>
    +        refs.exists(join.left.outputSet.contains) && refs.exists(join.right.outputSet.contains))
    +    }
       }
     
       def apply(plan: LogicalPlan): LogicalPlan =
         if (SQLConf.get.crossJoinEnabled) {
           plan
         } else plan transform {
    -      case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, condition)
    +      case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, _)
    --- End diff ---
    
    For inner joins, we will not hit this, because the join is already
    optimized to an empty relation. For the other join types (LeftOuter,
    RightOuter, FullOuter), we face exactly the same issue as when the
    condition is true; that is, the size of the join result set is still
    the same.
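
    To make the point concrete, here is a minimal spark-shell sketch (the
    left/right DataFrames and column names are made up for illustration and
    are not part of the patch). The inner join whose condition is a literal
    false is expected to be optimized to an empty relation before
    CheckCartesianProducts runs, while the outer join keeps its Join node,
    which is the case the new Literal.FalseLiteral / null-literal match
    covers:

        // spark-shell sketch; `spark` and its implicits come from the shell session.
        import org.apache.spark.sql.functions.lit
        import spark.implicits._

        val left = Seq(1, 2, 3).toDF("a")
        val right = Seq(4, 5).toDF("b")

        // Inner join with a literally false condition: per the comment above,
        // the optimizer already turns this into an empty relation, so
        // CheckCartesianProducts never sees the Join node.
        left.join(right, lit(false), "inner").explain(true)

        // Left outer join with the same condition: the Join node survives
        // optimization, which is the case the new pattern match handles.
        left.join(right, lit(false), "left_outer").explain(true)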

