Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r229151278
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
    @@ -736,3 +736,60 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further 
optimizations.
    + *
    + * This rule applies to conditions in [[Filter]] and [[Join]]. Moreover, 
it transforms predicates
    + * in all [[If]] expressions as well as branch conditions in all 
[[CaseWhen]] expressions.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to 
`Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, 
Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, 
FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As this rule is not limited to conditions in [[Filter]] and [[Join]], 
arbitrary plans can
    + * benefit from it. For example, `Project(If(And(cond, Literal(null)), 
Literal(1), Literal(2)))`
    + * can be simplified into `Project(Literal(2))`.
    + *
    + * As a result, many unnecessary computations can be removed in the query 
optimization phase.
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = 
replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = 
Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = 
replaceNullWithFalse(pred))
    +      case cw @ CaseWhen(branches, _) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        cw.copy(branches = newBranches)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must 
stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or 
`Literal(null, _)`.
    --- End diff --
    
    Can we make it more general? I think the expected expression is:
    1. It's `NullIntolerant`. If any child is null, it will be null.
    2. it has a null child.
    
    so I would write something like
    ```
    case f @ Filter(cond, _) if alwaysNull(cond) => f.copy(condition = false)
    ...
    
    def alwaysNull(e: Expression): Boolean = e match {
      case Literal(null, _) => true
      case n: NullIntolerant => n.children.exists(alwaysNull)
      case _ => false
    }
    
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to