GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/22857
[SPARK-25860][SQL] Replace Literal(null, _) with FalseLiteral whenever possible

## What changes were proposed in this pull request?

This PR proposes a new optimization rule that replaces `Literal(null, _)` with `FalseLiteral` in `Join` and `Filter` conditions, `If` predicates, and `CaseWhen` conditions. The idea is that these expressions evaluate to `false` if the underlying expression is `null` (as an example, see `GeneratePredicate$create` or the `doGenCode` and `eval` methods in `If` and `CaseWhen`). Therefore, we can replace `Literal(null, _)` with `FalseLiteral`, which can enable more optimizations later on.

Let's consider a few examples.

```
val df = spark.range(1, 100).select($"id".as("l"), ($"id" > 50).as("b"))
df.createOrReplaceTempView("t")
df.createOrReplaceTempView("p")
```

**Case 1**

```
spark.sql("SELECT * FROM t WHERE if(l > 10, false, NULL)").explain(true)

// without the new rule ...
== Optimized Logical Plan ==
Project [id#0L AS l#2L, cast(id#0L as string) AS s#3]
+- Filter if ((id#0L > 10)) false else null
   +- Range (1, 100, step=1, splits=Some(12))

== Physical Plan ==
*(1) Project [id#0L AS l#2L, cast(id#0L as string) AS s#3]
+- *(1) Filter if ((id#0L > 10)) false else null
   +- *(1) Range (1, 100, step=1, splits=12)

// with the new rule ...
== Optimized Logical Plan ==
LocalRelation <empty>, [l#2L, s#3]

== Physical Plan ==
LocalTableScan <empty>, [l#2L, s#3]
```

**Case 2**

```
spark.sql("SELECT * FROM t WHERE CASE WHEN l < 10 THEN null WHEN l > 40 THEN false ELSE null END").explain(true)

// without the new rule ...
== Optimized Logical Plan ==
Project [id#0L AS l#2L, cast(id#0L as string) AS s#3]
+- Filter CASE WHEN (id#0L < 10) THEN null WHEN (id#0L > 40) THEN false ELSE null END
   +- Range (1, 100, step=1, splits=Some(12))

== Physical Plan ==
*(1) Project [id#0L AS l#2L, cast(id#0L as string) AS s#3]
+- *(1) Filter CASE WHEN (id#0L < 10) THEN null WHEN (id#0L > 40) THEN false ELSE null END
   +- *(1) Range (1, 100, step=1, splits=12)

// with the new rule ...
== Optimized Logical Plan ==
LocalRelation <empty>, [l#2L, s#3]

== Physical Plan ==
LocalTableScan <empty>, [l#2L, s#3]
```

**Case 3**

```
spark.sql("SELECT * FROM t JOIN p ON IF(t.l > p.l, null, false)").explain(true)

// without the new rule ...
== Optimized Logical Plan ==
Join Inner, if ((l#2L > l#37L)) null else false
:- Project [id#0L AS l#2L, cast(id#0L as string) AS s#3]
:  +- Range (1, 100, step=1, splits=Some(12))
+- Project [id#0L AS l#37L, cast(id#0L as string) AS s#38]
   +- Range (1, 100, step=1, splits=Some(12))

== Physical Plan ==
BroadcastNestedLoopJoin BuildRight, Inner, if ((l#2L > l#37L)) null else false
:- *(1) Project [id#0L AS l#2L, cast(id#0L as string) AS s#3]
:  +- *(1) Range (1, 100, step=1, splits=12)
+- BroadcastExchange IdentityBroadcastMode
   +- *(2) Project [id#0L AS l#37L, cast(id#0L as string) AS s#38]
      +- *(2) Range (1, 100, step=1, splits=12)

// with the new rule ...
== Optimized Logical Plan ==
LocalRelation <empty>, [l#2L, s#3, l#37L, s#38]
```

## How was this patch tested?

This PR comes with a set of dedicated tests.
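To make the mechanics concrete, below is a minimal, self-contained sketch in plain Scala of the idea behind the rule, not the actual Catalyst implementation: a tiny expression tree in which a null boolean literal sitting in a null-as-false predicate position is rewritten to a false literal. All names here (`Expr`, `NullLit`, `ReplaceNullWithFalse`, etc.) are hypothetical stand-ins for the real Catalyst classes.

```scala
// Hypothetical miniature of Catalyst's expression tree, boolean-valued only.
sealed trait Expr
case object NullLit extends Expr                       // stands in for Literal(null, BooleanType)
case object FalseLit extends Expr                      // stands in for FalseLiteral
case class Attr(name: String) extends Expr             // a column reference
case class GreaterThan(l: Expr, r: Expr) extends Expr
case class If(pred: Expr, t: Expr, f: Expr) extends Expr
case class CaseWhen(branches: List[(Expr, Expr)], orElse: Expr) extends Expr

object ReplaceNullWithFalse {
  // Rewrite an expression that is used as a condition (Filter/Join ON),
  // where a null result is treated the same as false. Because the whole
  // expression is a condition, the branches of a nested If/CaseWhen are
  // themselves condition results, so it is sound to recurse into them:
  // a branch producing null makes the condition null, which acts as false.
  def apply(e: Expr): Expr = e match {
    case NullLit              => FalseLit
    case If(p, t, f)          => If(apply(p), apply(t), apply(f))
    case CaseWhen(bs, orElse) =>
      CaseWhen(bs.map { case (c, v) => (apply(c), apply(v)) }, apply(orElse))
    case other                => other   // leave leaves and comparisons alone
  }
}
```

For example, the Case 3 condition `IF(t.l > p.l, null, false)` would become `If(GreaterThan(..), FalseLit, FalseLit)`, which later simplification passes can collapse to a constant-false condition and hence an empty relation. The real rule additionally has to verify that the expression is of `BooleanType` and only fires in the contexts listed above; this sketch skips those guards.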
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aokolnychyi/spark spark-25860

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22857.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22857

----

commit 1d8fefd9b227c6aba50b7e012726ec292c75b5a1
Author: Anton Okolnychyi <aokolnychyi@...>
Date:   2018-10-23T09:09:23Z

    [SPARK-25860][SQL] Replace Literal(null, _) with FalseLiteral whenever possible