[GitHub] spark pull request #21823: [SPARK-24870][SQL]Cache can't work normally if th...

eatoncys Thu, 19 Jul 2018 19:18:26 -0700

GitHub user eatoncys opened a pull request:

    https://github.com/apache/spark/pull/21823


    [SPARK-24870][SQL]Cache can't work normally if there are case letters in SQL

    ## What changes were proposed in this pull request?
    Modified the canonicalized form so that it is not case-sensitive.
    Before this PR, the cache is not used reliably when the same column is
    written with different letter case in the SQL (key and Key below),
    for example:
        sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")

        sql("select key, sum(case when Key > 0 then 1 else 0 end) as positiveNum " +
          "from src group by key").cache().createOrReplaceTempView("src_cache")
        sql(
          s"""select a.key
               from
               (select key from src_cache where positiveNum = 1)a
               left join
               (select key from src_cache )b
               on a.key=b.key
            """).explain
    
    The physical plan of the sql is:
    
![image](https://user-images.githubusercontent.com/26834091/42979518-3decf0fa-8c05-11e8-9837-d5e4c334cb1f.png)
    
    The subquery "select key from src_cache where positiveNum = 1" on the left
    side of the join can use the cached data, but the subquery
    "select key from src_cache" on the right side of the join cannot.
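
    The sketch below is a toy model of why letter case can defeat a
    structural comparison of expressions. It is not Spark's code: Expr, Attr,
    Gt, Lit and canonicalize are hypothetical names introduced only for
    illustration, and the lower-casing shown is just one possible way to
    make the comparison ignore case.

```scala
// Toy model only: these types are hypothetical and are not Spark's classes.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(value: Int) extends Expr
final case class Gt(left: Expr, right: Expr) extends Expr

object CanonicalizeSketch {
  // Lower-case every attribute name so that `Key` and `key` compare equal.
  def canonicalize(e: Expr): Expr = e match {
    case Attr(n)  => Attr(n.toLowerCase(java.util.Locale.ROOT))
    case Gt(l, r) => Gt(canonicalize(l), canonicalize(r))
    case lit: Lit => lit
  }

  def main(args: Array[String]): Unit = {
    val cached = Gt(Attr("Key"), Lit(0)) // spelling used in the cached query
    val lookup = Gt(Attr("key"), Lit(0)) // same column, different case

    // Case classes compare structurally, so the raw trees differ.
    println(cached == lookup)                             // false
    // After normalizing the names, the two trees are equal.
    println(canonicalize(cached) == canonicalize(lookup)) // true
  }
}
```

    In this model, a cache keyed on the raw tree would miss the second
    expression, while a cache keyed on the normalized tree would find it.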
    
    ## How was this patch tested?
    
    Added a new test.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/eatoncys/spark canonicalized

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21823.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21823
    
----
commit 2b2a5a33ed58ce07fd2515eb01e80acbedeb8b2a
Author: 10129659 <chen.yanshan@...>
Date:   2018-07-20T01:43:53Z

    Cache can't work normally if there are case letters in SQL

----


