GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/22990

    [SPARK-25988] [SQL] Keep names unchanged when deduplicating the column names in Analyzer

    ## What changes were proposed in this pull request?
    When a query does not refer to a column name with consistent letter case, 
users can hit various errors. Below is a typical test failure.
    ```
    Expected only partition pruning predicates: ArrayBuffer(isnotnull(tdate#237), (cast(tdate#237 as string) >= 2017-08-15));
    org.apache.spark.sql.AnalysisException: Expected only partition pruning predicates: ArrayBuffer(isnotnull(tdate#237), (cast(tdate#237 as string) >= 2017-08-15));
        at org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.prunePartitionsByFilter(ExternalCatalogUtils.scala:146)
        at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.listPartitionsByFilter(InMemoryCatalog.scala:560)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:925)
    ```
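
To illustrate the idea in the title, here is a toy sketch in Python. It is not Spark's Analyzer code: `Attr`, `dedup_replacing`, `dedup_keeping_name`, the id `501`, and the upper-case spelling `TDATE` are all hypothetical, and it assumes that some later step compares column names exactly. Under those assumptions, replacing a whole attribute during deduplication changes the spelling of the reference, while rewriting only the id leaves the name untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Attr:
    """Toy stand-in for a column reference: a name plus a unique id."""
    name: str
    expr_id: int


def dedup_replacing(attr, rewrites):
    # Swap in the whole replacement attribute, including its spelling.
    return rewrites.get(attr.expr_id, attr)


def dedup_keeping_name(attr, rewrites):
    # Take only the new id; keep the name exactly as this reference wrote it.
    target = rewrites.get(attr.expr_id, attr)
    return replace(attr, expr_id=target.expr_id)


# Hypothetical data: the reference is spelled "tdate", while the
# replacement attribute happens to be spelled "TDATE".
rewrites = {237: Attr("TDATE", 501)}
ref = Attr("tdate", 237)
partition_columns = {"tdate"}  # assumed to be matched case-sensitively later

changed = dedup_replacing(ref, rewrites)
kept = dedup_keeping_name(ref, rewrites)
```

In this model only `dedup_keeping_name` leaves the reference matching the partition column name; both variants still pick up the new id.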
    
    ## How was this patch tested?
    Added two test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark fix1283

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22990.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22990
    
----
commit 5e9f6f345b93d3370906c7b2d73ede15f4089c29
Author: gatorsmile <gatorsmile@...>
Date:   2018-11-09T05:27:37Z

    fix

commit 17b725c79ad602df20c44cacb92e7c6abd84cdda
Author: gatorsmile <gatorsmile@...>
Date:   2018-11-09T05:33:58Z

    fix

----


