GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/19552

    [SPARK-22329][SQL] Use NEVER_INFER for 
`spark.sql.hive.caseSensitiveInferenceMode` by default

    ## What changes were proposed in this pull request?
    
    In Spark 2.2.0, the default value of `spark.sql.hive.caseSensitiveInferenceMode` causes a critical issue.
    
    - [SPARK-19611](https://issues.apache.org/jira/browse/SPARK-19611) changed the default to `INFER_AND_SAVE` in 2.2.0 because Spark 2.1.0 broke queries over Hive tables backed by case-sensitive data files (a reproduction sketch follows this list).
    
      > This situation will occur for any Hive table that wasn't created by 
Spark or that was created prior to Spark 2.1.0. If a user attempts to run a 
query over such a table containing a case-sensitive field name in the query 
projection or in the query filter, the query will return 0 results in every 
case.
    
    - However, [SPARK-22306](https://issues.apache.org/jira/browse/SPARK-22306) reports that this also corrupts the Hive Metastore schema by removing bucketing information (BUCKETING_COLS, SORT_COLS) and changing the table owner. These are undesirable side effects: the Hive Metastore is a shared resource, and Spark should not corrupt it by default.
    
    - Since Spark 2.3.0 supports bucketing, the BUCKETING_COLS and SORT_COLS changes may be acceptable there, but the owner-change issue still needs to be investigated. Also, the bucketing patch cannot be backported to `branch-2.2`, and this option needs more test coverage before the 2.3.0 release.
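
    The following is a minimal, hypothetical sketch of the SPARK-19611 scenario described in the first bullet; the table name, column names, and path are illustrative and not taken from this PR:
    
    ```scala
    import org.apache.spark.sql.SparkSession
    
    // Assumes a Spark build with Hive support; names and paths below are made up.
    val spark = SparkSession.builder()
      .appName("case-sensitivity-demo")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._
    
    // Write Parquet data with a mixed-case column name, bypassing the metastore.
    Seq((1, "a"), (2, "b")).toDF("id", "fieldOne")
      .write.parquet("/tmp/demo_table")
    
    // Register it as an externally created Hive table; the metastore stores the
    // schema in lower case (`fieldone`).
    spark.sql("""
      CREATE EXTERNAL TABLE demo_table (id INT, fieldone STRING)
      STORED AS PARQUET LOCATION '/tmp/demo_table'
    """)
    
    // With NEVER_INFER, Spark reads the files using the lower-cased metastore
    // schema; because Parquet column matching is case-sensitive, the mixed-case
    // column is not found and the query may return NULLs / 0 matching rows.
    spark.sql("SELECT fieldone FROM demo_table WHERE fieldone = 'a'").show()
    ```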
    
    This PR proposes to change the default back to `NEVER_INFER`, the behavior before `INFER_AND_SAVE` became the default. Users who accept the risk can still enable `INFER_AND_SAVE` explicitly, as sketched below.
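
    A minimal sketch of opting back in, assuming the configuration is set at session creation (it can also be passed via `--conf` to `spark-submit`):
    
    ```scala
    import org.apache.spark.sql.SparkSession
    
    // Explicitly opt in to schema inference; with this PR the default is NEVER_INFER.
    val spark = SparkSession.builder()
      .appName("opt-in-schema-inference")
      .config("spark.sql.hive.caseSensitiveInferenceMode", "INFER_AND_SAVE")
      .enableHiveSupport()
      .getOrCreate()
    
    // Queries against mixed-case, Parquet-backed Hive tables (such as the
    // hypothetical demo_table from the earlier sketch) now infer a case-sensitive
    // schema from the data files and persist it to the metastore, i.e. the
    // behavior this PR turns off by default.
    spark.sql("SELECT fieldOne FROM demo_table WHERE fieldOne = 'a'").show()
    ```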
    
    ## How was this patch tested?
    
    Pass the existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-22329

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19552.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19552
    
----
commit a256627dbc2772e69cd0f9f2aa43b384165e3657
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2017-10-22T17:59:15Z

    [SPARK-22329][SQL] Use NEVER_INFER for 
`spark.sql.hive.caseSensitiveInferenceMode` by default

----


---
