GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/19552

[SPARK-22329][SQL] Use NEVER_INFER for `spark.sql.hive.caseSensitiveInferenceMode` by default

## What changes were proposed in this pull request?

In Spark 2.2.0, the default value of `spark.sql.hive.caseSensitiveInferenceMode` causes a critical issue.

- [SPARK-19611](https://issues.apache.org/jira/browse/SPARK-19611) made `INFER_AND_SAVE` the default in 2.2.0 because Spark 2.1.0 breaks some Hive tables backed by case-sensitive data files.

  > This situation will occur for any Hive table that wasn't created by Spark or that was created prior to Spark 2.1.0. If a user attempts to run a query over such a table containing a case-sensitive field name in the query projection or in the query filter, the query will return 0 results in every case.

- However, [SPARK-22306](https://issues.apache.org/jira/browse/SPARK-22306) reports that this also corrupts the Hive Metastore schema by removing bucketing information (BUCKETING_COLS, SORT_COLS) and changing the owner. These are undesirable side effects. Hive Metastore is a shared resource, and Spark should not corrupt it by default.
- Since Spark 2.3.0 supports bucketing, BUCKETING_COLS and SORT_COLS look okay at least. However, we still need to figure out the issue of changing owners. Also, we cannot backport the bucketing patch into `branch-2.2`. We need to verify this option with more tests before releasing 2.3.0.

This PR proposes to restore the default to `NEVER_INFER`, as it was before Spark 2.2.0. Users can take the risk of enabling `INFER_AND_SAVE` themselves.

## How was this patch tested?

Pass the existing tests.

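The zero-result behavior quoted from SPARK-19611 can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark code: the table, the column names, and the helpers `read_column` and `infer_schema` are hypothetical, and the model assumes the metastore keeps lowercased column names while the data files keep the original mixed-case field names.

```python
# Toy model only: this does not call any Spark API and is not Spark's implementation.

def read_column(file_rows, column_name):
    """Case-sensitive lookup, as a reader over case-sensitive field names would do."""
    return [row[column_name] for row in file_rows if column_name in row]


def infer_schema(metastore_columns, file_columns):
    """Map each lowercased metastore column to the case-preserving name in the files."""
    by_lower = {name.lower(): name for name in file_columns}
    return [by_lower.get(col, col) for col in metastore_columns]


# Hypothetical table: lowercase names in the metastore, mixed case in the data files.
metastore_columns = ["userid", "eventtime"]
file_rows = [
    {"userId": 1, "eventTime": "2017-10-22"},
    {"userId": 2, "eventTime": "2017-10-23"},
]

# No inference (the NEVER_INFER situation): the lowercase name matches no field.
without_inference = read_column(file_rows, metastore_columns[0])

# With inference: the case-preserving name is recovered from the files first.
inferred = infer_schema(metastore_columns, list(file_rows[0].keys()))
with_inference = read_column(file_rows, inferred[0])

print(without_inference)  # []
print(with_inference)     # [1, 2]
```

In this model, inference only changes which name is used for the lookup. `INFER_AND_SAVE` additionally writes the inferred schema back to the Hive Metastore, and that write is the step SPARK-22306 reports as corrupting bucketing information and the owner.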

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-22329

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19552.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19552

----
commit a256627dbc2772e69cd0f9f2aa43b384165e3657
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2017-10-22T17:59:15Z

    [SPARK-22329][SQL] Use NEVER_INFER for `spark.sql.hive.caseSensitiveInferenceMode` by default

----