GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/14471
[SPARK-14387][SQL] Exceptions thrown when querying ORC tables

## What changes were proposed in this pull request?

This PR improves ORCFileFormat to handle cases where the schema stored in the ORC file does not match the schema stored in the metastore. ORC data written by Hive-1.x has virtual column names (HIVE-4243). This is fixed in Hive-2.x, but Spark throws exceptions when reading data stored using Hive-1.x. To mitigate this, "spark.sql.hive.convertMetastoreOrc" was disabled via SPARK-15705. However, that incurs a performance penalty, because reads then go through HiveTableScan and HadoopRDD. This PR fixes the underlying issue.

Related tickets:
SPARK-15705 : Change the default value of spark.sql.hive.convertMetastoreOrc to false.
SPARK-15705 : Spark won't read ORC schema from metastore for partitioned tables
SPARK-16628 : OrcConversions should not convert an ORC table represented by MetastoreRelation to HadoopFsRelation if metastore schema does not match schema stored in ORC files

## How was this patch tested?

Manual testing by setting "spark.sql.hive.convertMetastoreOrc=true" and querying data stored via Hive-1.x in ORC format. Also ran the SQL-related unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rajeshbalamohan/spark SPARK-14387.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14471.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14471

----
commit dc943a445047a21a88ab19566eab672e8921dcc1
Author: Rajesh Balamohan <rbalamo...@apache.org>
Date:   2016-08-03T02:21:05Z

    [SPARK-14387][SQL] Exceptions thrown when querying ORC tables

----
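
For readers unfamiliar with the mismatch described in the PR text: files written by Hive-1.x carry placeholder field names (`_col0`, `_col1`, ...) instead of the real column names, so a lookup by metastore column name finds nothing. The sketch below illustrates one possible way to reconcile the two schemas, by falling back to positional matching. It is not the code from this patch. The function names (`has_virtual_columns`, `resolve_orc_field`) are hypothetical, and the positional fallback assumes that the metastore data columns (excluding partition columns) are listed in the same order as the fields in the file.

```python
import re
from typing import List, Optional

# Hive-1.x wrote ORC files whose fields are named _col0, _col1, ... (HIVE-4243).
VIRTUAL_COLUMN = re.compile(r"^_col(\d+)$")


def has_virtual_columns(orc_fields: List[str]) -> bool:
    """Return True when every field name in the file is a _colN placeholder."""
    return bool(orc_fields) and all(VIRTUAL_COLUMN.match(f) for f in orc_fields)


def resolve_orc_field(
    requested: str,
    orc_fields: List[str],
    metastore_fields: List[str],
) -> Optional[str]:
    """Return the physical ORC field to read for a metastore column, or None.

    Hypothetical helper: metastore_fields is assumed to hold only data
    columns (no partition columns), in the same order as the file's fields.
    """
    if not has_virtual_columns(orc_fields):
        # The file carries real names, so a lookup by name is enough.
        return requested if requested in orc_fields else None

    # The file carries placeholders, so fall back to matching by position.
    if requested not in metastore_fields:
        return None
    index = metastore_fields.index(requested)
    # A column added to the metastore after the file was written has no
    # counterpart in the file.
    return orc_fields[index] if index < len(orc_fields) else None
```

With a file whose fields are `["_col0", "_col1"]` and a metastore schema of `["id", "name"]`, a request for `name` resolves to `_col1`. A file that already has real names is read by name, unchanged.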