GitHub user rajeshbalamohan opened a pull request:

    https://github.com/apache/spark/pull/14471

    [SPARK-14387][SQL] Exceptions thrown when querying ORC tables

    ## What changes were proposed in this pull request?
    This PR improves OrcFileFormat to handle cases where the schema stored in
    the ORC file does not match the schema stored in the metastore.
    
    ORC data written by Hive-1.x has virtual column names such as _col0 and
    _col1 instead of the real column names (HIVE-4243). This is fixed in
    Hive-2.x, but for data stored using Hive-1.x Spark would throw exceptions.
    To mitigate this, "spark.sql.hive.convertMetastoreOrc" was disabled by
    default via SPARK-15705. However, that incurs a performance penalty,
    because queries then go through HiveTableScan and HadoopRDD. This PR fixes
    the underlying schema mismatch.
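
The mismatch described above can be illustrated with a small, self-contained sketch. It is not the code in this patch. It assumes that Hive-1.x placeholder names follow the `_colN` pattern and that a reader can fall back to matching columns by position against the metastore schema. The helper names `has_virtual_columns` and `resolve_file_fields` are hypothetical.

```python
import re

# Assumption: Hive-1.x placeholder field names look like _col0, _col1, ...
VIRTUAL_COLUMN = re.compile(r"^_col(\d+)$")


def has_virtual_columns(file_fields):
    """Return True if every field name in the file schema is a placeholder."""
    return bool(file_fields) and all(VIRTUAL_COLUMN.match(f) for f in file_fields)


def resolve_file_fields(file_fields, metastore_fields):
    """Map each metastore column name to the field name to read from the file.

    If the file carries real names, columns are matched by name.
    If the file carries only placeholders, columns are matched by position.
    Metastore columns with no counterpart in the file (for example trailing
    partition columns) are left out of the mapping.
    """
    if not has_virtual_columns(file_fields):
        return {name: name for name in metastore_fields if name in file_fields}

    mapping = {}
    for index, name in enumerate(metastore_fields):
        if index < len(file_fields):
            mapping[name] = file_fields[index]
    return mapping


if __name__ == "__main__":
    # File written by Hive-1.x: only placeholder names are available.
    print(resolve_file_fields(["_col0", "_col1"], ["id", "name"]))
    # File written with real column names: matched by name.
    print(resolve_file_fields(["id", "name"], ["id", "name"]))
```

Positional matching only works if the metastore lists data columns in the same order in which they were written to the file, which is why the sketch treats it as a fallback rather than the default.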
    
    Related tickets:
    SPARK-15705 : Change the default value of
    spark.sql.hive.convertMetastoreOrc to false.
    SPARK-15705 : Spark won't read ORC schema from metastore for partitioned
    tables
    SPARK-16628 : OrcConversions should not convert an ORC table represented
    by MetastoreRelation to HadoopFsRelation if metastore schema does not
    match schema stored in ORC files
    
    
    ## How was this patch tested?
    Manual testing by setting "spark.sql.hive.convertMetastoreOrc=true" and
    querying data stored via Hive-1.x in ORC format. Also ran the unit tests
    related to SQL.
    
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rajeshbalamohan/spark SPARK-14387.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14471.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14471
    
----
commit dc943a445047a21a88ab19566eab672e8921dcc1
Author: Rajesh Balamohan <rbalamo...@apache.org>
Date:   2016-08-03T02:21:05Z

    [SPARK-14387][SQL] Exceptions thrown when querying ORC tables

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
