[jira] [Comment Edited] (SPARK-16605) Spark2.0 cannot "select" data from a table stored as an orc file which has been created by hive while hive or spark1.6 supports

Xin Wu (JIRA) Mon, 18 Jul 2016 14:18:07 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-16605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383082#comment-15383082
 ]


Xin Wu edited comment on SPARK-16605 at 7/18/16 9:17 PM:
---------------------------------------------------------

The problem with ORC data inserted by Hive is that the schema stored in the 
ORC file uses dummy column names such as "_col1, _col2, ...". Hive knows how 
to read the data. Spark SQL, however, tries to convert the ORC table to its 
native ORC relation for scanning, for a performance gain. In doing so it 
infers the schema from the ORC file directly but gets the table schema from 
the Hive metastore, so the two schemas do not match. 
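
To make the mismatch concrete, here is a toy model in plain Python. It is only 
a sketch and does not use Spark or Hive code: the lists, the helper functions 
and the second column name ("name") are hypothetical, and the exact dummy names 
are illustrative. It shows why a name-based lookup of "nid" against the file 
schema fails, and that mapping columns by position is one way the two schemas 
could be reconciled.

```python
# Toy model only: plain Python lists stand in for the two schemas.
# Column names other than "nid" are hypothetical.
file_schema = ["_col0", "_col1"]      # dummy names as stored in the ORC file
metastore_schema = ["nid", "name"]    # names recorded in the Hive metastore


def field_index(schema, name):
    """Look up a column by name, failing much like the reported error."""
    if name not in schema:
        raise ValueError('Field "%s" does not exist.' % name)
    return schema.index(name)


def resolve_by_position(metastore_schema, file_schema, name):
    """Map a metastore column to the file column at the same position."""
    return file_schema[field_index(metastore_schema, name)]


try:
    # Name-based lookup against the file schema: "nid" is not there.
    field_index(file_schema, "nid")
except ValueError as err:
    print(err)

# Positional lookup finds the dummy column that holds the "nid" data.
print(resolve_by_position(metastore_schema, file_schema, "nid"))
```

The positional mapping is shown only to illustrate the idea; it is not a claim 
about how Hive or Spark resolves columns internally.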

As a workaround, try turning off this performance-oriented conversion: 
{code}set spark.sql.hive.convertMetastoreOrc=false{code}

Then see if the query works. 
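
If you are working from PySpark rather than the SQL shell, the same setting 
can be applied on the session before querying. The following is a minimal 
sketch: the helper name is made up for illustration, and it assumes `spark` is 
a Hive-enabled SparkSession (Spark 2.0+) that can see the table.

```python
ORC_CONVERSION_KEY = "spark.sql.hive.convertMetastoreOrc"


def select_without_orc_conversion(spark, table):
    """Turn off the metastore ORC conversion for this session, then scan.

    `spark` is assumed to be a Hive-enabled SparkSession; `table` is the
    name of the Hive ORC table to read.
    """
    # Same effect as running: set spark.sql.hive.convertMetastoreOrc=false
    spark.conf.set(ORC_CONVERSION_KEY, "false")
    # Returns a DataFrame when called with a real SparkSession.
    return spark.sql("SELECT * FROM " + table)
```

For the table in this report, the call would look like 
`select_without_orc_conversion(spark, "tborc").show()`.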


> Spark2.0 cannot "select" data from a table stored as an orc file which has 
> been created by hive while hive or spark1.6 supports
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-16605
>                 URL: https://issues.apache.org/jira/browse/SPARK-16605
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: marymwu
>         Attachments: screenshot-1.png
>
>
> Spark 2.0 cannot "select" data from a table stored as an ORC file that was 
> created by Hive, while Hive and Spark 1.6 can.
> Steps:
> 1. Use Hive to create a table "tbtxt" stored as text and load data into it.
> 2. Use Hive to create a table "tborc" stored as ORC and insert the data from 
> table "tbtxt", for example: "create table tborc stored as orc as select * 
> from tbtxt"
> 3. Use Spark 2.0 to run "select * from tborc;". An error occurs: 
> java.lang.IllegalArgumentException: Field "nid" does not exist.


