[ https://issues.apache.org/jira/browse/SPARK-16628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-16628:
------------------------------------

    Assignee: Apache Spark

> OrcConversions should not convert an ORC table represented by MetastoreRelation to HadoopFsRelation if metastore schema does not match schema stored in ORC files
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-16628
>                 URL: https://issues.apache.org/jira/browse/SPARK-16628
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Apache Spark
>
> When {{spark.sql.hive.convertMetastoreOrc}} is enabled, we convert an ORC
> table represented by a MetastoreRelation into a HadoopFsRelation that uses
> Spark's OrcFileFormat internally. The conversion is meant to speed up table
> scans, because at runtime the code path that scans a HadoopFsRelation
> performs better. However, OrcFileFormat's implementation assumes that ORC
> files store their schema with the correct column names. Before Hive 2.0, ORC
> tables created by Hive do not store the column names correctly in their ORC
> files (HIVE-4243), so for these datasets we cannot safely switch to the
> converted code path.
>
> Currently, if an ORC table was created by Hive 1.x or 0.x, enabling
> {{spark.sql.hive.convertMetastoreOrc}} causes a runtime exception for
> non-partitioned ORC tables and drops the metastore schema for partitioned
> ORC tables.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
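The issue title proposes skipping the conversion when the metastore schema does not match the schema stored in the ORC files. Below is a minimal Python sketch of such a guard. It is not Spark's API, and both function names are hypothetical. It assumes that affected files carry positional placeholder names such as `_col0`, `_col1` instead of the real column names, which is how the HIVE-4243 problem typically shows up. It also assumes case-insensitive name matching, and that `metastore_fields` holds only data columns, because partition columns are not stored in the data files.

```python
from typing import List


def is_hive_positional_schema(orc_field_names: List[str]) -> bool:
    """Return True if every ORC field name looks like a positional `_colN` placeholder.

    Assumption: pre-Hive-2.0 writers (HIVE-4243) store names like `_col0`, `_col1`.
    """
    return len(orc_field_names) > 0 and all(
        name.startswith("_col") and name[4:].isdigit() for name in orc_field_names
    )


def can_convert_to_hadoop_fs_relation(
    metastore_fields: List[str], orc_field_names: List[str]
) -> bool:
    """Hypothetical guard: convert only if every metastore data column is found by name in the ORC file.

    `metastore_fields` must exclude partition columns, which live in directory paths, not in the files.
    """
    if is_hive_positional_schema(orc_field_names):
        # Name-based column resolution would fail or return nulls, so keep the Hive SerDe path.
        return False
    # Assumption: Hive column names are case-insensitive, so compare lower-cased names.
    physical = {name.lower() for name in orc_field_names}
    return all(field.lower() in physical for field in metastore_fields)


# Usage: a file written by Hive 1.x with placeholder names is rejected.
print(can_convert_to_hadoop_fs_relation(["id", "name"], ["_col0", "_col1"]))  # False
print(can_convert_to_hadoop_fs_relation(["id", "name"], ["id", "name"]))  # True
```

The check is made by name rather than by position because name-based lookup is the assumption that, according to the description, OrcFileFormat relies on. A real implementation would need to inspect the footer of at least one ORC file per table or partition to obtain `orc_field_names`.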