[ https://issues.apache.org/jira/browse/SPARK-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384464#comment-15384464 ]
Michael Allman commented on SPARK-15705: ---------------------------------------- Yeah, I'm not too keen on that code path either. Inferring the schema from reading every file in a partitioned table is a relatively heavyweight and slow operation. We're not leveraging the fact that we have a metastore schema. This is a performance/efficiency issue I've been working on in our own codebase, and I believe we'll have something to contribute in the near future. However, that will definitely not make it into 2.0. As for your problem specifically, I can't really suggest a quick fix because I have no experience with the ORC file format. > Spark won't read ORC schema from metastore for partitioned tables > ----------------------------------------------------------------- > > Key: SPARK-15705 > URL: https://issues.apache.org/jira/browse/SPARK-15705 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.0.0 > Environment: HDP 2.3.4 (Hive 1.2.1, Hadoop 2.7.1) > Reporter: Nic Eggert > Priority: Critical > > Spark does not seem to read the schema from the Hive metastore for > partitioned tables stored as ORC files. It appears to read the schema from > the files themselves, which, if they were created with Hive, does not match > the metastore schema (at least not before before Hive 2.0, see HIVE-4243). To > reproduce: > In Hive: > {code} > hive> create table default.test (id BIGINT, name STRING) partitioned by > (state STRING) stored as orc; > hive> insert into table default.test partition (state="CA") values (1, > "mike"), (2, "steve"), (3, "bill"); > {code} > In Spark > {code} > scala> spark.table("default.test").printSchema > {code} > Expected result: Spark should preserve the column names that were defined in > Hive. > Actual Result: > {code} > root > |-- _col0: long (nullable = true) > |-- _col1: string (nullable = true) > |-- state: string (nullable = true) > {code} > Possibly related to SPARK-14959? -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org