[jira] [Commented] (SPARK-15705) Spark won't read ORC schema from metastore for partitioned tables

2017-12-06 Thread Dongjoon Hyun (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281170#comment-16281170 ]

Dongjoon Hyun commented on SPARK-15705:
---

Since 2.2.1 has been released, I'm adding the result from 2.2.1 as well.
{code}
scala> sql("set spark.sql.hive.convertMetastoreOrc=true")
scala> spark.table("default.test").printSchema
root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- state: string (nullable = true)

scala> spark.version
res2: String = 2.2.1
{code}

> Spark won't read ORC schema from metastore for partitioned tables
> -
>
> Key: SPARK-15705
> URL: https://issues.apache.org/jira/browse/SPARK-15705
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: HDP 2.3.4 (Hive 1.2.1, Hadoop 2.7.1)
>Reporter: Nic Eggert
>Assignee: Yin Huai
>Priority: Critical
> Fix For: 2.0.0
>
>
> Spark does not seem to read the schema from the Hive metastore for 
> partitioned tables stored as ORC files. It appears to read the schema from 
> the files themselves, which, if they were created with Hive, does not match 
> the metastore schema (at least not before Hive 2.0, see HIVE-4243). To 
> reproduce:
> In Hive:
> {code}
> hive> create table default.test (id BIGINT, name STRING) partitioned by 
> (state STRING) stored as orc;
> hive> insert into table default.test partition (state="CA") values (1, 
> "mike"), (2, "steve"), (3, "bill");
> {code}
> In Spark:
> {code}
> scala> spark.table("default.test").printSchema
> {code}
> Expected result: Spark should preserve the column names that were defined in 
> Hive.
> Actual Result:
> {code}
> root
>  |-- _col0: long (nullable = true)
>  |-- _col1: string (nullable = true)
>  |-- state: string (nullable = true)
> {code}
> Possibly related to SPARK-14959?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15705) Spark won't read ORC schema from metastore for partitioned tables

2017-09-12 Thread Dongjoon Hyun (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163747#comment-16163747 ]

Dongjoon Hyun commented on SPARK-15705:
---

Hi, All.

I'm tracking this bug; it appears to have been fixed as of 2.1.1.

{code}
scala> spark.table("default.test").printSchema
root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- state: string (nullable = true)

scala> sql("set spark.sql.hive.convertMetastoreOrc=true")
res1: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> spark.table("default.test").printSchema
root
 |-- _col0: long (nullable = true)
 |-- _col1: string (nullable = true)
 |-- state: string (nullable = true)

scala> sc.version
res3: String = 2.0.2
{code}

{code}
scala> spark.table("default.test").printSchema
root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- state: string (nullable = true)

scala> sql("set spark.sql.hive.convertMetastoreOrc=true")
res1: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> spark.table("default.test").printSchema
root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- state: string (nullable = true)

scala> sc.version
res3: String = 2.1.1
{code}

{code}
scala> spark.table("default.test").printSchema
root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- state: string (nullable = true)


scala> sql("set spark.sql.hive.convertMetastoreOrc=true")
res1: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> spark.table("default.test").printSchema
root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- state: string (nullable = true)

scala> sc.version
res3: String = 2.2.0
{code}




[jira] [Commented] (SPARK-15705) Spark won't read ORC schema from metastore for partitioned tables

2016-07-19 Thread Yin Huai (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384648#comment-15384648 ]

Yin Huai commented on SPARK-15705:
--

The proper fix will be tracked by 
https://issues.apache.org/jira/browse/SPARK-16628.




[jira] [Commented] (SPARK-15705) Spark won't read ORC schema from metastore for partitioned tables

2016-07-19 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384604#comment-15384604 ]

Apache Spark commented on SPARK-15705:
--

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/14267




[jira] [Commented] (SPARK-15705) Spark won't read ORC schema from metastore for partitioned tables

2016-07-19 Thread Yin Huai (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384593#comment-15384593 ]

Yin Huai commented on SPARK-15705:
--

I will send a PR to change the default value of 
{{spark.sql.hive.convertMetastoreOrc}}. Then I will start looking at the fix. 




[jira] [Commented] (SPARK-15705) Spark won't read ORC schema from metastore for partitioned tables

2016-07-19 Thread Michael Allman (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384464#comment-15384464 ]

Michael Allman commented on SPARK-15705:


Yeah, I'm not too keen on that code path either. Inferring the schema from 
reading every file in a partitioned table is a relatively heavyweight and slow 
operation. We're not leveraging the fact that we have a metastore schema. This 
is a performance/efficiency issue I've been working on in our own codebase, and 
I believe we'll have something to contribute in the near future. However, that 
will definitely not make it into 2.0. As for your problem specifically, I can't 
really suggest a quick fix because I have no experience with the ORC file 
format.




[jira] [Commented] (SPARK-15705) Spark won't read ORC schema from metastore for partitioned tables

2016-07-13 Thread Nic Eggert (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375543#comment-15375543 ]

Nic Eggert commented on SPARK-15705:


I believe the problem is around here: 
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L301

You can see that there's special handling for merging the Parquet and metastore 
schemas, but for anything else the schema is taken directly from the files. I 
don't really have the expertise to fix this on my own, but I'm happy to help in 
any way I can.
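
For context, the general shape of such a merge — taking the column names from 
the metastore while keeping the types actually read from the files — can be 
sketched with Spark's public {{StructType}} API. This is purely illustrative 
(the helper name is made up, and it assumes the two schemas line up 
positionally, as they do for Hive-written ORC where data columns come back as 
{{_col0}}, {{_col1}}, ...):
{code}
import org.apache.spark.sql.types.{StructField, StructType}

// Hypothetical helper: prefer the metastore's column names, but keep
// the data types and nullability observed in the ORC files.
// Assumes positional correspondence between the two schemas.
def mergeWithMetastoreSchema(metastore: StructType, file: StructType): StructType =
  StructType(metastore.fields.zip(file.fields).map {
    case (ms, f) => StructField(ms.name, f.dataType, f.nullable, ms.metadata)
  })
{code}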




[jira] [Commented] (SPARK-15705) Spark won't read ORC schema from metastore for partitioned tables

2016-07-12 Thread Nic Eggert (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373786#comment-15373786 ]

Nic Eggert commented on SPARK-15705:


Found a workaround: set {{spark.sql.hive.convertMetastoreOrc=false}}.
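
In spark-shell the workaround can be applied per session — a sketch, assuming a 
Hive-enabled SparkSession named {{spark}}; with the conversion disabled, Spark 
falls back to Hive's ORC SerDe, which resolves column names through the 
metastore rather than from the files:
{code}
scala> spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")

scala> spark.table("default.test").printSchema
{code}
It can also be set at launch, e.g. 
{{spark-shell --conf spark.sql.hive.convertMetastoreOrc=false}}.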




[jira] [Commented] (SPARK-15705) Spark won't read ORC schema from metastore for partitioned tables

2016-07-11 Thread Nic Eggert (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371513#comment-15371513 ]

Nic Eggert commented on SPARK-15705:


Double-checked just for fun. The problem still exists in RC2.




[jira] [Commented] (SPARK-15705) Spark won't read ORC schema from metastore for partitioned tables

2016-06-22 Thread Jeff Zhang (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345533#comment-15345533 ]

Jeff Zhang commented on SPARK-15705:


I will take a look at it. 




[jira] [Commented] (SPARK-15705) Spark won't read ORC schema from metastore for partitioned tables

2016-06-22 Thread Nic Eggert (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345036#comment-15345036 ]

Nic Eggert commented on SPARK-15705:


Raised the priority to Critical. We have a lot of partitioned tables stored as 
ORC, so DataFrames/Datasets in 2.0 are effectively unusable for us until this 
is fixed.




[jira] [Commented] (SPARK-15705) Spark won't read ORC schema from metastore for partitioned tables

2016-06-02 Thread Xin Wu (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313257#comment-15313257 ]

Xin Wu commented on SPARK-15705:


I can recreate it now and will look into it. 
