[jira] [Commented] (SPARK-9762) ALTER TABLE cannot find column

2016-10-09 Thread Xiao Li (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15561337#comment-15561337 ]

Xiao Li commented on SPARK-9762:


Since 2.0, we do not support ALTER TABLE CHANGE COLUMN, regardless of whether 
the table is a Hive-serde table or a data source table. Please open a new PR to 
request it as a new feature, if needed. Thanks!
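
For reference, a minimal sketch of the kind of statement that is rejected (the 
table and column names are taken from this issue; the Spark 2.x session setup 
is assumed):

{code}
// Assumed Spark 2.x session with Hive support enabled.
val spark = org.apache.spark.sql.SparkSession.builder()
  .enableHiveSupport()
  .getOrCreate()

// CHANGE COLUMN is rejected for both Hive-serde and data source tables,
// so this is expected to fail with an "operation not allowed" style error.
spark.sql(
  "ALTER TABLE dimension_components CHANGE comp_criteria comp_criteria string")
{code}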

> ALTER TABLE cannot find column
> --
>
> Key: SPARK-9762
> URL: https://issues.apache.org/jira/browse/SPARK-9762
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1
> Environment: Ubuntu on AWS
>Reporter: Simeon Simeonov
>
> {{ALTER TABLE tbl CHANGE}} cannot find a column that {{DESCRIBE COLUMN}} 
> lists. 
> In the case of a table generated with {{HiveContext.read.json()}}, the output 
> of {{DESCRIBE dimension_components}} is:
> {code}
> comp_config         struct
> comp_criteria       string
> comp_data_model     string
> comp_dimensions     struct,template:string,variation:bigint>
> comp_disabled       boolean
> comp_id             bigint
> comp_path           string
> comp_placementData  struct
> comp_slot_types     array
> {code}
> However, {{alter table dimension_components change comp_dimensions comp_dimensions struct,template:string,variation:bigint,z:string>;}} fails with:
> {code}
> 15/08/08 23:13:07 ERROR exec.DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: Invalid column reference comp_dimensions
>   at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3584)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:312)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
>   at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:345)
>   at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:326)
>   at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:155)
>   at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:326)
>   at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:316)
>   at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:473)
> ...
> {code}
> Meanwhile, {{SHOW COLUMNS in dimension_components}} lists two columns: 
> {{col}} (which does not exist in the table) and {{z}}, which was just added.
> This suggests that DDL operations in Spark SQL use table metadata 
> inconsistently.
> Full spark-sql output 
> [here|https://gist.github.com/ssimeonov/636a25d6074a03aafa67].
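
For context, a hypothetical end-to-end sketch of the reproduction described 
above (the JSON path is illustrative; the table name comes from the report; 
assumes a Spark 1.4.x {{HiveContext}}):

{code}
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Create the table from self-describing JSON, as in the report.
val df = hiveContext.read.json("/path/to/dimension_components.json")
df.write.saveAsTable("dimension_components")

hiveContext.sql("DESCRIBE dimension_components").show()         // full JSON-derived schema
hiveContext.sql("SHOW COLUMNS IN dimension_components").show()  // disagrees, per the report

// The ALTER TABLE ... CHANGE statement quoted above then fails with
// "Invalid column reference comp_dimensions".
{code}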






[jira] [Commented] (SPARK-9762) ALTER TABLE cannot find column

2015-11-13 Thread Shaun A Elliott (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004359#comment-15004359 ]

Shaun A Elliott commented on SPARK-9762:


Is there a workaround for this at all?




[jira] [Commented] (SPARK-9762) ALTER TABLE cannot find column

2015-10-01 Thread Simeon Simeonov (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940776#comment-14940776 ]

Simeon Simeonov commented on SPARK-9762:


[~yhuai] the Hive compatibility section of the documentation should be updated 
to identify these cases. It is unfortunate to trust the docs, only to discover 
a known incompatibility that was never documented.




[jira] [Commented] (SPARK-9762) ALTER TABLE cannot find column

2015-10-01 Thread Yin Huai (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940763#comment-14940763 ]

Yin Huai commented on SPARK-9762:
-

[~simeons] Different versions of Hive have different kinds of internal 
restrictions, so it is not always possible to store the metadata in a 
Hive-compatible way. For example, if the metastore uses Hive 0.13, Hive will 
reject the create-table call for a Parquet table if any column has a binary or 
decimal type. So, to save the table's metadata at all, we have to work around 
the restriction and save the metadata in a way that is not compatible with 
Hive.

The reason you see two different outputs for DESCRIBE TABLE and SHOW COLUMNS is 
that Spark SQL has a native implementation of DESCRIBE TABLE but still 
delegates the SHOW COLUMNS command to Hive. Because the metadata is not 
Hive-compatible, SHOW COLUMNS gives you a different output.

We have been gradually adding native support for more kinds of commands. If 
there are any specific commands that are important to your use cases, please 
feel free to create JIRAs.
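
To make the Hive 0.13 example concrete, a hypothetical schema that would 
trigger the restriction (table and column names are illustrative):

{code}
import org.apache.spark.sql.types._

// A Parquet table whose schema contains binary and decimal columns. An older
// metastore (e.g. Hive 0.13) rejects the create-table call for such a schema,
// so Spark SQL falls back to storing the metadata in a non-Hive-compatible form.
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("payload", BinaryType),
  StructField("price", DecimalType(10, 2))
))
val empty = hiveContext.createDataFrame(
  sc.emptyRDD[org.apache.spark.sql.Row], schema)
empty.write.format("parquet").saveAsTable("parquet_binary_decimal")
{code}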




[jira] [Commented] (SPARK-9762) ALTER TABLE cannot find column

2015-10-01 Thread Simeon Simeonov (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940757#comment-14940757 ]

Simeon Simeonov commented on SPARK-9762:


[~yhuai] Refreshing is not the issue here. The issue is that {{DESCRIBE tbl}} 
and {{SHOW COLUMNS tbl}} show different columns for a table even without 
altering it, which suggests that Spark SQL is not managing table metadata 
correctly.
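
Concretely, the divergence is visible with just the following (a sketch using 
the table from this issue; assumes the same {{HiveContext}} session):

{code}
// Even before any ALTER TABLE, the two commands disagree:
hiveContext.sql("DESCRIBE dimension_components").show()         // the JSON-derived columns
hiveContext.sql("SHOW COLUMNS IN dimension_components").show()  // a different column list
{code}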




[jira] [Commented] (SPARK-9762) ALTER TABLE cannot find column

2015-10-01 Thread Yin Huai (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940082#comment-14940082 ]

Yin Huai commented on SPARK-9762:
-

[~simeons] For Spark SQL's data source tables, if you are adding new columns, I 
think {{hiveContext.refreshTable}} or {{sql("REFRESH TABLE ...")}} is the way 
to let Spark pick up the new columns. Can you try it?

Basically, for self-describing data sources like JSON, Parquet, and ORC, we try 
to give you the latest schema automatically, without requiring users to set the 
schema explicitly. (Sometimes you need to call the refresh table method because 
we cache metadata on the driver side to reduce query planning time.)
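
The two refresh paths suggested above, as a sketch (table name taken from this 
issue):

{code}
// Refresh the driver-side metadata cache so new columns are picked up.
hiveContext.refreshTable("dimension_components")
// or, equivalently, via SQL:
hiveContext.sql("REFRESH TABLE dimension_components")
{code}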




[jira] [Commented] (SPARK-9762) ALTER TABLE cannot find column

2015-10-01 Thread Simeon Simeonov (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939883#comment-14939883 ]

Simeon Simeonov commented on SPARK-9762:


[~yhuai] any thoughts on this?
