[jira] [Commented] (SPARK-9761) Inconsistent metadata handling with ALTER TABLE

2015-10-01 Thread Simeon Simeonov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940778#comment-14940778
 ] 

Simeon Simeonov commented on SPARK-9761:


[~yhuai] What about this one? The problem survives a restart, so it doesn't 
seem to be caused by a lack of refreshing.

> Inconsistent metadata handling with ALTER TABLE
> ---
>
> Key: SPARK-9761
> URL: https://issues.apache.org/jira/browse/SPARK-9761
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1
> Environment: Ubuntu on AWS
>Reporter: Simeon Simeonov
>  Labels: hive, sql
>
> Schema changes made with {{ALTER TABLE}} are not shown in {{DESCRIBE TABLE}}. 
> The table in question was created with {{HiveContext.read.json()}}.
> Steps:
> # {{alter table dimension_components add columns (z string);}} succeeds.
> # {{describe dimension_components;}} does not show the new column, even after 
> restarting spark-sql.
> # A second {{alter table dimension_components add columns (z string);}} fails 
> with ERROR exec.DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Duplicate column name: z
> Full spark-sql output 
> [here|https://gist.github.com/ssimeonov/d9af4b8bb76b9d7befde].






[jira] [Commented] (SPARK-9761) Inconsistent metadata handling with ALTER TABLE

2016-07-29 Thread David Winters (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399468#comment-15399468
 ] 

David Winters commented on SPARK-9761:
--

Hi [~xwu0226] and [~simeons],

Is there any plan to resolve this issue anytime soon?  I see that this bug 
hasn't been assigned to anyone and there is no fix version set.  I'm seeing 
this same behavior and I also get an exception when attempting to append to the 
altered table.  See below...

{noformat}
java.lang.RuntimeException: Relation[snip... snip...] AvroRelation[file:/snip...  snip...] requires that the query in the SELECT clause of the INSERT INTO/OVERWRITE statement generates the same number of columns as its schema.
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.execution.datasources.PreInsertCastAndRename$$anonfun$apply$1.applyOrElse(rules.scala:44)
    at org.apache.spark.sql.execution.datasources.PreInsertCastAndRename$$anonfun$apply$1.applyOrElse(rules.scala:34)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:226)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:217)
    at org.apache.spark.sql.execution.datasources.PreInsertCastAndRename$.apply(rules.scala:34)
    at org.apache.spark.sql.execution.datasources.PreInsertCastAndRename$.apply(rules.scala:33)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:83)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:80)
    at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
    at scala.collection.immutable.List.foldLeft(List.scala:84)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:916)
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:916)
    at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:914)
    at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:918)
    at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:917)
    at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:921)
    at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:921)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:926)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:924)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:930)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:930)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933)
    at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:187)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:237)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:219)
{noformat}
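
For context, a minimal sketch of the kind of append that can trigger the error above, under the assumption that the table was originally created through {{df.write.saveAsTable}} against an Avro source and then altered. The table name and file path below are made up for illustration:

{code}
// Hypothetical sketch (Spark 1.x with spark-avro); names and paths are illustrative only.
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)  // assumes an existing SparkContext `sc`

// Table "events" was previously created via df.write.saveAsTable with the Avro data source.
hiveContext.sql("ALTER TABLE events ADD COLUMNS (z string)")

// Appending now fails with the error above: the generated INSERT's SELECT
// produces a different number of columns than the relation's schema.
val newRows = hiveContext.read.format("com.databricks.spark.avro")
  .load("/path/to/new_events.avro")
newRows.write.mode(SaveMode.Append).saveAsTable("events")
{code}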



[jira] [Commented] (SPARK-9761) Inconsistent metadata handling with ALTER TABLE

2016-08-04 Thread Xin Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408405#comment-15408405
 ] 

Xin Wu commented on SPARK-9761:
---

[~drwinters] Spark 2.0 has native support for DDL commands, which opens the door to 
implementing ALTER TABLE ADD/CHANGE COLUMNS; that is not yet supported in the 
currently released Spark 2.0. Spark 2.1 will also bring some changes to the native 
DDL infrastructure. I think once that settles, it will be easier to support this. 
I am looking into this as well.




[jira] [Commented] (SPARK-9761) Inconsistent metadata handling with ALTER TABLE

2016-08-04 Thread David Winters (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408828#comment-15408828
 ] 

David Winters commented on SPARK-9761:
--

[~xwu0226] - Thanks for the follow-up.  BTW, I was able to work around the issue 
by using your suggestion of explicitly creating the table.  Thanks!
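
For anyone else hitting this, a rough sketch of that workaround (see Xin Wu's 2015-11-18 comment further down this thread for his full example). The table and column names are illustrative and assume an existing {{hiveContext}} and DataFrame {{df}}:

{code}
// Illustrative sketch of the workaround: create the table explicitly, then append into it.
import org.apache.spark.sql.SaveMode

hiveContext.sql("create table orders(customerID int, orderID int) stored as parquet")
df.write.mode(SaveMode.Append).saveAsTable("orders")

// With the explicitly created table, a later ALTER TABLE shows up in DESCRIBE.
hiveContext.sql("ALTER TABLE orders add columns (z string)")
hiveContext.sql("describe extended orders").show()
{code}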




[jira] [Commented] (SPARK-9761) Inconsistent metadata handling with ALTER TABLE

2015-11-17 Thread Xin Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009947#comment-15009947
 ] 

Xin Wu commented on SPARK-9761:
---

I can recreate this with the following code:
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf().setAppName("TPCDS").setMaster("local")
val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)

hiveContext.sql("drop table Orders")
// Create the table implicitly from a JSON-derived DataFrame.
val df = hiveContext.read.json("/home/xwu0226/spark-tables/Orders.json")
df.show()
df.write.saveAsTable("Orders")

// Add a column through Hive DDL, refresh, and describe the table.
hiveContext.sql("ALTER TABLE Orders add columns (z string)")
hiveContext.refreshTable("Orders")
hiveContext.sql("describe extended Orders").show
{code}
Output:
{code}
+----------+-------+
|CustomerID|OrderID|
+----------+-------+
|       452|      1|
+----------+-------+

16:46:28.483 WARN org.apache.hadoop.hive.metastore.HiveMetaStore: Location: file:/user/hive/warehouse/orders specified for non-external table:orders
+----------+---------+-------+
|  col_name|data_type|comment|
+----------+---------+-------+
|CustomerID|   bigint|       |
|   OrderID|   bigint|       |
+----------+---------+-------+
{code}
I am taking a look. 




[jira] [Commented] (SPARK-9761) Inconsistent metadata handling with ALTER TABLE

2015-11-18 Thread Xin Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012655#comment-15012655
 ] 

Xin Wu commented on SPARK-9761:
---

One thing I notice is that if I create the table explicitly before letting the 
DataFrame write into it, describe table does show the column added by ALTER TABLE, 
even though I created the table stored as parquet and verified that the saved data 
file is in parquet format.
{code}
import org.apache.spark.sql.SaveMode

// Continues from the previous snippet: reuses the same hiveContext.
hiveContext.sql("drop table Orders")
val df = hiveContext.read.json("/home/xwu0226/spark-tables/Orders.json")
df.show()

// Create the table explicitly first, then append the DataFrame into it.
hiveContext.sql("create table orders(customerID int, orderID int) stored as parquet")
df.write.mode(SaveMode.Append).saveAsTable("Orders")

hiveContext.sql("ALTER TABLE Orders add columns (z string)")
hiveContext.sql("describe extended Orders").show
{code}

output:
{code}
+----------+---------+-------+
|  col_name|data_type|comment|
+----------+---------+-------+
|customerid|      int|       |
|   orderid|      int|       |
|         z|   string|       |
+----------+---------+-------+
{code}

So with the explicit creation of the table, describe appears to use schema 
merging, while the other case does not merge the schema.

The "spark.sql.sources.provider" property is defined for the explicitly created 
table, so the lookupRelation logic in HiveMetastoreCatalog.scala goes to look it 
up in cachedDataSourceTables; when the relation is not found there, it gets 
reloaded from the parquet files, resulting in column schemas built from the 
parquet content. It would be nice if the schema were merged when constructing 
this new relation before handing it back to the caller. Looking deeper into this.
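
To make the two lookup paths concrete, here is a purely illustrative sketch; it is not the actual HiveMetastoreCatalog code, and CatalogEntry, schemaFromParquetFooters, and schemaFromMetastore are hypothetical stand-ins used only to show why a relation whose schema is rebuilt from the data files would not reflect a column added only through the metastore:

{code}
// Hypothetical illustration only; not Spark source code.
case class Column(name: String, dataType: String)
case class CatalogEntry(name: String, properties: Map[String, String])

// Stand-in for a schema rebuilt from the parquet footers: only the columns
// present in the data files (no "z").
def schemaFromParquetFooters(t: CatalogEntry): Seq[Column] =
  Seq(Column("CustomerID", "bigint"), Column("OrderID", "bigint"))

// Stand-in for the Hive metastore view, which does include the ALTER-added column.
def schemaFromMetastore(t: CatalogEntry): Seq[Column] =
  Seq(Column("CustomerID", "bigint"), Column("OrderID", "bigint"), Column("z", "string"))

// Simplified version of the branching described above: when the table carries
// "spark.sql.sources.provider", the relation is reconstructed from the files,
// so metastore-only changes such as ALTER TABLE ADD COLUMNS disappear.
def resolveSchema(t: CatalogEntry): Seq[Column] =
  if (t.properties.contains("spark.sql.sources.provider")) schemaFromParquetFooters(t)
  else schemaFromMetastore(t)
{code}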



