[jira] [Commented] (SPARK-9761) Inconsistent metadata handling with ALTER TABLE

Xin Wu (JIRA) Wed, 18 Nov 2015 18:33:02 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-9761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012655#comment-15012655
 ]


Xin Wu commented on SPARK-9761:
-------------------------------

One thing I noticed: if I create the table explicitly before letting the 
DataFrame write into it, describe table shows the column added by ALTER TABLE. 
This happens even though I created the table stored as parquet and verified 
that the saved data file is in parquet format.
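{code}
import org.apache.spark.sql.SaveMode

// Drop any previous copy of the table, then load the source data.
hiveContext.sql("drop table Orders")
val df = hiveContext.read.json("/home/xwu0226/spark-tables/Orders.json")
df.show()
// Create the table explicitly before the DataFrame writes into it.
hiveContext.sql("create table orders(customerID int, orderID int) stored as parquet")
df.write.mode(SaveMode.Append).saveAsTable("Orders")
// Add a column, then check whether describe reports it.
hiveContext.sql("ALTER TABLE Orders add columns (z string)")
hiveContext.sql("describe extended Orders").show
{code}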

output:
{code}
+----------+---------+-------+
|  col_name|data_type|comment|
+----------+---------+-------+
|customerid|      int|       |
|   orderid|      int|       |
|         z|   string|       |
+----------+---------+-------+
{code}

So when the table is created explicitly, describe seems to use the merged 
schema, while the other case (the one in the issue description) does not merge 
the schema.

The "spark.sql.sources.provider" property is defined for the explicitly created 
table, so the logic of lookupRelation in HiveMetastoreCatalog.scala looks up 
the relation in cachedDataSourceTables. The relation is not found there, so it 
is reloaded from the parquet file, and the column schema is built from the 
parquet content. It would be nice if the schema were merged when constructing 
this new relation, before it is returned to the caller. I am looking deeper 
into this.
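
To make the suggested merge concrete, here is a minimal sketch in plain Scala. 
It is not Spark's implementation: Column, SchemaMergeSketch and mergeSchemas 
are hypothetical names, and the column types are illustrative only. It assumes 
the merge keeps the columns read from the parquet file and appends any column 
known only to the metastore (such as one added by ALTER TABLE), comparing 
names case-insensitively because the describe output above shows lowercased 
names.

```scala
// Hypothetical, simplified model of a column; not a Spark class.
final case class Column(name: String, dataType: String)

object SchemaMergeSketch {
  /** Keeps the file-derived columns in their original order, then appends
    * columns that exist only in the metastore schema.
    * Names are compared case-insensitively (assumption: the metastore
    * stores column names lowercased, as in the describe output). */
  def mergeSchemas(fileSchema: Seq[Column], metastoreSchema: Seq[Column]): Seq[Column] = {
    val fileNames = fileSchema.map(_.name.toLowerCase).toSet
    val metastoreOnly =
      metastoreSchema.filterNot(c => fileNames.contains(c.name.toLowerCase))
    fileSchema ++ metastoreOnly
  }

  def main(args: Array[String]): Unit = {
    // Illustrative schemas modeled on the Orders example above.
    val fromParquet = Seq(Column("customerID", "int"), Column("orderID", "int"))
    val fromMetastore = Seq(
      Column("customerid", "int"),
      Column("orderid", "int"),
      Column("z", "string") // added by ALTER TABLE, absent from the file
    )
    mergeSchemas(fromParquet, fromMetastore)
      .foreach(c => println(s"${c.name}: ${c.dataType}"))
  }
}
```

With these inputs the merged schema contains customerID, orderID and z, so a 
relation rebuilt from the parquet file would still expose the altered column. 
A real fix would also have to decide what to do when the two sides disagree on 
a column's type, which this sketch ignores.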




> Inconsistent metadata handling with ALTER TABLE
> -----------------------------------------------
>
>                 Key: SPARK-9761
>                 URL: https://issues.apache.org/jira/browse/SPARK-9761
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1
>         Environment: Ubuntu on AWS
>            Reporter: Simeon Simeonov
>              Labels: hive, sql
>
> Schema changes made with {{ALTER TABLE}} are not shown in {{DESCRIBE TABLE}}. 
> The table in question was created with {{HiveContext.read.json()}}.
> Steps:
> # {{alter table dimension_components add columns (z string);}} succeeds.
> # {{describe dimension_components;}} does not show the new column, even after 
> restarting spark-sql.
> # A second {{alter table dimension_components add columns (z string);}} fails 
> with ERROR exec.DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Duplicate column name: z
> Full spark-sql output 
> [here|https://gist.github.com/ssimeonov/d9af4b8bb76b9d7befde].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

