[ https://issues.apache.org/jira/browse/SPARK-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-12257:
------------------------------------

    Assignee: Apache Spark

> Non partitioned insert into a partitioned Hive table doesn't fail
> -----------------------------------------------------------------
>
>                 Key: SPARK-12257
>                 URL: https://issues.apache.org/jira/browse/SPARK-12257
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.1
>            Reporter: Mark Grover
>            Assignee: Apache Spark
>            Priority: Minor
>
> I am using Spark 1.5.1, but I anticipate this to be a problem with master as well (will check later).
> I have a data frame and a partitioned Hive table that I want to insert the contents of the data frame into.
> Let's say mytable is a non-partitioned Hive table and mytable_partitioned is a partitioned Hive table. In Hive, if you try to insert from the non-partitioned mytable table into mytable_partitioned without specifying the partition, the query fails, as expected:
> {code}
> INSERT INTO mytable_partitioned SELECT * FROM mytable;
> {code}
> {quote}
> Error: Error while compiling statement: FAILED: SemanticException 1:12 Need to specify partition columns because the destination table is partitioned. Error encountered near token 'mytable_partitioned' (state=42000,code=40000)
> {quote}
> However, if I do the same in Spark SQL:
> {code}
> myDf.registerTempTable("my_df_temp_table")
> sqlContext.sql("INSERT INTO mytable_partitioned SELECT * FROM my_df_temp_table")
> {code}
> this appears to succeed but inserts nothing. It should instead fail with an error stating that data is being inserted into a partitioned table without the partition being specified.
> Of course, if the name of the partition is explicitly specified, both Hive and Spark SQL do the right thing and function correctly.
> In Hive:
> {code}
> INSERT INTO mytable_partitioned PARTITION (y='abc') SELECT * FROM mytable;
> {code}
> In Spark SQL:
> {code}
> myDf.registerTempTable("my_df_temp_table")
> sqlContext.sql("INSERT INTO mytable_partitioned PARTITION (y='abc') SELECT * FROM my_df_temp_table")
> {code}
> And here are the definitions of my tables, for reference:
> {code}
> CREATE TABLE mytable (x INT);
> CREATE TABLE mytable_partitioned (x INT) PARTITIONED BY (y INT);
> {code}
> You will also need to insert some dummy data into mytable to confirm that the insertion is actually not working:
> {code}
> #!/bin/bash
> rm -f data.txt
> for i in {0..9}; do
>   echo $i >> data.txt
> done
> sudo -u hdfs hadoop fs -put data.txt /user/hive/warehouse/mytable
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org