Mark Grover created SPARK-12257:
-----------------------------------

             Summary: Non partitioned insert into a partitioned Hive table doesn't fail
                 Key: SPARK-12257
                 URL: https://issues.apache.org/jira/browse/SPARK-12257
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.1
            Reporter: Mark Grover
            Priority: Minor


I am using Spark 1.5.1, but I expect this to be a problem on master as well 
(will check later).

I have a DataFrame and a partitioned Hive table, and I want to insert the 
contents of the DataFrame into that table.

Let's say mytable is a non-partitioned Hive table and mytable_partitioned is a 
partitioned Hive table. In Hive, if you try to insert from the non-partitioned 
mytable table into mytable_partitioned without specifying the partition, the 
query fails, as expected:
{code}
INSERT INTO mytable_partitioned SELECT * FROM mytable;
{code}
{quote}
Error: Error while compiling statement: FAILED: SemanticException 1:12 Need to 
specify partition columns because the destination table is partitioned. Error 
encountered near token 'mytable_partitioned' (state=42000,code=40000)
{quote}

However, if I do the same in Spark SQL:
{code}
myDf.registerTempTable("my_df_temp_table")
sqlContext.sql("INSERT INTO mytable_partitioned SELECT * FROM my_df_temp_table")
{code}
This appears to succeed but inserts nothing. It should instead fail with an 
error stating that data is being inserted into a partitioned table without a 
partition being specified.
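
One way to confirm the silent no-op is to compare the table's row count before and after the statement. The helper below is only a sketch: rowsAdded is a hypothetical name, not part of Spark, and the commented usage assumes a spark-shell session where sqlContext is a HiveContext and my_df_temp_table is already registered.
```scala
// Hypothetical helper (not part of Spark): runs `action` and returns the
// difference between what `count` reports afterwards and beforehand.
def rowsAdded(count: () => Long)(action: => Unit): Long = {
  val before = count()
  action
  count() - before
}

// Assumed usage in spark-shell, with a HiveContext bound to `sqlContext`:
//   val added = rowsAdded(() => sqlContext.table("mytable_partitioned").count()) {
//     sqlContext.sql("INSERT INTO mytable_partitioned SELECT * FROM my_df_temp_table")
//   }
//   // added == 0 would match the behaviour reported here.
```
Taking the count as a function keeps the helper independent of Spark, so it can be tried out with a stub counter before being pointed at a real table.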

Of course, if the partition is explicitly specified, both Hive and Spark SQL 
do the right thing and the insert works correctly.
In Hive:
{code}
INSERT INTO mytable_partitioned PARTITION (y='abc') SELECT * FROM mytable;
{code}
In Spark SQL:
{code}
myDf.registerTempTable("my_df_temp_table")
sqlContext.sql("INSERT INTO mytable_partitioned PARTITION (y='abc') " +
  "SELECT * FROM my_df_temp_table")
{code}

And here are the definitions of my tables, for reference:
{code}
CREATE TABLE mytable(x INT);
CREATE TABLE mytable_partitioned (x INT) PARTITIONED BY (y INT);
{code}

You will also need to insert some dummy data into mytable so that you can 
verify that the insertion is actually not happening:
{code}
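#!/bin/bash
# Write the integers 0..9 to data.txt, one per line.
rm -f data.txt
for i in {0..9}; do
  echo "$i" >> data.txt
done
# Copy the file into the warehouse directory backing mytable.
sudo -u hdfs hadoop fs -put data.txt /user/hive/warehouse/mytable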
{code}


