[ https://issues.apache.org/jira/browse/SPARK-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-12257:
------------------------------------

    Assignee: Apache Spark

> Non partitioned insert into a partitioned Hive table doesn't fail
> -----------------------------------------------------------------
>
>                 Key: SPARK-12257
>                 URL: https://issues.apache.org/jira/browse/SPARK-12257
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.1
>            Reporter: Mark Grover
>            Assignee: Apache Spark
>            Priority: Minor
>
> I am using Spark 1.5.1, but I anticipate this to be a problem with master as well (will check later).
> I have a data frame and a partitioned Hive table that I want to insert the contents of the data frame into.
> Let's say mytable is a non-partitioned Hive table and mytable_partitioned is a partitioned Hive table. In Hive, if you try to insert from the non-partitioned mytable table into mytable_partitioned without specifying the partition, the query fails, as expected:
> {code}
> INSERT INTO mytable_partitioned SELECT * FROM mytable;
> {code}
> {quote}
> Error: Error while compiling statement: FAILED: SemanticException 1:12 Need to specify partition columns because the destination table is partitioned. Error encountered near token 'mytable_partitioned' (state=42000,code=40000)
> {quote}
> However, if I do the same in Spark SQL:
> {code}
> myDf.registerTempTable("my_df_temp_table")
> sqlContext.sql("INSERT INTO mytable_partitioned SELECT * FROM my_df_temp_table")
> {code}
> this appears to succeed but inserts nothing. It should instead fail with an error stating that data is being inserted into a partitioned table without the partition being specified.
> Of course, if the name of the partition is explicitly specified, both Hive and Spark SQL do the right thing and function correctly.
> In Hive:
> {code}
> INSERT INTO mytable_partitioned PARTITION (y='abc') SELECT * FROM mytable;
> {code}
> In Spark SQL:
> {code}
> myDf.registerTempTable("my_df_temp_table")
> sqlContext.sql("INSERT INTO mytable_partitioned PARTITION (y='abc') SELECT * FROM my_df_temp_table")
> {code}
> And here are the definitions of my tables, for reference:
> {code}
> CREATE TABLE mytable (x INT);
> CREATE TABLE mytable_partitioned (x INT) PARTITIONED BY (y INT);
> {code}
> You will also need to insert some dummy data into mytable to confirm that the insertion is actually not working:
> {code}
> #!/bin/bash
> rm -f data.txt
> for i in {0..9}; do
>   echo $i >> data.txt
> done
> sudo -u hdfs hadoop fs -put data.txt /user/hive/warehouse/mytable
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org