[ https://issues.apache.org/jira/browse/SPARK-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-9278:
--------------------------------
Comment: was deleted

(was: The result may well be different in your environment: I ran the code below against the master branch of Spark, in a local environment without S3, using the Scala API on Mac OS. Still, I will leave a comment about what I tested in case you want to try it outside those environments. Here is the code I ran:

{code}
// Create data.
val alphabets = Seq("a", "e", "i", "o", "u")
val partA = (0 to 4).map(i => Seq(alphabets(i % 5), "a", i))
val partB = (5 to 9).map(i => Seq(alphabets(i % 5), "b", i))
val partC = (10 to 14).map(i => Seq(alphabets(i % 5), "c", i))
val data = partA ++ partB ++ partC

// Create RDD.
val rowsRDD = sc.parallelize(data.map(Row.fromSeq))

// Create DataFrame.
val schema = StructType(List(
  StructField("k", StringType, true),
  StructField("pk", StringType, true),
  StructField("v", IntegerType, true))
)
val sdf = sqlContext.createDataFrame(rowsRDD, schema)

// Create an empty table.
sdf.filter("FALSE")
  .write
  .format("parquet")
  .option("path", "foo")
  .partitionBy("pk")
  .saveAsTable("foo")

// Insert into a partition of the table.
sdf.filter("pk = 'a'")
  .write
  .partitionBy("pk")
  .insertInto("foo")

// Select all.
val foo = sqlContext.table("foo")
foo.show()
{code}

And the result was correct, as shown below.
{code}
+---+---+---+
|  k|  v| pk|
+---+---+---+
|  a|  0|  a|
|  e|  1|  a|
|  i|  2|  a|
|  o|  3|  a|
|  u|  4|  a|
+---+---+---+
{code})

> DataFrameWriter.insertInto inserts incorrect data
> -------------------------------------------------
>
>                 Key: SPARK-9278
>                 URL: https://issues.apache.org/jira/browse/SPARK-9278
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.0
>       Environment: Linux, S3, Hive Metastore
>            Reporter: Steve Lindemann
>            Assignee: Cheng Lian
>          Priority: Critical
>
> After creating a partitioned Hive table (stored as Parquet) via the
> DataFrameWriter.createTable command, subsequent attempts to insert additional
> data into new partitions of this table result in incorrect data rows being
> inserted. Reordering the columns in the data to be written seems to avoid
> this issue.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
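One plausible reading of the reordering workaround above: partitioned tables store the partition column last, so the table in the repro is laid out as (k, v, pk) even though the DataFrame schema is (k, pk, v), and insertInto resolves columns by position rather than by name. A minimal plain-Scala sketch of that position-vs-name mismatch (no Spark required; reorderForInsert is a hypothetical helper, not a Spark API):

```scala
object InsertOrderSketch {
  // Hypothetical helper: put one row (name -> value) into the table's column order,
  // mimicking what selecting columns in table order before insertInto achieves.
  def reorderForInsert(tableCols: Seq[String], row: Map[String, Any]): Seq[Any] =
    tableCols.map(row)

  def main(args: Array[String]): Unit = {
    val tableCols = Seq("k", "v", "pk")               // physical table order (partition col last)
    val row = Map("k" -> "a", "pk" -> "a", "v" -> 0)  // one row of the DataFrame

    // Naive positional insert uses the DataFrame's own order (k, pk, v),
    // so the pk value lands in the v column:
    val naive = Seq("k", "pk", "v").map(row)
    // Reordering to the table's order pairs each value with the right column:
    val fixed = reorderForInsert(tableCols, row)
    println(s"naive positional insert: $naive")
    println(s"reordered insert:        $fixed")
  }
}
```

Under this assumption, the corresponding fix in Spark would be to select the columns in the table's order (e.g. `sdf.select("k", "v", "pk")`) before calling insertInto.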