dongxu created SPARK-6644:
-----------------------------

             Summary: [SPARK-SQL]when the partition schema does not match table schema(ADD COLUMN), new column is NULL
                 Key: SPARK-6644
                 URL: https://issues.apache.org/jira/browse/SPARK-6644
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.3.0
            Reporter: dongxu

In Hive, the schema of a partition may differ from the table schema, for example after a new column is added to the table. Problems arise when spark-sql queries a partition whose schema differs from the table schema. Some of them were fixed in https://github.com/apache/spark/pull/4289, but one remains: if you add a new column to the table and then write new data into an existing partition, the new column is read back as NULL.

Steps to reproduce:

    // Assumed context: Spark's Hive test setup, with TestHive and its
    // implicits in scope, sql referring to TestHive.sql, and tmpDir being
    // a temporary directory.
    case class TestData(key: Int, value: String)

    val testData = TestHive.sparkContext.parallelize(
      (1 to 10).map(i => TestData(i, i.toString))).toDF()
    testData.registerTempTable("testData")

    sql("DROP TABLE IF EXISTS table_with_partition")
    sql(s"CREATE TABLE IF NOT EXISTS table_with_partition(key int, value string) PARTITIONED BY (ds string) LOCATION '${tmpDir.toURI.toString}'")
    sql("INSERT OVERWRITE TABLE table_with_partition PARTITION (ds='1') SELECT key, value FROM testData")

    // Add columns to the table
    sql("ALTER TABLE table_with_partition ADD COLUMNS(key1 string)")
    sql("ALTER TABLE table_with_partition ADD COLUMNS(destlng double)")

    // Overwrite the existing partition with rows that include the new columns
    sql("INSERT OVERWRITE TABLE table_with_partition PARTITION (ds='1') SELECT key, value, 'test', 1.11 FROM testData")
    sql("SELECT * FROM table_with_partition WHERE ds='1'").collect().foreach(println)

Result:

    [1,1,null,null,1]
    [2,2,null,null,1]

Expected result:

    [1,1,test,1.11,1]
    [2,2,test,1.11,1]
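
To make the symptom concrete, here is a minimal, self-contained sketch of how a mismatch between a partition's schema and the table schema could produce NULLs in the new columns. It is an illustrative model only, not Spark or Hive code. It rests on an assumption that the report does not confirm: that the partition ds='1' still carries the old two-column schema after ADD COLUMNS. All names (PartitionSchemaSketch, readRow) are hypothetical, and the partition column ds is left out for brevity.

```scala
// Illustrative model only, not Spark or Hive code. All names are hypothetical.
object PartitionSchemaSketch {

  // Resolve a stored row against the table schema, but only through the
  // columns that the (assumed stale) partition schema knows about.
  def readRow(stored: Seq[Any],
              partitionCols: Seq[String],
              tableCols: Seq[String]): Seq[Any] = {
    // zip stops at the shorter sequence, so stored values beyond the
    // partition schema are silently dropped.
    val visible: Map[String, Any] = partitionCols.zip(stored).toMap
    // Table columns missing from the partition schema come back as null.
    tableCols.map(c => visible.getOrElse(c, null))
  }

  def main(args: Array[String]): Unit = {
    val tableCols = Seq("key", "value", "key1", "destlng")
    val stalePartitionCols = Seq("key", "value") // assumption: not updated by ADD COLUMNS
    val stored = Seq[Any](1, "1", "test", 1.11)  // row written after ADD COLUMNS

    // Stale partition schema: prints [1,1,null,null]
    println(readRow(stored, stalePartitionCols, tableCols).mkString("[", ",", "]"))
    // Partition schema in sync with the table: prints [1,1,test,1.11]
    println(readRow(stored, tableCols, tableCols).mkString("[", ",", "]"))
  }
}
```

If this assumption holds, the data files already contain the new values and only the schema used to read the partition is out of date, which would match the difference between the observed and expected results above.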