[ https://issues.apache.org/jira/browse/SPARK-19153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823103#comment-15823103 ]
Shuai Lin commented on SPARK-19153:
-----------------------------------

I find it's quite straightforward to remove the partitioned-by restriction for the CTAS statement {{create table t1 using hive partitioned by (c1,c2) as select ...}}.

But another problem comes up: the partition columns must be the rightmost columns of the schema. Otherwise the schema we store in the table properties of the metastore (under the property key "spark.sql.sources.schema") would be inconsistent with the schema we read back through the Hive client API.

The reason is that, when creating a Hive table in the metastore, the schema and the partition columns are disjoint sets (as required by the Hive client API). When we read the table back, we append the partition columns to the end of the schema to get the Catalyst schema, i.e.:

{code}
// HiveClientImpl.scala
val partCols = h.getPartCols.asScala.map(fromHiveColumn)
val schema = StructType(h.getCols.asScala.map(fromHiveColumn) ++ partCols)
{code}

This was not a problem before we had the unified "create table" syntax, because in the old Hive create-table syntax the normal columns and the partition columns have to be specified separately, e.g. {{create table t1 (id int, name string) partitioned by (dept string)}}.

Now that we can create a partitioned table using the Hive format, e.g. {{create table t1 (id int, name string, dept string) using hive partitioned by (name)}}, the partition columns may not be the last columns, so I think we need to reorder the schema so that the partition columns come last. This is consistent with data source tables, e.g.

{code}
scala> sql("create table t1 (id int, name string, dept string) using parquet partitioned by (name)")
scala> spark.table("t1").schema.fields.map(_.name)
res44: Array[String] = Array(id, dept, name)
{code}

[~cloud_fan] Does this sound good to you?
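To make the proposed reordering concrete, here is a minimal sketch in plain Scala, not the actual Spark change. {{Field}} is a hypothetical stand-in for a Catalyst {{StructField}}, and {{PartitionColumnOrdering.reorder}} is an illustrative name. The sketch assumes exact (case-sensitive) name matching and assumes the partition columns should end up in the order given in the partitioned-by clause.

```scala
// Hypothetical stand-in for a Catalyst StructField; only name and type matter here.
final case class Field(name: String, dataType: String)

object PartitionColumnOrdering {
  // Move the partition columns to the end of the schema.
  // Non-partition columns keep their original relative order; partition
  // columns follow in the order listed in partitionCols (an assumption).
  def reorder(schema: Seq[Field], partitionCols: Seq[String]): Seq[Field] = {
    val byName = schema.map(f => f.name -> f).toMap
    val missing = partitionCols.filterNot(byName.contains)
    require(missing.isEmpty,
      s"Partition columns not found in schema: ${missing.mkString(", ")}")

    val partSet = partitionCols.toSet
    val dataCols = schema.filterNot(f => partSet.contains(f.name))
    dataCols ++ partitionCols.map(byName)
  }

  def main(args: Array[String]): Unit = {
    // Same table shape as the example above: partitioned by (name).
    val schema = Seq(
      Field("id", "int"),
      Field("name", "string"),
      Field("dept", "string"))
    println(reorder(schema, Seq("name")).map(_.name).mkString(", "))
  }
}
```

With a schema normalized this way before it is stored, the column order kept under "spark.sql.sources.schema" would match the order produced by appending {{getPartCols}} to {{getCols}} on read.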
> DataFrameWriter.saveAsTable should work with hive format to create partitioned table
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-19153
>                 URL: https://issues.apache.org/jira/browse/SPARK-19153
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Wenchen Fan
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)