[ https://issues.apache.org/jira/browse/SPARK-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267576#comment-15267576 ]
Xin Wu edited comment on SPARK-14927 at 5/2/16 10:01 PM:
---------------------------------------------------------
Right now, when a datasource table is created with partitions, it is not a Hive-compatible table. So you may need to create the table like:
{code}
create table tmp.tmp1 (val string) partitioned by (year int) stored as parquet location '....'
{code}
Then insert into the table from a temp table derived from the DataFrame. Something I tried below:
{code}
scala> df.show
+----+---+
|year|val|
+----+---+
|2012|  a|
|2013|  b|
|2014|  c|
+----+---+

scala> val df1 = spark.sql("select * from t000 where year = 2012")
df1: org.apache.spark.sql.DataFrame = [year: int, val: string]

scala> df1.registerTempTable("df1")

scala> spark.sql("insert into tmp.ptest3 partition(year=2012) select * from df1")

scala> val df2 = spark.sql("select * from t000 where year = 2013")
df2: org.apache.spark.sql.DataFrame = [year: int, val: string]

scala> df2.registerTempTable("df2")

scala> spark.sql("insert into tmp.ptest3 partition(year=2013) select val from df2")
16/05/02 14:47:34 WARN log: Updating partition stats fast for: ptest3
16/05/02 14:47:34 WARN log: Updated size to 327
res54: org.apache.spark.sql.DataFrame = []

scala> spark.sql("show partitions tmp.ptest3").show
+---------+
|   result|
+---------+
|year=2012|
|year=2013|
+---------+
{code}
This is a bit hacky, though; there should be a better solution for your problem. Also, this is on Spark 2.0. Check whether 1.6 accepts it.

> DataFrame.saveAsTable creates RDD partitions but not Hive partitions
> ---------------------------------------------------------------------
>
>                 Key: SPARK-14927
>                 URL: https://issues.apache.org/jira/browse/SPARK-14927
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.2, 1.6.1
>        Environment: Mac OS X 10.11.4 local
>            Reporter: Sasha Ovsankin
>
> This is a followup to
> http://stackoverflow.com/questions/31341498/save-spark-dataframe-as-dynamic-partitioned-table-in-hive
> . I tried to use the suggestions in the answers but couldn't make them work in
> Spark 1.6.1.
> I am trying to create partitions programmatically from a `DataFrame`.
> Here is the relevant code (adapted from a Spark test):
> {code}
> hc.setConf("hive.metastore.warehouse.dir", "tmp/tests")
> // hc.setConf("hive.exec.dynamic.partition", "true")
> // hc.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
> hc.sql("create database if not exists tmp")
> hc.sql("drop table if exists tmp.partitiontest1")
> Seq(2012 -> "a").toDF("year", "val")
>   .write
>   .partitionBy("year")
>   .mode(SaveMode.Append)
>   .saveAsTable("tmp.partitiontest1")
> hc.sql("show partitions tmp.partitiontest1").show
> {code}
> Full file is here:
> https://gist.github.com/SashaOv/7c65f03a51c7e8f9c9e018cd42aa4c4a
> I get the error that the table is not partitioned:
> {code}
> ======================
> HIVE FAILURE OUTPUT
> ======================
> SET hive.support.sql11.reserved.keywords=false
> SET hive.metastore.warehouse.dir=tmp/tests
> OK
> OK
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask. Table tmp.partitiontest1 is not a
> partitioned table
> ======================
> {code}
> It looks like the root cause is that
> `org.apache.spark.sql.hive.HiveMetastoreCatalog.newSparkSQLSpecificMetastoreTable`
> always creates the table with empty partitions.
> Any help to move this forward is appreciated.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
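A follow-up note on the per-partition inserts in the workaround above: they can usually be collapsed into a single dynamic-partition insert. A minimal sketch, assuming Spark 2.0 with Hive support and the same `t000` and `tmp.ptest3` tables from the comment (not verified against this issue):
{code}
// Sketch, not a confirmed fix: assumes the source table t000 and the
// Hive-partitioned table tmp.ptest3 from the comment above already exist.
spark.sql("set hive.exec.dynamic.partition=true")
spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")

// With dynamic partitioning on, one statement covers every year at once;
// the partition column (year) must come last in the select list.
spark.sql("insert into tmp.ptest3 partition(year) select val, year from t000")

// If partition directories were instead written straight under the table
// location (e.g. by DataFrameWriter.partitionBy), the metastore can be
// told about them afterwards:
spark.sql("msck repair table tmp.ptest3")

spark.sql("show partitions tmp.ptest3").show()
{code}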