[ https://issues.apache.org/jira/browse/SPARK-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070862#comment-15070862 ]
Furcy Pin commented on SPARK-9600:
----------------------------------

You are right that I forgot to wipe the metastore and warehouse between the two runs. However, doing so gets me the same error. I updated the code to make sure that each run starts with a fresh warehouse and metastore:

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

import scala.reflect.io.Path

object Spark9600 {

  case class Id(id: Int)

  def main(args: Array[String]): Unit = {
    // Start each run from a clean embedded metastore and warehouse
    Path("metastore_db").deleteRecursively()
    Path("warehouse").deleteRecursively()

    val conf: SparkConf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("name")
      .setSparkHome("sparkHome")
    val sc = new SparkContext(conf)
    val hc = new HiveContext(sc)
    hc.setConf("hive.metastore.warehouse.dir", s"file://${Path("warehouse").toAbsolute}")

    hc.sql("CREATE TABLE t1 (id INT)")

    val df = hc.createDataFrame(sc.parallelize(Seq(Id(1), Id(2), Id(3))))
    df.write.insertInto("t1")
    df.write.saveAsTable("t2")  // fails: still targets file:/user/hive/warehouse
  }
}
{code}

And I get exactly the same error:

{code}
Mkdirs failed to create file:/user/hive/warehouse/t2/_temporary/0/_temporary/attempt_201512241135_0001_m_000000_0
{code}

How do you explain that {{hc.sql("CREATE TABLE t1 (id INT)")}} works while {{df.write.saveAsTable("t2")}} doesn't? (A workaround sketch follows the quoted issue description below.)

> DataFrameWriter.saveAsTable always writes data to "/user/hive/warehouse"
> ------------------------------------------------------------------------
>
>                 Key: SPARK-9600
>                 URL: https://issues.apache.org/jira/browse/SPARK-9600
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1, 1.5.0
>            Reporter: Cheng Lian
>            Assignee: Sudhakar Thota
>            Priority: Blocker
>         Attachments: SPARK-9600-fl1.txt
>
>
> Get a clean Spark 1.4.1 build:
> {noformat}
> $ git checkout v1.4.1
> $ ./build/sbt -Phive -Phive-thriftserver -Phadoop-1 -Dhadoop.version=1.2.1 clean assembly/assembly
> {noformat}
> Stop any running local Hadoop instance and unset all Hadoop environment variables, so that we force Spark to run with the local file system only:
> {noformat}
> $ unset HADOOP_CONF_DIR
> $ unset HADOOP_PREFIX
> $ unset HADOOP_LIBEXEC_DIR
> $ unset HADOOP_CLASSPATH
> {noformat}
> This way we also ensure that the default Hive warehouse location points to the local file system, {{file:///user/hive/warehouse}}. Now we create warehouse directories for testing:
> {noformat}
> $ sudo rm -rf /user  # !! WARNING: IT'S /user RATHER THAN /usr !!
> $ sudo mkdir -p /user/hive/{warehouse,warehouse_hive13}
> $ sudo chown -R lian:staff /user
> $ tree /user
> /user
> └── hive
>     ├── warehouse
>     └── warehouse_hive13
> {noformat}
> Create a minimal {{hive-site.xml}} that only overrides the warehouse location, and put it under {{$SPARK_HOME/conf}}:
> {noformat}
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <configuration>
>   <property>
>     <name>hive.metastore.warehouse.dir</name>
>     <value>file:///user/hive/warehouse_hive13</value>
>   </property>
> </configuration>
> {noformat}
> Now run our test snippets with {{pyspark}}:
> {noformat}
> $ ./bin/pyspark
> In [1]: sqlContext.range(10).coalesce(1).write.saveAsTable("ds")
> {noformat}
> Check the warehouse directories:
> {noformat}
> $ tree /user
> /user
> └── hive
>     ├── warehouse
>     │   └── ds
>     │       ├── _SUCCESS
>     │       ├── _common_metadata
>     │       ├── _metadata
>     │       └── part-r-00000-46e4b32a-5c4d-4dba-b8d6-8d30ae910dc9.gz.parquet
>     └── warehouse_hive13
>         └── ds
> {noformat}
> Here you may notice the weird part: we have {{ds}} under both {{warehouse}} and {{warehouse_hive13}}, but the data is only written into the former.
> Now let's try HiveQL:
> {noformat}
> In [2]: sqlContext.range(10).coalesce(1).registerTempTable("t")
> In [3]: sqlContext.sql("CREATE TABLE ds_ctas AS SELECT * FROM t")
> {noformat}
> Check the directories again:
> {noformat}
> $ tree /user
> /user
> └── hive
>     ├── warehouse
>     │   └── ds
>     │       ├── _SUCCESS
>     │       ├── _common_metadata
>     │       ├── _metadata
>     │       └── part-r-00000-46e4b32a-5c4d-4dba-b8d6-8d30ae910dc9.gz.parquet
>     └── warehouse_hive13
>         ├── ds
>         └── ds_ctas
>             ├── _SUCCESS
>             └── part-00000
> {noformat}
> So HiveQL works fine. (Hive never writes Parquet summary files, so {{_common_metadata}} and {{_metadata}} are missing in {{ds_ctas}}.)
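
In the meantime, a possible workaround (just a sketch, not verified against 1.4.1 or 1.5.0) is to write the data to an explicit location and point an external table at it, so that the default warehouse path is never consulted. This continues from the {{df}}, {{hc}} and {{Path}} of the repro above; the table and path names are only illustrative:

{code}
// Hedged workaround sketch: write the files to an explicit local path, then
// register an external table over that path, bypassing the warehouse default.
val t2Path = s"file://${Path("warehouse").toAbsolute}/t2"
df.write.parquet(t2Path)  // the files land exactly where we ask
hc.sql(s"CREATE EXTERNAL TABLE t2 (id INT) STORED AS PARQUET LOCATION '$t2Path'")
{code}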
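
Also, as a quick sanity check (again only a sketch, assuming the {{hc}} from the repro above), it may be worth printing the warehouse directory the context reports before writing, to confirm whether the {{hive-site.xml}} / {{setConf}} override is visible to Spark SQL at all:

{code}
// Sanity-check sketch: print the warehouse directory currently held in the
// SQL conf; a mismatch with where saveAsTable actually writes would point at
// the write path rather than the configuration itself.
println(hc.getConf("hive.metastore.warehouse.dir"))
{code}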