Hi, I have a Hive INSERT INTO query that creates new Hive partitions. The table has two partition columns, server and date. I execute the insert queries using the following code and then try to save the result:
DataFrame dframe = hiveContext.sql("insert into summary1 partition(server='a1',date='2015-05-22') select from sourcetbl bla bla");
// The above query creates ORC output at /user/db/a1/20-05-22.
// I want only one part-00000 file at the end of the above query, so I tried the following and none of it worked:
dframe.coalesce(1).write().format("orc").mode(SaveMode.Overwrite).saveAsTable("summary1");
dframe.repartition(1).write().format("orc").mode(SaveMode.Overwrite).saveAsTable("summary1");
dframe.coalesce(1).write().format("orc").mode(SaveMode.Overwrite).save("/user/db/a1/20-05-22");
dframe.repartition(1).write().format("orc").mode(SaveMode.Overwrite).save("/user/db/a1/20-05-22");

No matter whether I use coalesce or repartition, the above query creates around 200 files at /user/db/a1/20-05-22. I expected that calling coalesce(1) would leave a single part file at the end. Am I wrong? Please guide.

Thanks in advance.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/dataFrame-colaesce-1-or-dataFrame-reapartition-1-does-not-seem-work-for-me-tp23769.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.