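Try coalesce(1, true). That two-argument form is the RDD API (the second argument turns on a shuffle); on a DataFrame the equivalent is repartition(1). With a shuffle, the upstream stages keep their parallelism and only the final write runs as a single task.

On Tue, Jan 5, 2016 at 11:58 AM, unk1102 <umesh.ka...@gmail.com> wrote: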
> Hi, I am trying to save many partitions of a DataFrame into one CSV file, and
> it takes forever for large data sets of around 5-6 GB.
>
> sourceFrame.coalesce(1).write().format("com.databricks.spark.csv").option("gzip").save("/path/hadoop")
>
> For small data the above code works well, but for large data it hangs
> forever and does not move on, because a single partition has to shuffle
> gigabytes of data. Please help me.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/coalesce-1-saveAsTextfile-takes-forever-tp25886.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
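
If a single-partition write is still too slow, another option (not what the reply above suggests, just a rough sketch) is to let Spark write its usual part-* files in parallel and join them afterwards. On HDFS, `hadoop fs -getmerge` does this kind of merge. The Python below shows the same idea on a local directory. The helper name `merge_parts`, the demo directory, and the file names are all made up for illustration, and it assumes the part files carry no CSV header row. Gzip parts can be joined byte for byte because a gzip file may contain several members.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def merge_parts(parts_dir, merged_path):
    """Concatenate part-* files, in name order, into one output file.

    Hypothetical helper: assumes no header row in the parts, so plain
    byte concatenation yields a valid CSV (or multi-member gzip) file.
    Returns the number of part files that were merged.
    """
    parts = sorted(Path(parts_dir).glob("part-*"))
    with open(merged_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream each part so no file is loaded fully into memory.
                shutil.copyfileobj(src, out)
    return len(parts)


# Small local demonstration with two fake gzip part files.
demo_dir = Path(tempfile.mkdtemp())
for i, rows in enumerate(["1,a\n2,b\n", "3,c\n"]):
    with gzip.open(demo_dir / f"part-0000{i}.gz", "wt") as f:
        f.write(rows)

merged_file = demo_dir / "merged.csv.gz"
merged_count = merge_parts(demo_dir, merged_file)

# Read the merged file back; gzip.open reads every member in sequence.
with gzip.open(merged_file, "rt") as f:
    merged_text = f.read()
```

The trade-off is an extra copy step after the job, in exchange for keeping the write itself parallel. If the parts do have header rows, they would need to be stripped from all but the first part before joining.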