Another option would be to try rdd.toLocalIterator(), though I'm not sure it
will help.
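Here is a minimal sketch of the driver-side half of that idea. It assumes the rows have already been formatted as CSV lines. In the Java API, JavaRDD.toLocalIterator() returns a java.util.Iterator that brings one partition at a time to the driver, so the driver needs memory for roughly the largest partition rather than the whole data set. The helper writeGzipLines is hypothetical. A plain in-memory iterator stands in for the Spark one so the sketch runs without a cluster.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Iterator;
import java.util.zip.GZIPOutputStream;

class LocalIteratorWriter {
    // Hypothetical helper: streams lines from any Iterator<String> into a single
    // gzip-compressed file, one line at a time, and returns the number of lines.
    // In Spark the iterator could come from rdd.toLocalIterator().
    static long writeGzipLines(Iterator<String> lines, Path out) throws IOException {
        long count = 0;
        try (Writer w = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(out)), StandardCharsets.UTF_8))) {
            while (lines.hasNext()) {
                w.write(lines.next());
                w.write('\n');
                count++;
            }
        } // closing the writer also finishes the gzip stream
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for rdd.toLocalIterator(): a plain in-memory iterator.
        Iterator<String> rows = Arrays.asList("1,alice", "2,bob").iterator();
        Path out = Files.createTempFile("merged", ".csv.gz");
        System.out.println(writeGzipLines(rows, out) + " lines written to " + out);
    }
}
```

The trade-off is that the whole write runs on the driver, so it is bounded by one machine's disk and network. In return, no single executor task has to hold or process all of the data.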

I had the same problem and ended up copying all the part files to local disk
(with the Hadoop FileSystem API) and then processing them locally.
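Here is a minimal sketch of the "process them locally" step. It assumes the part files have already been copied into one local directory, for example with Hadoop's FileSystem.copyToLocalFile, and that they are plain-text CSV without a header line repeated in each part. The helper mergeParts is hypothetical. It concatenates the part-* files in name order into a single output file and streams the bytes rather than loading whole files into memory.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PartFileMerger {
    // Hypothetical helper: concatenates every "part-*" file in partsDir, in
    // lexicographic name order (part-00000, part-00001, ...), into out.
    // Other files such as the _SUCCESS marker are skipped. Returns the part count.
    static int mergeParts(Path partsDir, Path out) throws IOException {
        List<Path> parts;
        try (Stream<Path> s = Files.list(partsDir)) {
            parts = s.filter(p -> p.getFileName().toString().startsWith("part-"))
                     .sorted()
                     .collect(Collectors.toList());
        }
        try (OutputStream os = new BufferedOutputStream(Files.newOutputStream(out))) {
            for (Path p : parts) {
                Files.copy(p, os); // streams the part's bytes into the output
            }
        }
        return parts.size();
    }

    public static void main(String[] args) throws IOException {
        // Demo: fake a Spark output directory with two parts and a _SUCCESS marker.
        Path dir = Files.createTempDirectory("spark-out");
        Files.write(dir.resolve("part-00000"), "1,alice\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("part-00001"), "2,bob\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("_SUCCESS"), new byte[0]);
        Path merged = Files.createTempFile("merged", ".csv");
        System.out.println(mergeParts(dir, merged) + " parts merged into " + merged);
    }
}
```

Sorting by name keeps the zero-padded part files in partition order. If the parts are gzip-compressed, plain byte concatenation still produces a valid multi-member gzip file, although this sketch does not check for that case.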


On 5 January 2016 at 22:08, Alexander Pivovarov <apivova...@gmail.com>
wrote:

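> Try coalesce(1, true), i.e. coalesce with shuffle = true. The extra shuffle
> lets the upstream partitions still be computed in parallel, so only the
> final write runs in a single task. The two-argument form exists on the RDD
> API; on a DataFrame the equivalent is repartition(1).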
>
> On Tue, Jan 5, 2016 at 11:58 AM, unk1102 <umesh.ka...@gmail.com> wrote:
>
>> Hi, I am trying to save the many partitions of a DataFrame into one CSV
>> file, and it takes forever for large data sets of around 5-6 GB.
>>
>>
>> sourceFrame.coalesce(1).write().format("com.databricks.spark.csv")
>>     .option("codec", "gzip").save("/path/hadoop"); // option() takes key, value
>>
>> For small data the code above works well, but for large data it hangs
>> forever and never moves on, because a single partition has to handle GBs
>> of data. Please help me.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/coalesce-1-saveAsTextfile-takes-forever-tp25886.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
