Another option would be to try
rdd.toLocalIterator()
though I am not sure it will help.
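A sketch of that approach, using a hypothetical writeCsv helper (the name and the naive comma-joined formatting are illustrative, with no quoting or escaping):

```scala
import java.io.PrintWriter

// Write an iterator of rows to one local CSV file. With Spark this would be
// called as writeCsv(sourceFrame.rdd.toLocalIterator.map(_.toSeq), path);
// toLocalIterator pulls one partition at a time, so the driver never holds
// the whole DataFrame in memory at once.
def writeCsv(rows: Iterator[Seq[Any]], path: String): Unit = {
  val out = new PrintWriter(path)
  try rows.foreach(r => out.println(r.mkString(",")))  // naive CSV: no quoting
  finally out.close()
}
```

The trade-off is that every row still flows through the driver, so the write is sequential; it avoids the single-task shuffle of coalesce(1) but is not faster for very large outputs.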
I had the same problem and ended up moving all the part files to local disk (with the Hadoop FileSystem API) and then processing them locally.
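The local half of that workflow could look like the sketch below, assuming the part-* files have already been copied down (e.g. with Hadoop's FileSystem.copyToLocalFile); mergeParts is a hypothetical helper name:

```scala
import java.io.File
import java.nio.file.{Files, Paths, StandardOpenOption}

// Concatenate already-downloaded part-* files into one CSV,
// in part-number order (part-00000, part-00001, ...).
def mergeParts(dir: String, dest: String): Unit = {
  val parts = new File(dir).listFiles
    .filter(f => f.isFile && f.getName.startsWith("part-"))
    .sortBy(_.getName)
  val out = Files.newOutputStream(Paths.get(dest),
    StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)
  try parts.foreach(p => Files.copy(p.toPath, out))
  finally out.close()
}
```

Sorting by file name preserves the original row order, since Spark numbers part files in partition order.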
On 5 January 2016 at 22:08, Alexander Pivovarov
wrote:
> try coalesce(1, true).
On Tue, Jan 5, 2016 at 11:58 AM, unk1102 wrote:
> hi I am trying to save many partitions of a DataFrame into one CSV file,
> and it takes forever for large data sets of around 5-6 GB.
Hi, DataFrame's coalesce has no boolean option; that overload exists only on RDD, I believe:
sourceFrame.coalesce(1, true) // gives a compilation error
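For reference, a sketch of the overloads involved (assuming Spark 1.x; the shuffle flag lives on the RDD API, and repartition(1) is the DataFrame-side equivalent of a shuffled coalesce):

```scala
// DataFrame.coalesce(n) takes only the target partition count:
val one = sourceFrame.coalesce(1)        // compiles; no shuffle, single write task
// sourceFrame.coalesce(1, true)         // does not compile on a DataFrame

// The (numPartitions, shuffle) overload is on RDD:
val viaRdd = sourceFrame.rdd.coalesce(1, shuffle = true)

// DataFrame equivalent with a shuffle, keeping upstream parallelism:
val shuffled = sourceFrame.repartition(1)
```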
On Wed, Jan 6, 2016 at 1:38 AM, Alexander Pivovarov
wrote:
> try coalesce(1, true).
>
> On Tue, Jan 5, 2016 at 11:58 AM, unk1102
Hi Unk1102,
I also had trouble when I used coalesce(); repartition() worked much better for me.
Keep in mind that if you have a large number of partitions you are probably going
to have high communication costs.
Also, my code works a lot better on 1.6.0. DataFrame memory could not be
spilled in 1.5.2, but it can in 1.6.0.