Another option would be to try
rdd.toLocalIterator()
though I am not sure it will help.
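A sketch of that approach, using a hypothetical writeCsv helper (the name and the naive comma-joined formatting are illustrative, with no quoting or escaping):

```scala
import java.io.PrintWriter

// Write an iterator of rows to one local CSV file. With Spark this would be
// called as writeCsv(sourceFrame.rdd.toLocalIterator.map(_.toSeq), path);
// toLocalIterator pulls one partition at a time, so the driver never holds
// the whole DataFrame in memory at once.
def writeCsv(rows: Iterator[Seq[Any]], path: String): Unit = {
  val out = new PrintWriter(path)
  try rows.foreach(r => out.println(r.mkString(",")))  // naive CSV: no quoting
  finally out.close()
}
```

The trade-off is that every row still flows through the driver, so the write is sequential; it avoids the single-task shuffle of coalesce(1) but is not faster for very large outputs.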
I had the same problem and ended up moving all the part files to local disk (with the Hadoop FileSystem API) and then processing them locally.
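The local half of that workflow could look like the sketch below, assuming the part-* files have already been copied down (e.g. with Hadoop's FileSystem.copyToLocalFile); mergeParts is a hypothetical helper name:

```scala
import java.io.File
import java.nio.file.{Files, Paths, StandardOpenOption}

// Concatenate already-downloaded part-* files into one CSV,
// in part-number order (part-00000, part-00001, ...).
def mergeParts(dir: String, dest: String): Unit = {
  val parts = new File(dir).listFiles
    .filter(f => f.isFile && f.getName.startsWith("part-"))
    .sortBy(_.getName)
  val out = Files.newOutputStream(Paths.get(dest),
    StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)
  try parts.foreach(p => Files.copy(p.toPath, out))
  finally out.close()
}
```

Sorting by file name preserves the original row order, since Spark numbers part files in partition order.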
On 5 January 2016 at 22:08, Alexander Pivovarov
wrote:
> try coalesce(1, true).
On Tue, Jan 5, 2016 at 11:58 AM, unk1102 wrote:
> hi I am trying to save many partitions of a DataFrame into one CSV file,
> and it takes forever for large data sets of around 5-6 GB.
Hi, DataFrame's coalesce has no boolean option; that overload exists only on RDD, I believe:
sourceFrame.coalesce(1, true) // gives a compilation error
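For reference, a sketch of the overloads involved (assuming Spark 1.x; the shuffle flag lives on the RDD API, and repartition(1) is the DataFrame-side equivalent of a shuffled coalesce):

```scala
// DataFrame.coalesce(n) takes only the target partition count:
val one = sourceFrame.coalesce(1)        // compiles; no shuffle, single write task
// sourceFrame.coalesce(1, true)         // does not compile on a DataFrame

// The (numPartitions, shuffle) overload is on RDD:
val viaRdd = sourceFrame.rdd.coalesce(1, shuffle = true)

// DataFrame equivalent with a shuffle, keeping upstream parallelism:
val shuffled = sourceFrame.repartition(1)
```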
On Wed, Jan 6, 2016 at 1:38 AM, Alexander Pivovarov
wrote:
> try coalesce(1, true).
>
> On Tue, Jan 5, 2016 at 11:58 AM, unk1102
Hi Unk1102,
I also had trouble when I used coalesce(); repartition() worked much better for me.
Keep in mind that if you have a large number of partitions you are probably going
to have high communication costs.
Also, my code works a lot better on 1.6.0. DataFrame memory could not be
spilled in 1.5.2, but it can in 1.6.0.