Re: coalesce serialising earlier work

2016-08-09 Thread ayan guha
How about running a count step to force Spark to materialise the data frame, and then repartitioning to 1?

On 9 Aug 2016 17:11, "Adrian Bridgett" wrote:
> In short: df.coalesce(1).write seems to make all the earlier calculations
> about the dataframe go through a single task
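
A minimal sketch of the "repartition to 1" part of this suggestion, in PySpark. The helper name `write_single_file`, the Parquet output format, and the overwrite mode are assumptions for illustration only; the thread does not say which writer is used. `coalesce(1)` is a narrow dependency, so the upstream work is pulled into the single remaining partition, whereas `repartition(1)` adds a shuffle, which lets the upstream stage keep its original parallelism. Note that a `count()` on its own does not keep the result around unless the data frame is also cached.

```python
def write_single_file(df, path):
    """Write `df` to `path` as a single partition.

    Assumes `df` is a pyspark.sql.DataFrame (hypothetical helper, not from
    the thread). Parquet and mode("overwrite") are illustrative choices.
    """
    # repartition(1) inserts a shuffle boundary: the stages that compute
    # `df` still run with their original number of tasks, and only the
    # final write stage runs as one task.
    # df.coalesce(1) would instead avoid the shuffle and fold the upstream
    # computation into that single task.
    df.repartition(1).write.mode("overwrite").parquet(path)
```

The cost is one extra shuffle of the final data frame, which is usually small compared with running the whole computation in one task.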

coalesce serialising earlier work

2016-08-09 Thread Adrian Bridgett
In short: df.coalesce(1).write seems to make all the earlier calculations on the dataframe go through a single task (rather than running on multiple tasks and then sending only the final dataframe through a single worker). Any idea how we can force the job to run in parallel?

In more detail: We