How about running a count step to force Spark to materialise the DataFrame,
and then repartitioning to 1?
On 9 Aug 2016 17:11, "Adrian Bridgett" wrote:
> In short: df.coalesce(1).write seems to make all the earlier calculations
> about the dataframe go through a single task
In short: df.coalesce(1).write seems to make all the earlier
calculations about the dataframe run through a single task (rather than
running on multiple tasks, with only the final dataframe then written
through a single worker). Any idea how we can force the job to run in
parallel?
In more detail:
We