You could try increasing the level of parallelism
(spark.default.parallelism) when creating the SparkContext.
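For example, a minimal sketch of setting it when building the context (the
value 200 is just an illustrative figure, tune it to your cluster and data
size):

    from pyspark import SparkConf, SparkContext

    # spark.default.parallelism sets the default number of partitions
    # used by shuffles and by operations such as cartesian
    conf = SparkConf().set("spark.default.parallelism", "200")
    sc = SparkContext(conf=conf)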
Thanks
Best Regards
On Fri, Dec 5, 2014 at 6:37 PM, Antony Mayi antonym...@yahoo.com.invalid
wrote:
Hi,
using pyspark 1.1.0 on YARN 2.5.0. All operations run nicely in parallel -
I can see multiple python processes spawned on each nodemanager, but for
some reason, when running cartesian, there is only a single python process
running on each node. The task indicates thousands of partitions.