You could try increasing the level of parallelism
(spark.default.parallelism) when creating the SparkContext.
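Something along these lines, as a minimal sketch (the value 400 is only an
illustrative placeholder, not a recommendation; tune it to your cluster,
e.g. a few times the total number of executor cores):

from pyspark import SparkConf, SparkContext

# set the default parallelism before the SparkContext is created
conf = (SparkConf()
        .setAppName("cartesian-test")
        .set("spark.default.parallelism", "400"))  # hypothetical value
sc = SparkContext(conf=conf)

a = sc.parallelize(range(1000))
b = sc.parallelize(range(1000))

# the cartesian product should now be spread over more tasks
pairs = a.cartesian(b)
print(pairs.count())

sc.stop()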
Thanks
Best Regards
On Fri, Dec 5, 2014 at 6:37 PM, Antony Mayi antonym...@yahoo.com.invalid
wrote:
Hi,
using pyspark 1.1.0 on YARN 2.5.0. All operations run nicely in parallel -
I can see multiple python processes spawned on each NodeManager, but for
some reason when running cartesian there is only a single python process
running on each node. The task indicates thousands of partitions, so I
don't understand why it is not running with higher parallelism. The
performance is obviously poor, although other operations perform well.
Any idea how to improve this?
thank you,
Antony.