Re: cartesian on pyspark not parallelised

2014-12-06 Thread Akhil Das
You could try increasing the level of parallelism (spark.default.parallelism) while creating the SparkContext.

Thanks
Best Regards

On Fri, Dec 5, 2014 at 6:37 PM, Antony Mayi antonym...@yahoo.com.invalid wrote:
> Hi, using pyspark 1.1.0 on YARN 2.5.0. All operations run nicely in parallel - I
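Akhil's suggestion can be sketched as follows. This is a minimal configuration sketch, not taken from the thread: the app name and the parallelism value are placeholders you would tune for your own cluster.

```python
from pyspark import SparkConf, SparkContext

# Raise the default number of partitions Spark uses for shuffles and
# parallelized collections (the value 256 is purely illustrative).
conf = (SparkConf()
        .setAppName("cartesian-test")                 # placeholder name
        .set("spark.default.parallelism", "256"))     # tune per cluster
sc = SparkContext(conf=conf)
```

This can also be set without code changes via `--conf spark.default.parallelism=256` on spark-submit, or in spark-defaults.conf.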

cartesian on pyspark not parallelised

2014-12-05 Thread Antony Mayi
Hi, using pyspark 1.1.0 on YARN 2.5.0. All operations run nicely in parallel - I can see multiple python processes spawned on each nodemanager - but for some reason, when running cartesian, there is only a single python process running on each node. The task is indicating thousands of partitions
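For context on the "thousands of partitions": cartesian pairs every partition of one RDD with every partition of the other, so crossing an RDD with m partitions against one with n partitions yields m*n output partitions, which is why the task count explodes. A plain-Python sketch of that partition pairing (no Spark needed, names are illustrative):

```python
from itertools import product

def cartesian_partitions(parts_a, parts_b):
    """Mimic RDD.cartesian at the partition level: one output partition
    for every (partition of A, partition of B) pair."""
    return [
        [(x, y) for x in pa for y in pb]
        for pa, pb in product(parts_a, parts_b)
    ]

a = [[1, 2], [3, 4], [5]]   # an RDD split into 3 partitions
b = [["x"], ["y", "z"]]     # an RDD split into 2 partitions

out = cartesian_partitions(a, b)
print(len(out))                       # 3 * 2 = 6 output partitions
print(sum(len(p) for p in out))       # 5 * 3 = 15 pairs in total
```

Note that many tiny output partitions do not by themselves guarantee many concurrent python workers; concurrency is still bounded by the executor cores YARN grants, which may be worth checking alongside spark.default.parallelism.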