Re: cartesian on pyspark not parallelised

2014-12-06 Thread Akhil Das
You could try increasing the level of parallelism
(spark.default.parallelism) when creating the SparkContext.
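
For example, a minimal sketch of setting that property when constructing
the context in PySpark (the property name is real; the app name and the
value 200 are only illustrative and should be tuned to your cluster):

    # Set a higher default partition count before creating the SparkContext.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("cartesian-example")           # illustrative name
            .set("spark.default.parallelism", "200"))  # illustrative value
    sc = SparkContext(conf=conf)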

Thanks
Best Regards

On Fri, Dec 5, 2014 at 6:37 PM, Antony Mayi antonym...@yahoo.com.invalid
wrote:

 Hi,

 using pyspark 1.1.0 on YARN 2.5.0. All operations run nicely in parallel -
 I can see multiple Python processes spawned on each nodemanager, but for
 some reason, when running cartesian, there is only a single Python process
 running on each node. The task indicates thousands of partitions, so I
 don't understand why it is not running with higher parallelism. The
 performance is obviously poor, although other operations rock.

 any idea how to improve this?

 thank you,
 Antony.





cartesian on pyspark not parallelised

2014-12-05 Thread Antony Mayi
Hi,

using pyspark 1.1.0 on YARN 2.5.0. All operations run nicely in parallel - I
can see multiple Python processes spawned on each nodemanager, but for some
reason, when running cartesian, there is only a single Python process running
on each node. The task indicates thousands of partitions, so I don't
understand why it is not running with higher parallelism. The performance is
obviously poor, although other operations rock.
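
For context, a minimal sketch of the kind of job described (the data and
partition counts are illustrative, not from this post): cartesian reports
roughly the product of the two inputs' partition counts.

    # Illustrative only: a cartesian product whose result reports many
    # partitions, yet may still execute with low per-node parallelism.
    a = sc.parallelize(range(1000), 50)
    b = sc.parallelize(range(1000), 50)
    pairs = a.cartesian(b)
    print(pairs.getNumPartitions())  # 50 * 50 = 2500 partitions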

any idea how to improve this?

thank you,
Antony.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org