Didn’t seem to help:

from pyspark import SparkConf, SparkContext

conf = (SparkConf().set("spark.shuffle.spill", "false")
                   .set("spark.default.parallelism", "12"))
sc = SparkContext(appName='app_name', conf=conf)

but it's still taking just as much time.
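
For what it's worth, here is a minimal sanity check (assuming the conf is built as above; the "12" is just carried over from the snippet) to confirm the settings are actually reaching the context and not being overridden elsewhere, e.g. by spark-submit flags or spark-defaults.conf:

from pyspark import SparkConf, SparkContext

conf = (SparkConf().set("spark.shuffle.spill", "false")
                   .set("spark.default.parallelism", "12"))
sc = SparkContext(appName='app_name', conf=conf)

# If these don't print "false" and 12, the settings are being overridden elsewhere.
print(conf.get("spark.shuffle.spill"))
print(sc.defaultParallelism)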

On 22.10.2014, at 14:17, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Total guess without knowing anything about your code: Do either of these two 
> notes from the 1.1.0 release notes affect things at all?
> 
> * PySpark now performs external spilling during aggregations. Old behavior
>   can be restored by setting spark.shuffle.spill to false.
> * PySpark uses a new heuristic for determining the parallelism of shuffle
>   operations. Old behavior can be restored by setting spark.default.parallelism
>   to the number of cores in the cluster.
> Nick
> 
