Didn't seem to help:

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().set("spark.shuffle.spill", "false").set("spark.default.parallelism", "12")
    sc = SparkContext(appName="app_name", conf=conf)

but it's still taking just as much time.
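One thing that might be worth ruling out is whether the settings are reaching the context at all. A quick sanity check (just a sketch; sc.defaultParallelism should report 12 if spark.default.parallelism was picked up, and a shuffle without an explicit numPartitions should then produce that many partitions):

    # Should print 12 if spark.default.parallelism took effect.
    print(sc.defaultParallelism)

    # A shuffle without an explicit numPartitions argument should
    # also end up with 12 partitions.
    pairs = sc.parallelize(range(100)).map(lambda x: (x % 10, x))
    print(pairs.groupByKey().getNumPartitions())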
On 22.10.2014, at 14:17, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Total guess without knowing anything about your code: do either of these two
> notes from the 1.1.0 release notes affect things at all?
>
> - PySpark now performs external spilling during aggregations. Old behavior
>   can be restored by setting spark.shuffle.spill to false.
> - PySpark uses a new heuristic for determining the parallelism of shuffle
>   operations. Old behavior can be restored by setting
>   spark.default.parallelism to the number of cores in the cluster.
>
> Nick