Hi. Looking at just the two pics, I see that a single sub-task takes up all the time: the flatMapToPair at Coef... line 52. I also see that the input is made up of only two partitions, so at most two tasks can run at once and probably only two workers are active.
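To put rough numbers on why two partitions would give the same run time on 1 node and on 5 nodes, here is a small back-of-the-envelope model in plain Python (not Spark code). The function name `stage_time`, the 100 units of work and the 4 cores per node are all made up for illustration, and the model assumes the work is spread evenly over the partitions and ignores skew, scheduling overhead and the shuffle that a repartition itself costs.

```python
import math

def stage_time(total_work, partitions, slots):
    """Idealized wall-clock time of one stage (hypothetical model).

    Assumes the work is split evenly across partitions, each task
    occupies one slot (core), and tasks run in waves of at most
    `slots` tasks at a time.
    """
    per_task = total_work / partitions
    waves = math.ceil(partitions / slots)
    return waves * per_task

# Made-up numbers: 100 units of work, 4 cores per node.
one_node = stage_time(100, partitions=2, slots=4)        # 1 node
five_nodes = stage_time(100, partitions=2, slots=20)     # 5 nodes
repartitioned = stage_time(100, partitions=10, slots=20) # 5 nodes, 10 partitions

print(one_node, five_nodes, repartitioned)
```

With only two partitions the extra cores have nothing to do, so in this model the stage takes the same time on 1 node and on 5; with more partitions the per-task share of the work shrinks and the idle cores get used.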
Try repartitioning the data into more partitions before line 52, for example by calling "rddname".repartition(10), and see if it runs faster.
Regards, Gylfi.