Are you running on Mesos, YARN, or standalone? If you're on Mesos, are you using coarse-grained or fine-grained mode?
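If it helps to answer that, here is a minimal sketch of how the deployment type can be read off the master URL and the `spark.mesos.coarse` setting. The helper name `describe_deployment` and its `coarse_default` parameter are hypothetical, made up for illustration; the default of `spark.mesos.coarse` depends on the Spark version, so it is passed in rather than assumed.

```python
def describe_deployment(master, conf, coarse_default="false"):
    """Classify a Spark master URL (hypothetical helper, not a Spark API).

    master: the master URL string, e.g. the value of sc.master
    conf:   dict of Spark settings, e.g. copied from the UI's Environment tab
    coarse_default: assumed default for spark.mesos.coarse when it is unset
    """
    if master.startswith("mesos://"):
        # On Mesos, spark.mesos.coarse selects coarse- vs fine-grained mode.
        coarse = conf.get("spark.mesos.coarse", coarse_default).lower() == "true"
        return "mesos (coarse-grained)" if coarse else "mesos (fine-grained)"
    if master.startswith("yarn"):
        return "yarn"
    if master.startswith("spark://"):
        return "standalone"
    if master.startswith("local"):
        return "local"
    return "unknown"


if __name__ == "__main__":
    # Example values only; substitute your own master URL and settings.
    print(describe_deployment("mesos://host:5050", {"spark.mesos.coarse": "true"}))
    print(describe_deployment("spark://host:7077", {}))
```

On a running application, `sc.master` gives the master URL, and the Environment tab of the Spark UI lists the settings that were explicitly set.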
On Thu, Aug 13, 2015 at 10:13 PM, Ara Vartanian <arav...@cs.wisc.edu> wrote:
> I’m observing an unusual situation where my step duration increases as I
> add further executors to my cluster. My algorithm is fully data
> parallelizable into a map phase, followed by a reduce step at the end that
> amounts to matrix addition. So I’ve kicked off a cluster of, say, 100
> executors with 4 cores per executor, and before running the algorithm I’ve
> repartitioned the RDD into 400 partitions. I can see in the Spark UI that
> each of the 400 (map) tasks takes about 2 seconds. However, the entire step
> is taking over a minute, and this is because the launch times of the tasks
> as reported in the Spark UI are staggered. For example, the first 100 might
> be launched in the same second, then another group 3 seconds later, and so
> forth (with the durations slowly expanding). With a task time of 2 seconds,
> this “launch lag” is dominating the computation time and only gets worse as
> I add nodes.
>
> Any insight on how I could go about diagnosing this would be greatly
> appreciated.
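
To put numbers on the "launch lag" described in the quoted message, here is a rough back-of-the-envelope sketch. It assumes each task occupies exactly one core and that all slots are available from the start; the function name `ideal_step_seconds` is hypothetical, and the 60-second figure is only a lower bound taken from "over a minute".

```python
import math


def ideal_step_seconds(num_tasks, executors, cores_per_executor, task_seconds):
    """Step time if every task launched the moment a core was free.

    Assumes one core per task and no scheduling or launch delay.
    """
    slots = executors * cores_per_executor
    waves = math.ceil(num_tasks / slots)  # rounds of tasks needed to cover all partitions
    return waves * task_seconds


if __name__ == "__main__":
    ideal = ideal_step_seconds(num_tasks=400, executors=100,
                               cores_per_executor=4, task_seconds=2.0)
    observed = 60.0  # lower bound from the report ("over a minute")
    print("ideal step time (s):", ideal)
    print("time not spent computing (s), at least:", observed - ideal)
```

With 400 tasks and 400 cores, everything fits in a single wave, so almost all of the observed step time is spent waiting for tasks to launch rather than running them.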