Almost all of your data is going to one task. You can see that the shuffle read for task 0 is 153.3 KB, while for most other tasks it's just 26.0 B (which is probably just a header saying there are no actual records). You need to ensure your data is more evenly distributed across partitions before this step.
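One common way to handle this kind of key skew is "salting": split the hot key into several synthetic sub-keys, aggregate once on the salted keys, then strip the salt and aggregate again. Below is a minimal, Spark-free sketch of why salting spreads the load; the key names, record counts, and the crc32-based partitioner are illustrative stand-ins, not taken from your job.

```python
import random
import zlib
from collections import Counter

def partition_of(key, num_partitions):
    # Stand-in for Spark's HashPartitioner (hash(key) % numPartitions);
    # crc32 keeps this sketch deterministic across runs.
    return zlib.crc32(repr(key).encode()) % num_partitions

# Skewed dataset: one hot key carries almost all of the records.
records = [("hot", 1)] * 10_000 + [(f"k{i}", 1) for i in range(100)]

NUM_PARTITIONS = 8
SALT_BUCKETS = 32
random.seed(0)

# Without salting, every "hot" record hashes to the same partition.
plain = Counter(partition_of(k, NUM_PARTITIONS) for k, _ in records)

# Round 1: append a random salt, so "hot" becomes 32 distinct keys
# that hash to many different partitions.
salted = [((k, random.randrange(SALT_BUCKETS)), v) for k, v in records]
spread = Counter(partition_of(k, NUM_PARTITIONS) for k, _ in salted)

# Round 2 (in Spark: reduceByKey on the salted key, then map the salt
# away and reduceByKey again) combines the partial sums per real key,
# trading one skewed shuffle for two well-balanced ones.

print("max records in one partition, unsalted:", max(plain.values()))
print("max records in one partition, salted:  ", max(spread.values()))
```

In an actual Spark job the same two-round pattern applies to `reduceByKey`/`aggregateByKey`; whether salting helps (versus simply increasing the shuffle parallelism or choosing a better key) depends on whether the skew comes from one dominant key, as it appears to here.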
On Thu, Feb 19, 2015 at 10:53 AM, jatinpreet <jatinpr...@gmail.com> wrote:
> Hi,
>
> I am running Spark 1.2.1 for compute-intensive jobs comprising multiple
> tasks. I have observed that most tasks complete very quickly, but there
> are always one or two tasks that take a lot of time to complete, thereby
> increasing the overall stage time. What could be the reason for this?
>
> Following are the statistics for one such stage. As you can see, the task
> with index 0 takes 1.1 minutes whereas the others completed much more
> quickly.
>
> Aggregated Metrics by Executor
> Executor ID  Address       Task Time  Total Tasks  Failed Tasks  Succeeded Tasks  Input  Output  Shuffle Read  Shuffle Write  Shuffle Spill (Memory)  Shuffle Spill (Disk)
> 0            slave1:56311  46 s       13           0             13               0.0 B  0.0 B   0.0 B         0.0 B          0.0 B                   0.0 B
> 1            slave2:42648  2.1 min    13           0             13               0.0 B  0.0 B   384.3 KB      0.0 B          0.0 B                   0.0 B
> 2            slave3:44322  23 s       12           0             12               0.0 B  0.0 B   136.4 KB      0.0 B          0.0 B                   0.0 B
> 3            slave4:37987  44 s       12           0             12               0.0 B  0.0 B   213.9 KB      0.0 B          0.0 B                   0.0 B
>
> Tasks
> Index  ID   Attempt  Status   Locality Level  Executor ID / Host  Launch Time          Duration  GC Time  Shuffle Read  Errors
> 0      213  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  1.1 min   1 s      153.3 KB
> 5      218  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  23 ms     -        26.0 B
> 1      214  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  2 s       0.9 s    13.8 KB
> 4      217  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  26 ms     -        26.0 B
> 3      216  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  11 ms     -        0.0 B
> 2      215  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  27 ms     -        26.0 B
> 7      220  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  11 ms     -        0.0 B
> 10     223  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  23 ms     -        26.0 B
> 6      219  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  23 ms     -        26.0 B
> 9      222  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  23 ms     -        26.0 B
> 8      221  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  23 ms     -        26.0 B
> 11     224  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms     -        0.0 B
> 14     227  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  24 ms     -        26.0 B
> 13     226  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  23 ms     -        26.0 B
> 16     229  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  22 ms     -        26.0 B
> 12     225  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  22 ms     -        26.0 B
> 15     228  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms     -        0.0 B
> 17     230  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  22 ms     -        26.0 B
> 23     236  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms     -        0.0 B
> 22     235  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  21 ms     -        26.0 B
> 19     232  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms     -        0.0 B
> 21     234  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  25 ms     -        26.0 B
> 18     231  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  24 ms     -        26.0 B
> 20     233  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  28 ms     -        26.0 B
> 25     238  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  20 ms     -        26.0 B
> 28     241  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  27 ms     -        26.0 B
> 27     240  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms     -        0.0 B
>
> Thanks
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Some-tasks-taking-too-much-time-to-complete-in-a-stage-tp21724.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.