Almost all of your data is going to one task.  You can see that the shuffle
read for task 0 is 153.3 KB, while for most other tasks it's just 26 B (which
is probably just a header saying there are no actual records).  You need
to ensure your data is more evenly distributed before this step.
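
To illustrate the kind of skew I mean, here is a minimal sketch in plain Python (not Spark code). Everything in it is an assumption made for illustration: the key names, the record counts, the partition count, and the use of `zlib.crc32` as a stand-in for a hash partitioner. It shows how a single frequent key sends nearly every record to one partition, and how appending a small "salt" to the key spreads those records out.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8   # hypothetical number of reduce-side partitions
NUM_SALTS = 16       # hypothetical number of salt values per key


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Deterministic stand-in for a hash partitioner (not Spark's own hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions=NUM_PARTITIONS):
    """Count how many records land in each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt(key, record_index, num_salts=NUM_SALTS):
    """Append a rotating suffix so one logical key maps to several physical keys."""
    return "%s#%d" % (key, record_index % num_salts)


# Made-up data set: one "hot" key dominates, plus a few rare keys.
keys = ["hot"] * 1000 + ["k%d" % i for i in range(20)]

skewed = partition_sizes(keys)
salted = partition_sizes([salt(k, i) for i, k in enumerate(keys)])

print("records per partition, original keys:", skewed)
print("records per partition, salted keys:  ", salted)
```

If you salt keys like this in a real job, any per-key aggregation has to be done in two passes: once on the salted keys, then again after stripping the salt to merge the partial results. Whether salting, a different key, or a different number of partitions is the right fix depends on what your stage actually computes, which I can't tell from the metrics alone.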

On Thu, Feb 19, 2015 at 10:53 AM, jatinpreet <jatinpr...@gmail.com> wrote:

> Hi,
>
> I am running Spark 1.2.1 for compute-intensive jobs comprising multiple
> tasks. I have observed that most tasks complete very quickly, but there are
> always one or two tasks that take a long time to complete, thereby
> increasing the overall stage time. What could be the reason for this?
>
> Following are the statistics for one such stage. As you can see, the task
> with index 0 takes 1.1 minutes whereas others completed much more quickly.
>
> Aggregated Metrics by Executor
>
> Executor ID  Address       Task Time  Total Tasks  Failed Tasks  Succeeded Tasks  Input  Output  Shuffle Read  Shuffle Write  Shuffle Spill (Memory)  Shuffle Spill (Disk)
> 0            slave1:56311  46 s       13           0             13               0.0 B  0.0 B   0.0 B         0.0 B          0.0 B                   0.0 B
> 1            slave2:42648  2.1 min    13           0             13               0.0 B  0.0 B   384.3 KB      0.0 B          0.0 B                   0.0 B
> 2            slave3:44322  23 s       12           0             12               0.0 B  0.0 B   136.4 KB      0.0 B          0.0 B                   0.0 B
> 3            slave4:37987  44 s       12           0             12               0.0 B  0.0 B   213.9 KB      0.0 B          0.0 B                   0.0 B
>
> Tasks
>
> Index  ID   Attempt  Status   Locality Level  Executor ID / Host  Launch Time          Duration  GC Time  Shuffle Read  Errors
> 0      213  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  1.1 min   1 s      153.3 KB
> 5      218  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  23 ms              26.0 B
> 1      214  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  2 s       0.9 s    13.8 KB
> 4      217  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  26 ms              26.0 B
> 3      216  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  11 ms              0.0 B
> 2      215  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  27 ms              26.0 B
> 7      220  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  11 ms              0.0 B
> 10     223  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  23 ms              26.0 B
> 6      219  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  23 ms              26.0 B
> 9      222  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  23 ms              26.0 B
> 8      221  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  23 ms              26.0 B
> 11     224  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms              0.0 B
> 14     227  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  24 ms              26.0 B
> 13     226  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  23 ms              26.0 B
> 16     229  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  22 ms              26.0 B
> 12     225  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  22 ms              26.0 B
> 15     228  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms              0.0 B
> 17     230  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  22 ms              26.0 B
> 23     236  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms              0.0 B
> 22     235  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  21 ms              26.0 B
> 19     232  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms              0.0 B
> 21     234  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  25 ms              26.0 B
> 18     231  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  24 ms              26.0 B
> 20     233  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  28 ms              26.0 B
> 25     238  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  20 ms              26.0 B
> 28     241  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  27 ms              26.0 B
> 27     240  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms              0.0 B
>
>
> Thanks
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Some-tasks-taking-too-much-time-to-complete-in-a-stage-tp21724.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
