This SO question was asked about a year ago: http://stackoverflow.com/questions/31799755/how-to-deal-with-tasks-running-too-long-comparing-to-others-in-job-in-yarn-cli
I answered it with a suggestion to try speculation, but speculation doesn't quite do what the OP expects. I have been running into this issue more often these days: out of 5000 tasks, 4950 complete in 5 minutes, but the last 50 never finish; I have tried waiting for 4 hours. This could be a memory issue, or perhaps something in how Spark's fine-grained mode works with Mesos. I am trying to enable JmxSink so I can get a heap dump, but in the meantime, is there a better fix for this (in any version of Spark; I am on 1.5.1 but can upgrade)? Ideally the last 50 tasks in my example would be killed (timed out) and the stage would still complete successfully.

Thanks,
-Utkarsh
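For reference, this is roughly how I enabled speculation (the app jar/class names below are placeholders; the spark.speculation.* properties and defaults are from the Spark 1.5 configuration docs). Note that speculation only launches duplicate copies of slow tasks on other executors; it does not kill or time out the originals, which is why it doesn't solve the straggler problem here:

```shell
spark-submit \
  --conf spark.speculation=true \
  --conf spark.speculation.interval=100ms \  # how often to check for slow tasks
  --conf spark.speculation.quantile=0.75 \   # fraction of tasks that must finish first
  --conf spark.speculation.multiplier=1.5 \  # how many times slower than the median counts as "slow"
  --class com.example.MyJob \
  my-job.jar
```

For the heap-dump angle, the JMX sink can be enabled by adding `*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink` to conf/metrics.properties.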