Thanks all. I know I have data skew, but the data is unpredictable and the skewed keys are hard to find every time. Do you think this workaround is reasonable?
import java.util.concurrent.*;

ExecutorService executor = Executors.newCachedThreadPool();
Callable<Result> task = () -> simulation.run();
Future<Result> future = executor.submit(task);
try {
    simResult = future.get(20, TimeUnit.MINUTES);
} catch (TimeoutException ex) {
    SPARKLOG.info("Task timed out");
    future.cancel(true); // interrupt the runaway simulation thread
} catch (InterruptedException | ExecutionException ex) {
    throw new RuntimeException(ex);
} finally {
    executor.shutdown();
}

It will force-timeout the task if it runs for more than 20 minutes. Note the future.cancel(true) in the catch block: without it the simulation thread keeps running after the timeout fires, it is only abandoned.

On Thu, Jun 16, 2016 at 5:00 AM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> I'd check the Details for Stage page in the web UI.
>
> Regards,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Thu, Jun 16, 2016 at 6:45 AM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
> > This SO question was asked about a year ago:
> > http://stackoverflow.com/questions/31799755/how-to-deal-with-tasks-running-too-long-comparing-to-others-in-job-in-yarn-cli
> >
> > I answered it with a suggestion to try speculation, but it doesn't quite
> > do what the OP expects. I have been running into this issue more these
> > days. Out of 5000 tasks, 4950 complete in 5 minutes, but the last 50 never
> > really complete; I have tried waiting for 4 hours. This could be a memory
> > issue, or maybe it's how Spark's fine-grained mode works with Mesos; I am
> > trying to enable JmxSink to get a heap dump.
> >
> > But in the meantime, is there a better fix? (In any version of Spark; I am
> > using 1.5.1 but can upgrade.) It would be great if the last 50 tasks in my
> > example could be killed (timed out) and the stage still completed
> > successfully.
> >
> > --
> > Thanks,
> > -Utkarsh

--
Thanks,
-Utkarsh
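P.S. For anyone who finds this thread later: the speculation approach from my SO answer is purely configuration. A rough sketch (the interval/multiplier/quantile values below are placeholders to tune, not recommendations):

import org.apache.spark.SparkConf;

SparkConf conf = new SparkConf()
    .set("spark.speculation", "true")           // re-launch slow tasks speculatively
    .set("spark.speculation.interval", "100")   // check for slow tasks every 100 ms
    .set("spark.speculation.multiplier", "1.5") // "slow" = 1.5x the median task duration
    .set("spark.speculation.quantile", "0.75"); // only after 75% of tasks have finished

Keep in mind speculation only launches duplicate copies of stragglers; it never kills the original attempt, which is why it doesn't solve the "last 50 tasks hang forever" case on its own.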