Any response?

2015-07-06 12:28 GMT+08:00 Tao Li <litao.bupt...@gmail.com>:
> Nodes cloud10141049104.wd.nm.nop.sogou-op.org and
> cloud101417770.wd.nm.ss.nop.sogou-op.org failed too many times. I want to
> know whether a node can be taken offline automatically once it has failed
> too many times.
>
> 2015-07-06 12:25 GMT+08:00 Tao Li <litao.bupt...@gmail.com>:
>
>> I have a long-lived Spark application running on YARN.
>>
>> On some nodes, the shuffle map task tries to write to the shuffle path,
>> but the root path /search/hadoop10/yarn_local/usercache/spark/ was
>> deleted, so the task fails. As a result, every shuffle map task scheduled
>> on such a node fails because the root path does not exist.
>>
>> Is there a way to set a maximum number of task failures per executor? If
>> the number of failed tasks exceeds the threshold, the executor could be
>> taken offline and the driver could request a new executor.
>>
>> Shuffle path:
>> /search/hadoop10/yarn_local/usercache/spark/appcache/application_1434370929997_155180/spark-local-20150703120414-a376/0e/shuffle_20002_720_0.data
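
To make the request concrete, here is a minimal sketch of the policy I have in mind: count failed tasks per executor and take the executor offline once the count exceeds a threshold. This is plain Python for illustration only, not Spark code; the class ExecutorFailureTracker, its methods, and the executor IDs are hypothetical names and do not correspond to any existing Spark or YARN API.

```python
from collections import defaultdict


class ExecutorFailureTracker:
    """Hypothetical helper (not a Spark API): tracks task failures per executor."""

    def __init__(self, max_task_failures):
        if max_task_failures < 0:
            raise ValueError("max_task_failures must be non-negative")
        self.max_task_failures = max_task_failures
        self.failures = defaultdict(int)  # executor id -> failed task count
        self.offline = set()              # executors already taken offline

    def record_failure(self, executor_id):
        """Record one failed task.

        Returns True exactly once per executor: on the failure that pushes
        its count above the threshold, i.e. when it should be replaced.
        """
        if executor_id in self.offline:
            return False
        self.failures[executor_id] += 1
        if self.failures[executor_id] > self.max_task_failures:
            self.offline.add(executor_id)
            return True
        return False

    def is_schedulable(self, executor_id):
        """An executor keeps receiving tasks until it is taken offline."""
        return executor_id not in self.offline


if __name__ == "__main__":
    tracker = ExecutorFailureTracker(max_task_failures=3)
    for attempt in range(1, 5):
        replace = tracker.record_failure("executor-on-bad-node")
        print(attempt, "replace executor:", replace)
```

With a threshold of 3, the first three failures are tolerated and the fourth one marks the executor for replacement, so a node with a deleted local directory stops receiving shuffle map tasks instead of failing them indefinitely.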