Oh, by default it's set to 0L. I'll try setting it to 30000 immediately. Thanks for the help!
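
To check my understanding of what the timeout does, here is a minimal sketch in plain Python. It is not Spark's scheduler code: the names `TaskBlacklist` and `pick_executor` are made up for illustration, and the only thing taken from this thread is the rule that a failed task is kept off the same executor for the configured number of milliseconds. With a timeout of 0 the window is empty, which would explain why our tasks kept going back to the same executor.

```python
# Illustrative sketch only, NOT Spark's implementation.
# All class and function names here are hypothetical.

class TaskBlacklist:
    """Remembers when each task last failed on each executor."""

    def __init__(self, blacklist_ms):
        # Stands in for spark.scheduler.executorTaskBlacklistTime (milliseconds).
        self.blacklist_ms = blacklist_ms
        self.failed_at = {}  # (task_id, executor_id) -> failure time in ms

    def record_failure(self, task_id, executor_id, now_ms):
        self.failed_at[(task_id, executor_id)] = now_ms

    def can_schedule(self, task_id, executor_id, now_ms):
        failed = self.failed_at.get((task_id, executor_id))
        if failed is None:
            return True  # the task never failed on this executor
        # Allowed again once the blacklist window has fully elapsed.
        return now_ms - failed >= self.blacklist_ms


def pick_executor(blacklist, task_id, executors, now_ms):
    """Return the first executor the task may run on, or None if all are blocked."""
    for executor_id in executors:
        if blacklist.can_schedule(task_id, executor_id, now_ms):
            return executor_id
    return None


if __name__ == "__main__":
    default = TaskBlacklist(0)       # the default value of 0
    default.record_failure(7, "exec-1", now_ms=1000)
    print(pick_executor(default, 7, ["exec-1", "exec-2"], now_ms=1001))  # exec-1

    tuned = TaskBlacklist(30000)     # the value suggested in this thread
    tuned.record_failure(7, "exec-1", now_ms=1000)
    print(pick_executor(tuned, 7, ["exec-1", "exec-2"], now_ms=1001))    # exec-2
    print(pick_executor(tuned, 7, ["exec-1"], now_ms=31000))             # exec-1
```

The property itself can be passed to a job with spark-submit's `--conf key=value` option, e.g. `--conf spark.scheduler.executorTaskBlacklistTime=30000`.
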

Jianshi

On Mon, Mar 16, 2015 at 11:32 PM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:

> Thanks Shixiong!
>
> Very strange that our tasks were retried on the same executor again and
> again. I'll check spark.scheduler.executorTaskBlacklistTime.
>
> Jianshi
>
> On Mon, Mar 16, 2015 at 6:02 PM, Shixiong Zhu <zsxw...@gmail.com> wrote:
>
>> There are 2 cases for "No space left on device":
>>
>> 1. Some tasks which use large temp space cannot run on any node.
>> 2. The free space of the datanodes is not balanced. Some tasks which use
>> large temp space cannot run on several nodes, but they can run on other
>> nodes successfully.
>>
>> Because most of our cases are the second one, we set
>> "spark.scheduler.executorTaskBlacklistTime" to 30000 to solve such "No
>> space left on device" errors. So if a task fails on some executor, it
>> won't be scheduled on the same executor again for 30 seconds.
>>
>>
>> Best Regards,
>> Shixiong Zhu
>>
>> 2015-03-16 17:40 GMT+08:00 Jianshi Huang <jianshi.hu...@gmail.com>:
>>
>>> I created a JIRA: https://issues.apache.org/jira/browse/SPARK-6353
>>>
>>>
>>> On Mon, Mar 16, 2015 at 5:36 PM, Jianshi Huang <jianshi.hu...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> We're facing "No space left on device" errors lately from time to time.
>>>> The job will fail after retries. Obviously, in such a case, retrying
>>>> won't help.
>>>>
>>>> Sure, the problem is in the datanodes, but I'm wondering if the Spark
>>>> driver can handle it and decommission the problematic datanode before
>>>> retrying. And maybe dynamically allocate another datanode if dynamic
>>>> allocation is enabled.
>>>>
>>>> I think there needs to be a class of fatal errors that can't be
>>>> recovered with retries. And it would be best if Spark could handle
>>>> them nicely.
>>>>
>>>> Thanks,
>>>> --
>>>> Jianshi Huang
>>>>
>>>> LinkedIn: jianshi
>>>> Twitter: @jshuang
>>>> Github & Blog: http://huangjs.github.com/

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/