Oh, by default it's set to 0L.

I'll try setting it to 30000 immediately. Thanks for the help!
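
For reference, here is a rough sketch of how I understand the setting to behave. This is a toy model, not Spark's scheduler code: the class and method names (TaskBlacklist, record_failure, eligible_executors) are made up for illustration, and I'm assuming the value is a window in milliseconds during which a task that failed on an executor is not offered to that executor again.

```python
from typing import Dict, List, Tuple


class TaskBlacklist:
    """Toy model of a per-task, per-executor blacklist window.

    Hypothetical names throughout; this only illustrates the behaviour
    described in the thread and is not taken from Spark's source.
    """

    def __init__(self, blacklist_time_ms: int = 0) -> None:
        # 0 mirrors the default mentioned above: the window is empty,
        # so a failed task may be retried on the same executor right away.
        self.blacklist_time_ms = blacklist_time_ms
        self._failures: Dict[Tuple[int, str], int] = {}

    def record_failure(self, task_id: int, executor_id: str, now_ms: int) -> None:
        # Remember when this task last failed on this executor.
        self._failures[(task_id, executor_id)] = now_ms

    def is_blacklisted(self, task_id: int, executor_id: str, now_ms: int) -> bool:
        failed_at = self._failures.get((task_id, executor_id))
        if failed_at is None:
            return False
        # Blacklisted only while the failure is younger than the window.
        return now_ms - failed_at < self.blacklist_time_ms

    def eligible_executors(
        self, task_id: int, executors: List[str], now_ms: int
    ) -> List[str]:
        # Executors on which this task may currently be scheduled.
        return [e for e in executors if not self.is_blacklisted(task_id, e, now_ms)]


if __name__ == "__main__":
    blacklist = TaskBlacklist(blacklist_time_ms=30000)
    blacklist.record_failure(task_id=1, executor_id="exec-1", now_ms=1000)
    print(blacklist.eligible_executors(1, ["exec-1", "exec-2"], now_ms=2000))
    print(blacklist.eligible_executors(1, ["exec-1", "exec-2"], now_ms=31000))
```

If this model is right, the default of 0 would explain why our tasks kept being retried on the same executor: with an empty window nothing is ever excluded.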

Jianshi

On Mon, Mar 16, 2015 at 11:32 PM, Jianshi Huang <jianshi.hu...@gmail.com>
wrote:

> Thanks Shixiong!
>
> Very strange that our tasks were retried on the same executor again and
> again. I'll check spark.scheduler.executorTaskBlacklistTime.
>
> Jianshi
>
> On Mon, Mar 16, 2015 at 6:02 PM, Shixiong Zhu <zsxw...@gmail.com> wrote:
>
>> There are 2 cases for "No space left on device":
>>
>> 1. Some tasks that use a lot of temp space cannot run on any node.
>> 2. Free space is not balanced across the datanodes. Some tasks that use a
>> lot of temp space cannot run on several nodes, but they run successfully
>> on other nodes.
>>
>> Because most of our cases are the second kind, we set
>> "spark.scheduler.executorTaskBlacklistTime" to 30000 to work around such
>> "No space left on device" errors. With that setting, if a task fails on an
>> executor, it won't be scheduled on the same executor again for 30 seconds.
>>
>>
>> Best Regards,
>> Shixiong Zhu
>>
>> 2015-03-16 17:40 GMT+08:00 Jianshi Huang <jianshi.hu...@gmail.com>:
>>
>>> I created a JIRA: https://issues.apache.org/jira/browse/SPARK-6353
>>>
>>>
>>> On Mon, Mar 16, 2015 at 5:36 PM, Jianshi Huang <jianshi.hu...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> We've been hitting "No space left on device" errors from time to time
>>>> lately. The job fails after retries. Obviously, in such cases retrying
>>>> won't help.
>>>>
>>>> Sure, it's a problem with the datanodes, but I'm wondering whether the
>>>> Spark driver can handle it and decommission the problematic datanode
>>>> before retrying the task, and maybe allocate another datanode if dynamic
>>>> allocation is enabled.
>>>>
>>>> I think there needs to be a class of fatal errors that can't be
>>>> recovered from by retrying, and it would be best if Spark handled them
>>>> gracefully.
>>>>
>>>> Thanks,
>>>> --
>>>> Jianshi Huang
>>>>
>>>> LinkedIn: jianshi
>>>> Twitter: @jshuang
>>>> Github & Blog: http://huangjs.github.com/
>>>>
>>>
>>
>>
>
>



