Sumit, I think the post below describes exactly the case you are hitting:

https://blog.cloudera.com/blog/2017/04/blacklisting-in-apache-spark/
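For reference, here is a minimal sketch of what enabling the blacklisting that the post describes might look like. The property names are the Spark 2.1+ blacklist settings; the specific values are illustrative, not tuned recommendations:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Keep retrying a failed task up to 8 times overall...
val conf = new SparkConf()
  .set("spark.task.maxFailures", "8")
  // ...but blacklist misbehaving executors/nodes so retries move elsewhere.
  .set("spark.blacklist.enabled", "true")
  // Allow at most 1 attempt of a given task per executor...
  .set("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")
  // ...and at most 2 attempts of that task on any one node.
  .set("spark.blacklist.task.maxTaskAttemptsPerNode", "2")

val spark = SparkSession.builder()
  .config(conf)
  .appName("blacklist-example")
  .getOrCreate()

With something like this in place, once the bad slave has used up its per-node attempt budget for a task, the remaining retries should be scheduled on other nodes instead of failing the whole job.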
Regards,
Dongjin

--
Dongjin Lee
Software developer at Line+. Interested in massive-scale machine learning.
facebook: http://www.facebook.com/dongjin.lee.kr
linkedin: http://kr.linkedin.com/in/dongjinleekr
github: http://github.com/dongjinleekr
twitter: http://www.twitter.com/dongjinleekr

On 22 Apr 2017, 5:32 AM +0900, Chawla,Sumit <sumitkcha...@gmail.com>, wrote:
> I am seeing a strange issue. I had a badly behaving slave that failed the
> entire job. I have set spark.task.maxFailures to 8 for my job. It seems like all
> task retries happen on the same slave in case of failure. My expectation was
> that the task would be retried on a different slave after a failure, and the chance
> of all 8 retries landing on the same slave would be very low.
>
>
> Regards
> Sumit Chawla
>