Do you mean this setup?
https://spark.apache.org/docs/1.5.2/job-scheduling.html#dynamic-resource-allocation
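
If so, a minimal sketch of what that page describes for YARN (property names taken from the Spark 1.5.2 docs; paths and values here are illustrative):

```
# spark-defaults.conf
spark.shuffle.service.enabled    true
spark.dynamicAllocation.enabled  true

# yarn-site.xml on each NodeManager: register the Spark shuffle
# service as a YARN auxiliary service, then restart the NodeManagers.
#   yarn.nodemanager.aux-services                       -> add "spark_shuffle"
#   yarn.nodemanager.aux-services.spark_shuffle.class   -> org.apache.spark.network.yarn.YarnShuffleService
# (spark-<version>-yarn-shuffle.jar must be on the NodeManager classpath)
```

With the external shuffle service running on the nodes, shuffle files outlive the executor that wrote them.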



On Wed, Feb 3, 2016 at 11:50 AM, Marcelo Vanzin <van...@cloudera.com> wrote:

> Without the exact error from the driver that caused the job to restart,
> it's hard to tell. But a simple way to improve things is to install the
> Spark shuffle service on the YARN nodes, so that even if an executor
> crashes, its shuffle output is still available to other executors.
>
> On Wed, Feb 3, 2016 at 11:46 AM, Nirav Patel <npa...@xactlycorp.com>
> wrote:
>
>> Hi,
>>
>> I have a Spark job running in yarn-client mode. At some point during the
>> join stage, an executor (container) runs out of memory and YARN kills it.
>> Because of this, the entire job restarts, and it does so on every failure.
>>
>> What is the best way to checkpoint? I see there's a checkpoint API, and
>> another option might be to persist before the join stage. Would that
>> prevent a retry of the entire job? How about retrying only the tasks that
>> were assigned to the faulty executor?
>>
>> Thanks
>>
>>
>>
>
>
>
>
> --
> Marcelo
>
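
On the checkpoint question, a hedged sketch (RDD API; `sc`, `left`, and `right` are placeholder names for an existing SparkContext and two pair RDDs, and the checkpoint path is illustrative):

```scala
import org.apache.spark.storage.StorageLevel

// Checkpoint files must go to reliable storage (e.g. HDFS) so they
// survive executor loss.
sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")

// Persist to disk-backed storage so a lost executor's partitions can be
// re-read rather than recomputed from the start of the lineage.
val leftPersisted = left.persist(StorageLevel.MEMORY_AND_DISK)

// checkpoint() truncates the lineage entirely. It is lazy, so run an
// action before the join for the checkpoint to actually be written.
leftPersisted.checkpoint()
leftPersisted.count()

val joined = leftPersisted.join(right)
```

Note that Spark already retries individual failed tasks by default (spark.task.maxFailures, default 4); a full job restart usually means the failure cascaded (e.g. lost shuffle output), which is exactly what the external shuffle service mitigates.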
