You should be able to figure out the cause from the AM log. It sounds like
it could be SLIDER-1183. The fix for this issue also requires YARN-5999.
The SLIDER-1183 fix by itself should stop the app from being
killed, but the AM will remain in a broken state.

On Wed, Sep 27, 2017 at 4:48 PM, David.Serafini <david.seraf...@target.com>
wrote:

> I'm seeing my slider jobs sometimes fail for no obvious reason.
> One hypothesis is that this happens when the resource manager is restarted
> (actually, when one of the 2 redundant RMs restarts).
>
> Is this expected behavior?
>
> The jobs don't always fail completely; sometimes YARN will fail an
> attempt and start another one, and the job's containers will all restart
> and everything will be fine.  Sometimes some of the jobs that are running
> will have trouble and some won't.  I haven't figured out a pattern yet.
>
> Any insight would be appreciated.
>
> -david
