slider job fails when resourcemanager restarts

David . Serafini Wed, 27 Sep 2017 16:49:01 -0700

My Slider jobs sometimes fail for no obvious reason.
One hypothesis is that this happens when the ResourceManager restarts
(more precisely, when one of the two redundant RMs restarts).


Is this expected behavior?   

The jobs don't always fail completely. Sometimes YARN will fail an attempt and
start another one, the job's containers will all restart, and everything
will be fine. At other times, some of the running jobs will have trouble
and some won't. I haven't figured out a pattern yet.

Any insight would be appreciated.

-david
