Hi

I am facing a peculiar issue on one of the slave nodes of our cluster. I
have a Spark cluster with 40+ nodes. On one of the nodes, all tasks fail
with exit code 0.

ExecutorLostFailure (executor e6745c67-32e8-41ad-b6eb-8fa4d2539da7-S76
exited caused by one of the running tasks) Reason: Unknown executor exit
code (0)


I cannot find anything in mesos-slave.logs, and nothing is being written
to stdout/stderr. Are there any debugging utilities that I can use to find
out what is going wrong on that particular slave?

I tried running the following, but got stuck at the error shown after the
command:


/mesos-containerizer launch
--command='{"environment":{},"shell":true,"value":"ls -ltr"}'
--directory=/var/tmp/mesos/slaves/e6745c67-32e8-41ad-b6eb-8fa4d2539da7-S77/frameworks/e6745c67-32e8-41ad-b6eb-8fa4d2539da7-0312/executors/e6745c67-32e8-41ad-b6eb-8fa4d2539da7-S77/runs/45aa784c-f485-46a6-aeb8-997e82b80c4f
--help=false --pipe_read=0 --pipe_write=0 --user=smi

Failed to synchronize with slave (it's probably exited)


I would appreciate pointers to any debugging methods or documentation for
diagnosing these kinds of problems.

Regards
Sumit Chawla
