Hi, I am facing a peculiar issue on one of the slave nodes of our cluster. I have a Spark cluster with 40+ nodes. On one of the nodes, all tasks fail with exit code 0.
ExecutorLostFailure (executor e6745c67-32e8-41ad-b6eb-8fa4d2539da7-S76 exited caused by one of the running tasks) Reason: Unknown executor exit code (0)

I cannot find anything in mesos-slave.logs, and nothing is being written to stdout/stderr. Are there any debugging utilities I can use to find out what is going wrong on that particular slave?

I tried running the following command:

/mesos-containerizer launch --command='{"environment":{},"shell":true,"value":"ls -ltr"}' --directory=/var/tmp/mesos/slaves/e6745c67-32e8-41ad-b6eb-8fa4d2539da7-S77/frameworks/e6745c67-32e8-41ad-b6eb-8fa4d2539da7-0312/executors/e6745c67-32e8-41ad-b6eb-8fa4d2539da7-S77/runs/45aa784c-f485-46a6-aeb8-997e82b80c4f --help=false --pipe_read=0 --pipe_write=0 --user=smi

but it got stuck at:

Failed to synchronize with slave (it's probably exited)

I would appreciate any pointers to debugging methods or documentation for diagnosing this kind of problem.

Regards,
Sumit Chawla
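One way to double-check that nothing is being written to stdout/stderr is to list every executor sandbox log on the affected slave along with its size. The following is a minimal Python sketch, not a Mesos tool. It assumes the agent's work directory is /var/tmp/mesos (inferred from the --directory path in the command, not confirmed) and that each run sandbox contains files named stdout and stderr. The function name list_sandbox_logs is hypothetical.

```python
import os
import sys


def list_sandbox_logs(work_dir):
    """Return sorted (path, size_in_bytes) pairs for every stdout/stderr file under work_dir."""
    results = []
    # os.walk does not follow symlinks by default, so a "latest" symlink
    # pointing at a run directory is not counted twice.
    for root, _dirs, files in os.walk(work_dir):
        for name in ("stdout", "stderr"):
            if name in files:
                path = os.path.join(root, name)
                results.append((path, os.path.getsize(path)))
    return sorted(results)


if __name__ == "__main__":
    # Assumption: /var/tmp/mesos is this slave's work directory; pass another path to override.
    work_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/tmp/mesos"
    for path, size in list_sandbox_logs(work_dir):
        print(f"{size:>10}  {path}")
```

Run it on the failing slave and on a healthy one. If the failing node's sandboxes exist but every file is 0 bytes, the executor is probably exiting before it writes anything. If no sandboxes appear at all, the executor may never have been launched in that directory.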