Hi Joseph,

The error code is being reported as 0, and there is not much else in the logs.

Regards
Sumit Chawla

On Wed, May 24, 2017 at 12:21 AM, Joseph Wu <jos...@mesosphere.io> wrote:

> There isn't a tool for this. Can you check if the Mesos agent is being
> restarted (or crashing) when you launch a task? And perhaps upload some
> logs around the time of the task launch.
>
> There is a mismatch between the exit codes you've reported though. When
> you see that log line in the sandbox logs, the exit code will be "1"
> (failure), rather than "0" (success).
>
> On Mon, May 22, 2017 at 9:30 PM, Chawla,Sumit <sumitkcha...@gmail.com>
> wrote:
>
>> Hi Joseph
>>
>> I am using 0.27.0. Is there any diagnostic tool or command line that I
>> can run to ascertain why it's happening?
>>
>> Regards
>> Sumit Chawla
>>
>>
>> On Fri, May 19, 2017 at 2:31 PM, Joseph Wu <jos...@mesosphere.io> wrote:
>>
>>> What version of Mesos are you using? (Just based on the word "slave" in
>>> that error message, I'm guessing 0.28 or older.)
>>>
>>> The "Failed to synchronize" error is something that can occur while the
>>> agent is launching the executor. During the launch, the agent will create
>>> a pipe to the executor subprocess, and the executor makes a blocking read
>>> on this pipe. The agent will write a value to the pipe to signal the
>>> executor to proceed. If the agent restarts or the pipe breaks at this
>>> point in the launch, then you'll see this error message.
>>>
>>> On Thu, May 18, 2017 at 9:44 PM, Chawla,Sumit <sumitkcha...@gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> I am facing a peculiar issue on one of the slave nodes of our cluster.
>>>> I have a Spark cluster with 40+ nodes. On one of the nodes, all tasks fail
>>>> with exit code 0.
>>>>
>>>> ExecutorLostFailure (executor e6745c67-32e8-41ad-b6eb-8fa4d2539da7-S76
>>>> exited caused by one of the running tasks) Reason: Unknown executor
>>>> exit code (0)
>>>>
>>>>
>>>> I cannot seem to find anything in mesos-slave.logs, and there is
>>>> nothing being written to stdout/stderr. Are there any debugging utilities
>>>> that I can use to debug what could be going wrong on that particular slave?
>>>>
>>>>
>>>> I tried running the following but got stuck at:
>>>>
>>>>
>>>> /mesos-containerizer launch
>>>> --command='{"environment":{},"shell":true,"value":"ls
>>>> -ltr"}' --directory=/var/tmp/mesos/slaves/e6745c67-32e8-41ad-b6eb-8f
>>>> a4d2539da7-S77/frameworks/e6745c67-32e8-41ad-b6eb-8fa4d2539d
>>>> a7-0312/executors/e6745c67-32e8-41ad-b6eb-8fa4d2539da7-S77/
>>>> runs/45aa784c-f485-46a6-aeb8-997e82b80c4f --help=false --pipe_read=0
>>>> --pipe_write=0 --user=smi
>>>>
>>>> Failed to synchronize with slave (it's probably exited)
>>>>
>>>>
>>>> I would appreciate pointers to any debugging methods/documentation to
>>>> diagnose these kinds of problems.
>>>>
>>>> Regards
>>>> Sumit Chawla
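
For reference, the launch handshake Joseph describes in the quoted thread (the agent creates a pipe, the executor blocks on a read, the agent writes a value to release it) can be sketched as below. This is a minimal Python illustration, not Mesos code: the function name `launch` and the flag `parent_signals` are made up for the example, and the parent and child processes only stand in for the agent and the executor launcher. It assumes a Unix-like system where `os.fork` is available.

```python
import os


def launch(parent_signals: bool) -> int:
    """Fork a child that waits on a pipe and return the child's exit code.

    Hypothetical stand-in for the agent/executor handshake; not Mesos code.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()

    if pid == 0:
        # Child: stands in for the executor launcher.
        # Close our copy of the write end so a closed parent shows up as EOF.
        os.close(write_fd)
        data = os.read(read_fd, 1)  # blocks until one byte arrives or EOF
        if len(data) != 1:
            # The parent went away (or closed the pipe) before signalling.
            os.write(2, b"Failed to synchronize with parent\n")
            os._exit(1)
        os._exit(0)

    # Parent: stands in for the agent.
    os.close(read_fd)
    if parent_signals:
        os.write(write_fd, b"\0")  # the "proceed" signal
    # Closing without writing simulates a restart or broken pipe.
    os.close(write_fd)

    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(launch(True))
    print(launch(False))
```

In this sketch `launch(True)` returns 0 and `launch(False)` returns 1, which is in line with Joseph's point that the "Failed to synchronize" path ends with exit code 1 rather than 0.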