[
https://issues.apache.org/jira/browse/MESOS-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17321176#comment-17321176
]
Charles Natali commented on MESOS-10196:
----------------------------------------
Hey [~934341445], sorry for the delay.
I know it's been a while, but in case this is still an issue, I think the next
step would be to trace the agent with the following command to see exactly
what's going on. My guess is that the agent is not starting the right
executor, or something like that:
{code}
strace -ttTf -p <agent PID> -o agent.strace
{code}
Then attach {{agent.strace}} together with the agent logs. It could also be
useful to start the agent with the {{GLOG_v=9}} environment variable set, to
get more detailed logs.
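Once you have {{agent.strace}}, something like the following might help narrow things down. It's only a rough sketch: {{list_execs}} is a made-up helper name, not something shipped with Mesos or strace, and it assumes the default strace output format, where each traced call appears as {{execve(...)}} on its own line.

```shell
# Hypothetical helper (not part of Mesos or strace): print every execve()
# call recorded in an strace output file, to see which binaries the traced
# process tree tried to run.
list_execs() {
    # $1: path to the strace output file, e.g. agent.strace
    # -F treats the pattern as a fixed string, so "(" needs no escaping.
    grep -F 'execve(' "$1"
}
```

Running {{list_execs agent.strace}} should print one line per {{execve}} call, including the path of the binary being executed, which should make it easier to check whether the agent launches the executor you expect.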
> The task program runs successfully but the task status is failed
> -----------------------------------------------------------------
>
> Key: MESOS-10196
> URL: https://issues.apache.org/jira/browse/MESOS-10196
> Project: Mesos
> Issue Type: Bug
> Components: executor
> Affects Versions: 1.9.0, 1.10.0
> Environment: Ubuntu 16.04
> mesos master 1.10.0
> mesos slave 1.9.0
> python 3.7.3
> Reporter: clancyhuang
> Priority: Major
>
> When testing Mesos task execution with the default executor, I found that the
> task status is reported as failed even though the task actually ran
> successfully. I tested two shell scripts. One is very simple:
> {code:sh}
> python -V > /root/test.txt
> {code}
> The other is an image-processing script.
> I am sure both work properly, but I get the error
> REASON_EXECUTOR_TERMINATED.
> The task's stderr is empty and its stdout is correct. The Mesos agent
> logs the following:
> {code:bash}
> I1104 11:34:35.337236 35682 slave.cpp:3657] Launching container
> 65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef for executor 'default' of framework
> d915071b-c275-4321-afd5-134b86ebadf3-0002
> I1104 11:34:35.337371 35685 containerizer.cpp:1396] Starting container
> 65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef
> I1104 11:34:35.337563 35685 containerizer.cpp:3323] Transitioning the state
> of container 65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef from STARTING to
> PROVISIONING after 76800ns
> I1104 11:34:35.338893 35685 containerizer.cpp:3323] Transitioning the state
> of container 65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef from PROVISIONING to
> PREPARING after 1.321216ms
> I1104 11:34:35.340224 35703 switchboard.cpp:316] Container logger module
> finished preparing container 65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef;
> IOSwitchboard server is not required
> I1104 11:34:35.341944 35707 linux_launcher.cpp:492] Launching container
> 65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef and cloning with namespaces
> I1104 11:34:35.346983 35704 containerizer.cpp:3323] Transitioning the state
> of container 65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef from PREPARING to ISOLATING
> after 8.082944ms
> I1104 11:34:35.347719 35704 containerizer.cpp:3323] Transitioning the state
> of container 65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef from ISOLATING to FETCHING
> after 730880ns
> I1104 11:34:35.348254 35737 containerizer.cpp:3323] Transitioning the state
> of container 65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef from FETCHING to RUNNING
> after 539136ns
> I1104 11:34:58.060906 35680 slave.cpp:7406] Current disk usage 73.86%. Max
> allowed age: 1.130070981247558days
> I1104 11:35:58.062266 35708 slave.cpp:7406] Current disk usage 73.86%. Max
> allowed age: 1.129549109651991days
> I1104 11:36:58.062948 35741 slave.cpp:7406] Current disk usage 73.87%. Max
> allowed age: 1.129005310066273days
> I1104 11:37:58.063513 35703 slave.cpp:7406] Current disk usage 73.88%. Max
> allowed age: 1.128437717518472days
> I1104 11:38:30.242969 35740 containerizer.cpp:3161] Container
> 65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef has exited
> I1104 11:38:30.243052 35740 containerizer.cpp:2620] Destroying container
> 65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef in RUNNING state
> I1104 11:38:30.243072 35740 containerizer.cpp:3323] Transitioning the state
> of container 65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef from RUNNING to DESTROYING
> after 3.91491408213333mins
> I1104 11:38:30.243252 35672 linux_launcher.cpp:576] Asked to destroy
> container 65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef
> I1104 11:38:30.243350 35672 linux_launcher.cpp:618] Destroying cgroup
> '/sys/fs/cgroup/freezer/mesos/65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef'
> I1104 11:38:30.243768 35679 cgroups.cpp:2854] Freezing cgroup
> /sys/fs/cgroup/freezer/mesos/65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef
> I1104 11:38:30.243961 35671 cgroups.cpp:1242] Successfully froze cgroup
> /sys/fs/cgroup/freezer/mesos/65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef after
> 110848ns
> I1104 11:38:30.244160 35683 cgroups.cpp:2872] Thawing cgroup
> /sys/fs/cgroup/freezer/mesos/65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef
> I1104 11:38:30.244272 35683 cgroups.cpp:1271] Successfully thawed cgroup
> /sys/fs/cgroup/freezer/mesos/65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef after
> 67840ns
> I1104 11:38:30.244668 35690 linux_launcher.cpp:650] Destroying cgroup
> '/sys/fs/cgroup/systemd/mesos/65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef'
> I1104 11:38:30.245975 35726 slave.cpp:6856] Executor 'default' of framework
> d915071b-c275-4321-afd5-134b86ebadf3-0002 exited with status 0
> I1104 11:38:30.246995 35726 slave.cpp:5737] Handling status update
> TASK_FAILED (Status UUID: 03cbd06b-cad8-4624-bf65-36a1a83ea39e) for task
> onextest1 of framework d915071b-c275-4321-afd5-134b86ebadf3-0002
> from @0.0.0.0:0
> W1104 11:38:30.247347 35727 containerizer.cpp:2419] Ignoring update for
> unknown container 65c98d6f-fcf8-4be4-89a8-7fe53b5c30ef
> I1104 11:38:30.247728 35730 task_status_update_manager.cpp:328] Received task
> status update TASK_FAILED (Status UUID: 03cbd06b-cad8-4624-bf65-36a1a83ea39e)
> for task onextest1 of framework
> d915071b-c275-4321-afd5-134b86ebadf3-0002
> I1104 11:38:30.248106 35731 slave.cpp:6276] Forwarding the update TASK_FAILED
> (Status UUID: 03cbd06b-cad8-4624-bf65-36a1a83ea39e) for task onextest1 of
> framework d915071b-c275-4321-afd5-134b86ebadf3-0002 to
> master@master:5050
> I1104 11:38:30.355559 35735 task_status_update_manager.cpp:401] Received task
> status update acknowledgement (UUID: 03cbd06b-cad8-4624-bf65-36a1a83ea39e)
> for task onextest1 of framework
> d915071b-c275-4321-afd5-134b86ebadf3-0002
> I1104 11:38:30.355821 35732 slave.cpp:6967] Cleaning up executor 'default' of
> framework d915071b-c275-4321-afd5-134b86ebadf3-0002
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)