On Tue, May 30, 2017 at 3:43 PM, Neil Conway <neil.con...@gmail.com> wrote:
> On Tue, May 30, 2017 at 2:36 PM, Vinod Kone <vinodk...@apache.org> wrote: > > Ran on ASF CI. > > > > Found following issues. > > > > Failed test: CommandExecutorCheckTest.CommandCheckDeliveredAndReconciled > > <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos- > Release/36/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20-- > enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1% > 20MESOS_VERBOSE=1,OS=centos:7,label_exp=(docker%7C%7CHadoop) > &&(!ubuntu-us1)&&(!ubuntu-eu2)/console> > > Attached is the test log for this failure. From a quick look, seems as > though the agent starts to launch the task, including forking the > child process, but no subsequent task status updates or error messages > are observed. Gaston, have you seen this before? > > I filed https://issues.apache.org/jira/browse/MESOS-7589 to track this. > Nope, I haven't seen this before. It looks like the executor wasn't launched or something went wrong at a very early stage of the executor's launch. This is what the logs from a successful run look like: I0530 20:19:52.679759 11282 launcher.cpp:140] Forked child with pid '11304' >> for container '703bded9-43de-4950-986f-0eaec7bbe664' > > I0530 20:19:52.692080 11268 fetcher.cpp:324] Starting to fetch URIs for >> container: 703bded9-43de-4950-986f-0eaec7bbe664, directory: >> /tmp/CommandExecutorCheckTest_CommandCheckDeliveredAndReconciled_fqE44y/slaves/7c74b451-3e64-4c95-8570-ad3b5b60962b-S0/frameworks/7c74b451-3e64-4c95-8570-ad3b5b60962b-0000/executors/55b387b7-c1f1-47d5-9f19-2d676cd4ef2e/runs/703bded9-43de-4950-986f-0eaec7bbe664 > > I0530 20:19:53.517201 11268 hierarchical.cpp:2095] Filtered offer with >> cpus(*):1.9; mem(*):992; disk(*):992; ports(*):[31000-32000] on agent >> 7c74b451-3e64-4c95-8570-ad3b5b60962b-S0 for role * of framework >> 7c74b451-3e64-4c95-8570-ad3b5b60962b-0000 > > I0530 20:19:53.517283 11268 hierarchical.cpp:1861] No allocations performed > > I0530 20:19:53.517318 11268 hierarchical.cpp:1951] No inverse offers to >> send out! > > I0530 20:19:53.517367 11268 hierarchical.cpp:1445] Performed allocation >> for 1 agents in 654478ns > > I0530 20:19:54.518506 11277 hierarchical.cpp:2095] Filtered offer with >> cpus(*):1.9; mem(*):992; disk(*):992; ports(*):[31000-32000] on agent >> 7c74b451-3e64-4c95-8570-ad3b5b60962b-S0 for role * of framework >> 7c74b451-3e64-4c95-8570-ad3b5b60962b-0000 > > I0530 20:19:54.518584 11277 hierarchical.cpp:1861] No allocations performed > > I0530 20:19:54.518616 11277 hierarchical.cpp:1951] No inverse offers to >> send out! > > I0530 20:19:54.518662 11277 hierarchical.cpp:1445] Performed allocation >> for 1 agents in 568919ns > > I0530 20:19:55.520282 11276 hierarchical.cpp:2095] Filtered offer with >> cpus(*):1.9; mem(*):992; disk(*):992; ports(*):[31000-32000] on agent >> 7c74b451-3e64-4c95-8570-ad3b5b60962b-S0 for role * of framework >> 7c74b451-3e64-4c95-8570-ad3b5b60962b-0000 > > I0530 20:19:55.520364 11276 hierarchical.cpp:1861] No allocations performed > > I0530 20:19:55.520409 11276 hierarchical.cpp:1951] No inverse offers to >> send out! > > I0530 20:19:55.520463 11276 hierarchical.cpp:1445] Performed allocation >> for 1 agents in 621097ns > > I0530 20:19:56.521884 11271 hierarchical.cpp:2095] Filtered offer with >> cpus(*):1.9; mem(*):992; disk(*):992; ports(*):[31000-32000] on agent >> 7c74b451-3e64-4c95-8570-ad3b5b60962b-S0 for role * of framework >> 7c74b451-3e64-4c95-8570-ad3b5b60962b-0000 > > I0530 20:19:56.521966 11271 hierarchical.cpp:1861] No allocations performed > > I0530 20:19:56.522001 11271 hierarchical.cpp:1951] No inverse offers to >> send out! > > I0530 20:19:56.522047 11271 hierarchical.cpp:1445] Performed allocation >> for 1 agents in 597601ns > > I0530 20:19:57.523320 11267 hierarchical.cpp:2095] Filtered offer with >> cpus(*):1.9; mem(*):992; disk(*):992; ports(*):[31000-32000] on agent >> 7c74b451-3e64-4c95-8570-ad3b5b60962b-S0 for role * of framework >> 7c74b451-3e64-4c95-8570-ad3b5b60962b-0000 > > I0530 20:19:57.523398 11267 hierarchical.cpp:1861] No allocations performed > > I0530 20:19:57.523432 11267 hierarchical.cpp:1951] No inverse offers to >> send out! > > I0530 20:19:57.523478 11267 hierarchical.cpp:1445] Performed allocation >> for 1 agents in 573802ns > > I0530 20:19:57.641808 11358 exec.cpp:162] Version: 1.4.0 > > I0530 20:19:57.648437 11268 slave.cpp:3809] Got registration for executor >> '55b387b7-c1f1-47d5-9f19-2d676cd4ef2e' of framework >> 7c74b451-3e64-4c95-8570-ad3b5b60962b-0000 from executor(1)@ >> 192.99.40.208:43797 > > I0530 20:19:57.650589 11279 slave.cpp:2526] Sending queued task >> '55b387b7-c1f1-47d5-9f19-2d676cd4ef2e' to executor >> '55b387b7-c1f1-47d5-9f19-2d676cd4ef2e' of framework >> 7c74b451-3e64-4c95-8570-ad3b5b60962b-0000 at executor(1)@ >> 192.99.40.208:43797 > > I0530 20:19:57.650804 11356 exec.cpp:237] Executor registered on agent >> 7c74b451-3e64-4c95-8570-ad3b5b60962b-S0 > > Received SUBSCRIBED event > > Subscribed executor on gru1.hw.ca1.mesosphere.com > > Received LAUNCH event > > Starting task 55b387b7-c1f1-47d5-9f19-2d676cd4ef2e > > Running '/home/gaston/mesos-custom/build/src/mesos-containerizer launch >> <POSSIBLY-SENSITIVE-DATA>' > > Forked command at 11361 > > I0530 20:19:57.672251 11273 slave.cpp:4279] Handling status update >> TASK_RUNNING (UUID: b2e0d7b6-0f15-41d6-ae59-7c915a639672) for task >> 55b387b7-c1f1-47d5-9f19-2d676cd4ef2e of framework >> 7c74b451-3e64-4c95-8570-ad3b5b60962b-0000 from executor(1)@ >> 192.99.40.208:43797 > > The line saying "Forked command at <pid>" is logged by the command executor before creating a checker, since it is missing from these logs, I believe that the failure is not related to checks. -Gastón