Charles Allen created MESOS-8158:
------------------------------------

             Summary: Mesos Agent in docker neglects to retry discovering Task 
docker containers
                 Key: MESOS-8158
                 URL: https://issues.apache.org/jira/browse/MESOS-8158
             Project: Mesos
          Issue Type: Bug
          Components: agent, containerization, docker, executor
    Affects Versions: 1.4.0
         Environment: Windows 10 with Docker version 17.09.0-ce, build afdb6d4
            Reporter: Charles Allen
            Priority: Normal


I have attempted to launch Mesos agents inside of a docker container in such a 
way that the agent's docker container can be replaced and recovered. Unfortunately, 
I hit a major snag in the way Mesos launches docker containers.

To test basic functionality, a Marathon app is set up with the following 
command: {{date && python -m SimpleHTTPServer $PORT0}} 

That way the HTTP port can be accessed to verify that ports are being assigned 
correctly, and the date is printed in the log.
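
For reference, the app definition behind that command is approximately the one 
sketched below, reconstructed from the environment variables in the agent's 
{{docker run}} invocation further down (image {{python:2}}, 1.0 cpus, 128 MB of 
memory, host networking, one dynamically assigned port). The Marathon address is 
a placeholder; the exact JSON that was submitted was not captured.

{code}
# Approximate reconstruction of the test app; NOT the exact JSON that was
# submitted. Values are taken from the agent log (python:2 image, 1.0 cpus,
# 128 MB mem, host networking). The Marathon host below is a placeholder.
import requests

app = {
    "id": "/testapp",
    "cmd": "date && python -m SimpleHTTPServer $PORT0",
    "cpus": 1.0,
    "mem": 128,
    "instances": 1,
    "container": {
        "type": "DOCKER",
        "docker": {"image": "python:2", "network": "HOST"},
    },
    # One dynamically assigned host port, exposed to the task as $PORT0.
    "portDefinitions": [{"port": 10000, "name": "http"}],
}

resp = requests.post("http://marathon.example.com:8080/v2/apps", json=app)
resp.raise_for_status()
{code}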

When I attempt to start this Marathon app, the Mesos agent (running inside a 
docker container) properly launches an executor, which in turn properly launches 
a second docker container that runs the python code. Here is the output from the 
executor logs (this looks correct):

{code}
I1101 20:34:03.420210 68270 exec.cpp:162] Version: 1.4.0
I1101 20:34:03.427455 68281 exec.cpp:237] Executor registered on agent 
d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0
I1101 20:34:03.428414 68283 executor.cpp:120] Registered docker executor on 
10.0.75.2
I1101 20:34:03.428680 68281 executor.cpp:160] Starting task 
testapp.fe35282f-bf43-11e7-a24b-0242ac110002
I1101 20:34:03.428941 68281 docker.cpp:1080] Running docker -H 
unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 -e 
HOST=10.0.75.2 -e MARATHON_APP_DOCKER_IMAGE=python:2 -e 
MARATHON_APP_ID=/testapp -e MARATHON_APP_LABELS= -e MARATHON_APP_RESOURCE_CPUS
=1.0 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_RESOURCE_GPUS=0 -e 
MARATHON_APP_RESOURCE_MEM=128.0 -e 
MARATHON_APP_VERSION=2017-11-01T20:33:44.869Z -e 
MESOS_CONTAINER_NAME=mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 -e 
MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_TA
SK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 -e PORT=31464 -e PORT0=31464 
-e PORTS=31464 -e PORT_10000=31464 -e PORT_HTTP=31464 -v 
/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp
.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75:/mnt/mesos/sandbox
 --net host --entrypoint /bin/sh --name 
mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 
--label=MESOS_TASK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 python:2 -c 
date && p
ython -m SimpleHTTPServer $PORT0
I1101 20:34:03.430402 68281 docker.cpp:1243] Running docker -H 
unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
I1101 20:34:03.520303 68286 docker.cpp:1290] Retrying inspect with non-zero 
status code. cmd: 'docker -H unix:///var/run/docker.sock inspect 
mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
I1101 20:34:04.021216 68288 docker.cpp:1243] Running docker -H 
unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
I1101 20:34:04.124490 68281 docker.cpp:1290] Retrying inspect with non-zero 
status code. cmd: 'docker -H unix:///var/run/docker.sock inspect 
mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
I1101 20:34:04.624964 68288 docker.cpp:1243] Running docker -H 
unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
I1101 20:34:04.934087 68286 docker.cpp:1345] Retrying inspect since container 
not yet started. cmd: 'docker -H unix:///var/run/docker.sock inspect 
mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
I1101 20:34:05.435145 68288 docker.cpp:1243] Running docker -H 
unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
Wed Nov  1 20:34:06 UTC 2017
{code}


However, a TASK_FAILED status update is somehow sent to Marathon.

Upon further investigation, the following snippet can be found in the agent 
logs (the agent itself is running in a docker container):

{code}
I1101 20:34:00.949129     9 slave.cpp:1736] Got assigned task 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' for framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
I1101 20:34:00.950150     9 gc.cpp:93] Unscheduling 
'/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001'
 from gc
I1101 20:34:00.950225     9 gc.cpp:93] Unscheduling 
'/var/run/mesos/meta/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001'
 from gc
I1101 20:34:00.950472    12 slave.cpp:2003] Authorizing task 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' for framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
I1101 20:34:00.951210    12 slave.cpp:2171] Launching task 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' for framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
I1101 20:34:00.952265    12 paths.cpp:578] Trying to chown 
'/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec7
5' to user 'root'
I1101 20:34:00.952733    12 slave.cpp:7256] Launching executor 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 with resources 
[{"allocation_info":{"role":"*"},"name":"cpus","scalar":{"value":0.1},"type":"SCALAR"},{"a
llocation_info":{"role":"*"},"name":"mem","scalar":{"value":32.0},"type":"SCALAR"}]
 in work directory 
'/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac1100
02/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75'
I1101 20:34:00.953045    12 slave.cpp:2858] Launching container 
84f9ae30-9d4c-484a-860c-ca7845b7ec75 for executor 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
I1101 20:34:00.955057     6 docker.cpp:1136] Starting container 
'84f9ae30-9d4c-484a-860c-ca7845b7ec75' for task 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' (and executor 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002') of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45
-0001
I1101 20:34:00.955263    12 slave.cpp:2400] Queued task 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' for executor 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
I1101 20:34:00.955965     6 docker.cpp:1531] Running docker -H 
unix:///var/run/docker.sock inspect python:2
I1101 20:34:01.037293     8 docker.cpp:454] Docker pull python:2 completed
I1101 20:34:01.038180     8 docker.cpp:1080] Running docker -H 
unix:///var/run/docker.sock run --cpu-shares 1126 --memory 167772160 -e 
LIBPROCESS_IP=10.0.75.2 -e LIBPROCESS_PORT=0 -e 
MESOS_AGENT_ENDPOINT=10.0.75.2:5051 -e MESOS_CHECKPOINT=1 -e 
MESOS_CONTAINER_NAME=mesos
-84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor -e 
MESOS_DIRECTORY=/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca784
5b7ec75 -e MESOS_EXECUTOR_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 -e 
MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs -e 
MESOS_FRAMEWORK_ID=a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 -e 
MESOS_HTTP_COMMAND_EXECUTOR=0 -e 
MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos-1.4.0.
so -e MESOS_NATIVE_LIBRARY=/usr/lib/libmesos-1.4.0.so -e 
MESOS_RECOVERY_TIMEOUT=15mins -e MESOS_SANDBOX=/mnt/mesos/sandbox -e 
MESOS_SLAVE_ID=d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0 -e 
MESOS_SLAVE_PID=slave(1)@10.0.75.2:5051 -e MESOS_SUBSCRIPTION_BACKOFF_MAX=2secs 
-v /va
r/run/docker.sock:/var/run/docker.sock:ro -v 
/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75:/var/run/meso
s/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75:rw
 -v /var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/fra
meworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75:/mnt/mesos/sandbox
 --net host --name mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor 
--pid=host --cap-add=SYS_ADMIN --c
ap-add=SYS_PTRACE mesos-docker /usr/libexec/mesos/mesos-docker-executor 
--cgroups_enable_cfs=false 
--container=mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 --docker=docker 
--docker_socket=/var/run/docker.sock --help=false 
--initialize_driver_logging=true --launcher_dir=/u
sr/libexec/mesos --logbufsecs=0 --logging_level=INFO 
--mapped_directory=/mnt/mesos/sandbox --quiet=false 
--sandbox_directory=/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf4
3-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75 
--stop_timeout=0ns
I1101 20:34:01.040096     8 docker.cpp:1243] Running docker -H 
unix:///var/run/docker.sock inspect 
mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor
I1101 20:34:01.138551     5 docker.cpp:1290] Retrying inspect with non-zero 
status code. cmd: 'docker -H unix:///var/run/docker.sock inspect 
mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor', interval: 1secs
I1101 20:34:02.138964    13 docker.cpp:1243] Running docker -H 
unix:///var/run/docker.sock inspect 
mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor
I1101 20:34:03.423805     5 slave.cpp:3935] Got registration for executor 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 from executor(1)@10.0.75.2:35675
I1101 20:34:03.424316     5 docker.cpp:1616] Ignoring updating container 
84f9ae30-9d4c-484a-860c-ca7845b7ec75 because resources passed to update are 
identical to existing resources
I1101 20:34:03.424396     5 slave.cpp:2605] Sending queued task 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' to executor 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 at executor(1)@10.0.75.2:35675
I1101 20:34:04.052783    11 docker.cpp:1243] Running docker -H 
unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
E1101 20:34:04.156435    12 slave.cpp:5292] Container 
'84f9ae30-9d4c-484a-860c-ca7845b7ec75' for executor 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 failed to start: Failed to run 
'docker -H unix:///var/run/dock
er.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75': exited with status 
1; stderr='Error: No such object: mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
'
I1101 20:34:04.156497    12 docker.cpp:2078] Container 
84f9ae30-9d4c-484a-860c-ca7845b7ec75 launch failed
I1101 20:34:04.156622     7 slave.cpp:5405] Executor 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 has terminated with unknown status
I1101 20:34:04.156958     7 slave.cpp:4399] Handling status update TASK_FAILED 
(UUID: 32c43a03-cede-49f8-9676-fd9411382c58) for task 
testapp.fe35282f-bf43-11e7-a24b-0242ac110002 of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 from @0.0.0.0:0
E1101 20:34:04.157133     7 slave.cpp:4721] Failed to update resources for 
container 84f9ae30-9d4c-484a-860c-ca7845b7ec75 of executor 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' running task 
testapp.fe35282f-bf43-11e7-a24b-0242ac110002 on status update for terminal t
ask, destroying container: Container not found
W1101 20:34:04.157173    11 composing.cpp:582] Attempted to destroy unknown 
container 84f9ae30-9d4c-484a-860c-ca7845b7ec75
I1101 20:34:04.159068    12 status_update_manager.cpp:323] Received status 
update TASK_FAILED (UUID: 32c43a03-cede-49f8-9676-fd9411382c58) for task 
testapp.fe35282f-bf43-11e7-a24b-0242ac110002 of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
I1101 20:34:04.159276    12 status_update_manager.cpp:834] Checkpointing UPDATE 
for status update TASK_FAILED (UUID: 32c43a03-cede-49f8-9676-fd9411382c58) for 
task testapp.fe35282f-bf43-11e7-a24b-0242ac110002 of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
I1101 20:34:04.159399    12 slave.cpp:4880] Forwarding the update TASK_FAILED 
(UUID: 32c43a03-cede-49f8-9676-fd9411382c58) for task 
testapp.fe35282f-bf43-11e7-a24b-0242ac110002 of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 to master@10.0.75.2:5050
I1101 20:34:04.294680     8 status_update_manager.cpp:395] Received status 
update acknowledgement (UUID: 32c43a03-cede-49f8-9676-fd9411382c58) for task 
testapp.fe35282f-bf43-11e7-a24b-0242ac110002 of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
I1101 20:34:04.294747     8 status_update_manager.cpp:834] Checkpointing ACK 
for status update TASK_FAILED (UUID: 32c43a03-cede-49f8-9676-fd9411382c58) for 
task testapp.fe35282f-bf43-11e7-a24b-0242ac110002 of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
I1101 20:34:04.294945     8 slave.cpp:5516] Cleaning up executor 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 at executor(1)@10.0.75.2:35675
I1101 20:34:04.295308     8 slave.cpp:5612] Cleaning up framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
I1101 20:34:04.295418     8 gc.cpp:59] Scheduling 
'/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75'
 for gc
 6.99999658438519days in the future
I1101 20:34:04.295459     8 gc.cpp:59] Scheduling 
'/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
 for gc 6.99999658329778days in the future
I1101 20:34:04.295481     8 gc.cpp:59] Scheduling 
'/var/run/mesos/meta/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75'
 f
or gc 6.99999658267259days in the future
I1101 20:34:04.295516     8 gc.cpp:59] Scheduling 
'/var/run/mesos/meta/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
 for gc 6.99999658223407days in the future
I1101 20:34:04.295537     8 gc.cpp:59] Scheduling 
'/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001'
 for gc 6.99999658143407days in the future
I1101 20:34:04.295558     8 gc.cpp:59] Scheduling 
'/var/run/mesos/meta/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001'
 for gc 6.9999965810637days in the future
I1101 20:34:04.295581     8 status_update_manager.cpp:285] Closing status 
update streams for framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
I1101 20:34:06.748510     5 slave.cpp:4399] Handling status update TASK_RUNNING 
(UUID: 001f20d1-fe18-4cb0-9b39-cb9cd3cb9741) for task 
testapp.fe35282f-bf43-11e7-a24b-0242ac110002 of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 from executor(1)@10.0.75.2:35675
W1101 20:34:06.748546     5 slave.cpp:4455] Ignoring status update TASK_RUNNING 
(UUID: 001f20d1-fe18-4cb0-9b39-cb9cd3cb9741) for task 
testapp.fe35282f-bf43-11e7-a24b-0242ac110002 of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 for unknown framework a5eb6da1-f8ac-
4642-8d66-cdd2e5b14d45-0001
I1101 20:34:45.960857     9 slave.cpp:5828] Framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 seems to have exited. Ignoring 
registration timeout for executor 'testapp.f53fc2ce-bf43-11e7-a24b-0242ac110002'
I1101 20:34:52.039710    12 slave.cpp:5920] Current disk usage 0.50%. Max 
allowed age: 6.265284309533982days
I1101 20:35:00.955883    12 slave.cpp:5828] Framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 seems to have exited. Ignoring 
registration timeout for executor 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
I1101 20:35:52.040870    11 slave.cpp:5920] Current disk usage 0.50%. Max 
allowed age: 6.265284309533982days
I1101 20:36:52.041280     5 slave.cpp:5920] Current disk usage 0.50%. Max 
allowed age: 6.265284309533982days
I1101 20:37:52.042034    11 slave.cpp:5920] Current disk usage 0.50%. Max 
allowed age: 6.265284309533982days
{code}


Of particular note are the following lines:

{code}
I1101 20:34:01.138551     5 docker.cpp:1290] Retrying inspect with non-zero 
status code. cmd: 'docker -H unix:///var/run/docker.sock inspect 
mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor', interval: 1secs
I1101 20:34:02.138964    13 docker.cpp:1243] Running docker -H 
unix:///var/run/docker.sock inspect 
mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor
I1101 20:34:03.423805     5 slave.cpp:3935] Got registration for executor 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 from executor(1)@10.0.75.2:35675
I1101 20:34:03.424316     5 docker.cpp:1616] Ignoring updating container 
84f9ae30-9d4c-484a-860c-ca7845b7ec75 because resources passed to update are 
identical to existing resources
I1101 20:34:03.424396     5 slave.cpp:2605] Sending queued task 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' to executor 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 at executor(1)@10.0.75.2:35675
I1101 20:34:04.052783    11 docker.cpp:1243] Running docker -H 
unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
E1101 20:34:04.156435    12 slave.cpp:5292] Container 
'84f9ae30-9d4c-484a-860c-ca7845b7ec75' for executor 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 failed to start: Failed to run 
'docker -H unix:///var/run/dock
er.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75': exited with status 
1; stderr='Error: No such object: mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
'
{code}

For some reason the AGENT appears to be inspecting the TASK docker container in 
addition to the executor container, and when that inspect fails to find the 
container it gives up immediately instead of obeying a retry mechanism. I do not 
have an environment set up to trace the code path of this error state, but the 
expected behavior is either that the agent leaves tracking the task container to 
the executor, or that the agent retries the inspect. Neither seems to be 
happening here.
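
For illustration only, the retry behavior I would expect on the agent side 
amounts to a polling loop around {{docker inspect}}, sketched below. This is not 
the actual code in {{docker.cpp}}; the container name and the 500ms interval are 
simply taken from the logs above.

{code}
# Sketch only: NOT the Mesos implementation. Illustrates polling
# `docker inspect` at a fixed interval instead of treating the first
# "No such object" error as a fatal launch failure.
import subprocess
import time

def inspect_with_retry(name, interval=0.5, timeout=30.0):
    """Poll `docker inspect <name>` until it succeeds or `timeout` elapses."""
    deadline = time.time() + timeout
    while True:
        result = subprocess.run(
            ["docker", "-H", "unix:///var/run/docker.sock", "inspect", name],
            capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout  # container found; JSON description
        if time.time() >= deadline:
            raise RuntimeError("container %s never became inspectable: %s"
                               % (name, result.stderr.strip()))
        time.sleep(interval)  # e.g. the 500ms interval seen in the executor log

# Example (task container name from the logs above):
# inspect_with_retry("mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75")
{code}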


