[ https://issues.apache.org/jira/browse/MESOS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anand Mazumdar updated MESOS-7386:
----------------------------------
    Priority: Critical  (was: Major)

Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed
--------------------------------------------------------------------------------------------------------------

                 Key: MESOS-7386
                 URL: https://issues.apache.org/jira/browse/MESOS-7386
             Project: Mesos
          Issue Type: Bug
          Components: agent, docker, executor
    Affects Versions: 0.28.2, 1.2.0
         Environment: Mesos 0.28.2/1.2.0, docker 1.12.0/17.04.0-ce, marathon v1.1.2/v1.4.2, ubuntu trusty 14.04, org_apache_mesos_LogrotateContainerLogger, org_apache_mesos_ExternalContainerLogger
            Reporter: Pranay Kanwar
            Priority: Critical

If the mesos-logrotate / external logger process dies or is killed, the executor exits, the task fails and is relaunched, but the agent is unable to clean up the existing, still-running container.
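A minimal reproduction sketch, run on the agent host. It assumes the org_apache_mesos_LogrotateContainerLogger module is loaded and that its companion process is named mesos-logrotate-logger; the process to kill will differ when an external logger is used.

{noformat}
# Find the logger companion process attached to the running task
# (the process name is an assumption for the logrotate logger).
pgrep -af mesos-logrotate-logger

# Kill it; the executor then exits and the task is reported TASK_FAILED.
pkill -f mesos-logrotate-logger

# The framework relaunches the task, but the original container is still up.
docker ps --filter name=mesos-
{noformat}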
Logs:

{noformat}
slave-one_1 | I0413 12:45:17.707762  8989 status_update_manager.cpp:395] Received status update acknowledgement (UUID: 7262c443-e201-45f4-8de0-825d3d92c26b) for task msg.dfb155bc-2046-11e7-8019-02427fa1c4d5 of framework d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000
slave-one_1 | I0413 12:45:17.707813  8989 status_update_manager.cpp:832] Checkpointing ACK for status update TASK_FAILED (UUID: 7262c443-e201-45f4-8de0-825d3d92c26b) for task msg.dfb155bc-2046-11e7-8019-02427fa1c4d5 of framework d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000
slave-one_1 | I0413 12:45:18.615839  8991 slave.cpp:4388] Got exited event for executor(1)@172.17.0.1:36471
slave-one_1 | I0413 12:45:18.696413  8987 docker.cpp:2358] Executor for container 665e86c8-ef36-4be3-b56e-3ba7edc81182 has exited
slave-one_1 | I0413 12:45:18.696446  8987 docker.cpp:2052] Destroying container 665e86c8-ef36-4be3-b56e-3ba7edc81182
slave-one_1 | I0413 12:45:18.696482  8987 docker.cpp:2179] Running docker stop on container 665e86c8-ef36-4be3-b56e-3ba7edc81182
slave-one_1 | I0413 12:45:18.697042  8994 slave.cpp:4769] Executor 'msg.dfb155bc-2046-11e7-8019-02427fa1c4d5' of framework d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000 exited with status 0
slave-one_1 | I0413 12:45:18.697077  8994 slave.cpp:4869] Cleaning up executor 'msg.dfb155bc-2046-11e7-8019-02427fa1c4d5' of framework d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000 at executor(1)@172.17.0.1:36471
slave-one_1 | I0413 12:45:18.697424  8994 slave.cpp:4957] Cleaning up framework d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000
slave-one_1 | I0413 12:45:18.697530  8994 gc.cpp:55] Scheduling '/tmp/mesos/agent/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000/executors/msg.dfb155bc-2046-11e7-8019-02427fa1c4d5/runs/665e86c8-ef36-4be3-b56e-3ba7edc81182' for gc 6.99999192952593days in the future
slave-one_1 | I0413 12:45:18.697572  8994 gc.cpp:55] Scheduling '/tmp/mesos/agent/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000/executors/msg.dfb155bc-2046-11e7-8019-02427fa1c4d5' for gc 6.99999192882963days in the future
slave-one_1 | I0413 12:45:18.697607  8994 gc.cpp:55] Scheduling '/tmp/mesos/agent/meta/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000/executors/msg.dfb155bc-2046-11e7-8019-02427fa1c4d5/runs/665e86c8-ef36-4be3-b56e-3ba7edc81182' for gc 6.99999192843852days in the future
slave-one_1 | I0413 12:45:18.697628  8994 gc.cpp:55] Scheduling '/tmp/mesos/agent/meta/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000/executors/msg.dfb155bc-2046-11e7-8019-02427fa1c4d5' for gc 6.99999192808889days in the future
slave-one_1 | I0413 12:45:18.697649  8994 gc.cpp:55] Scheduling '/tmp/mesos/agent/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000' for gc 6.99999192731556days in the future
slave-one_1 | I0413 12:45:18.697670  8994 gc.cpp:55] Scheduling '/tmp/mesos/agent/meta/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000' for gc 6.99999192698963days in the future
slave-one_1 | I0413 12:45:18.697698  8994 status_update_manager.cpp:285] Closing status update streams for framework d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000
{noformat}

Container 665e86c8-ef36-4be3-b56e-3ba7edc81182 is still running:

{noformat}
root@orobas:/# docker ps | grep 665e86c8-ef36-4be3-b56e-3ba7edc81182
8b4dd2ab340d   r4um/msg   "/msg.sh"   About an hour ago   Up About an hour           mesos-d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0.665e86c8-ef36-4be3-b56e-3ba7edc81182
{noformat}

If the mesos-logrotate / external logger process keeps dying, these containers keep piling up. For example, after killing it twice more in the case above (a possible manual cleanup is sketched after this listing):

{noformat}
root@orobas:/# docker ps
CONTAINER ID   IMAGE      COMMAND     CREATED             STATUS             PORTS   NAMES
df340235c4a8   r4um/msg   "/msg.sh"   About an hour ago   Up About an hour           mesos-d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0.00dc4166-5824-4722-949d-104001e7dc17
7adfd654fc34   r4um/msg   "/msg.sh"   About an hour ago   Up About an hour           mesos-d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0.4835dab7-f7f5-4167-b0f9-a08cb0e7c688
8b4dd2ab340d   r4um/msg   "/msg.sh"   About an hour ago   Up About an hour           mesos-d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0.665e86c8-ef36-4be3-b56e-3ba7edc81182
{noformat}
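Until this is fixed, one possible manual workaround (a sketch, not part of Mesos) is to stop the leaked containers by hand on the agent host; the name filter below assumes the default mesos- container-name prefix used by the docker containerizer, as seen in the listings above.

{noformat}
# List containers created by the docker containerizer (default mesos- prefix).
docker ps --filter name=mesos- --format '{{.ID}}\t{{.Names}}'

# Stop a leaked container by ID once its task has already been relaunched,
# e.g. the container from the example above:
docker stop 8b4dd2ab340d
{noformat}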