[ 
https://issues.apache.org/jira/browse/MESOS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7386:
----------------------------------
    Priority: Critical  (was: Major)

> Executor not cleaning up existing running docker containers if external 
> logrotate/logger processes die/killed
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-7386
>                 URL: https://issues.apache.org/jira/browse/MESOS-7386
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent, docker, executor
>    Affects Versions: 0.28.2, 1.2.0
>         Environment: Mesos 0.28.2/1.2.0, docker 1.12.0/17.04.0-ce, marathon 
> v1.1.2/v1.4.2 , ubuntu trusty 14.04, 
> org_apache_mesos_LogrotateContainerLogger, 
> org_apache_mesos_ExternalContainerLogger
>            Reporter: Pranay Kanwar
>            Priority: Critical
>
> If the mesos-logrotate / external logger process dies or is killed, the 
> executor exits, the task fails and is relaunched, but the agent is unable 
> to clean up the existing running container.
> Logs:
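> A minimal sketch of the reproduction, assuming a Docker task launched via 
> Marathon with org_apache_mesos_LogrotateContainerLogger enabled (the exact 
> logger process name may vary by build):
> {noformat}
> # Kill the companion logger process attached to the task's executor:
> pkill -f mesos-logrotate-logger
>
> # The executor exits and the task is reported TASK_FAILED, yet the
> # docker container for the old run is still listed:
> docker ps --filter name=mesos-
> {noformat}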
> {noformat}
> slave-one_1  | I0413 12:45:17.707762  8989 status_update_manager.cpp:395] 
> Received status update acknowledgement (UUID: 
> 7262c443-e201-45f4-8de0-825d3d92c26b) for task 
> msg.dfb155bc-2046-11e7-8019-02427fa1c4d5 of framework 
> d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000
> slave-one_1  | I0413 12:45:17.707813  8989 status_update_manager.cpp:832] 
> Checkpointing ACK for status update TASK_FAILED (UUID: 
> 7262c443-e201-45f4-8de0-825d3d92c26b) for task 
> msg.dfb155bc-2046-11e7-8019-02427fa1c4d5 of framework 
> d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000
> slave-one_1  | I0413 12:45:18.615839  8991 slave.cpp:4388] Got exited event 
> for executor(1)@172.17.0.1:36471
> slave-one_1  | I0413 12:45:18.696413  8987 docker.cpp:2358] Executor for 
> container 665e86c8-ef36-4be3-b56e-3ba7edc81182 has exited
> slave-one_1  | I0413 12:45:18.696446  8987 docker.cpp:2052] Destroying 
> container 665e86c8-ef36-4be3-b56e-3ba7edc81182
> slave-one_1  | I0413 12:45:18.696482  8987 docker.cpp:2179] Running docker 
> stop on container 665e86c8-ef36-4be3-b56e-3ba7edc81182
> slave-one_1  | I0413 12:45:18.697042  8994 slave.cpp:4769] Executor 
> 'msg.dfb155bc-2046-11e7-8019-02427fa1c4d5' of framework 
> d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000 exited with status 0
> slave-one_1  | I0413 12:45:18.697077  8994 slave.cpp:4869] Cleaning up 
> executor 'msg.dfb155bc-2046-11e7-8019-02427fa1c4d5' of framework 
> d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000 at executor(1)@172.17.0.1:36471
> slave-one_1  | I0413 12:45:18.697424  8994 slave.cpp:4957] Cleaning up 
> framework d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000
> slave-one_1  | I0413 12:45:18.697530  8994 gc.cpp:55] Scheduling 
> '/tmp/mesos/agent/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000/executors/msg.dfb155bc-2046-11e7-8019-02427fa1c4d5/runs/665e86c8-ef36-4be3-b56e-3ba7edc81182'
>  for gc 6.99999192952593days in the future
> slave-one_1  | I0413 12:45:18.697572  8994 gc.cpp:55] Scheduling 
> '/tmp/mesos/agent/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000/executors/msg.dfb155bc-2046-11e7-8019-02427fa1c4d5'
>  for gc 6.99999192882963days in the future
> slave-one_1  | I0413 12:45:18.697607  8994 gc.cpp:55] Scheduling 
> '/tmp/mesos/agent/meta/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000/executors/msg.dfb155bc-2046-11e7-8019-02427fa1c4d5/runs/665e86c8-ef36-4be3-b56e-3ba7edc81182'
>  for gc 6.99999192843852days in the future
> slave-one_1  | I0413 12:45:18.697628  8994 gc.cpp:55] Scheduling 
> '/tmp/mesos/agent/meta/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000/executors/msg.dfb155bc-2046-11e7-8019-02427fa1c4d5'
>  for gc 6.99999192808889days in the future
> slave-one_1  | I0413 12:45:18.697649  8994 gc.cpp:55] Scheduling 
> '/tmp/mesos/agent/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000'
>  for gc 6.99999192731556days in the future
> slave-one_1  | I0413 12:45:18.697670  8994 gc.cpp:55] Scheduling 
> '/tmp/mesos/agent/meta/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000'
>  for gc 6.99999192698963days in the future
> slave-one_1  | I0413 12:45:18.697698  8994 status_update_manager.cpp:285] 
> Closing status update streams for framework 
> d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-0000
> {noformat}
> Container 665e86c8-ef36-4be3-b56e-3ba7edc81182 is still running
> {noformat}
> root@orobas:/# docker ps | grep 665e86c8-ef36-4be3-b56e-3ba7edc81182
> 8b4dd2ab340d        r4um/msg                             "/msg.sh"            
>     About an hour ago   Up About an hour                        
> mesos-d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0.665e86c8-ef36-4be3-b56e-3ba7edc81182
> {noformat}
> If the mesos-logrotate / external logger process keeps dying, these 
> containers keep piling up. For example, after killing it twice more in the 
> above case:
> {noformat}
> root@orobas:/# docker ps
> CONTAINER ID        IMAGE                                COMMAND              
>     CREATED             STATUS              PORTS               NAMES
> df340235c4a8        r4um/msg                             "/msg.sh"            
>     About an hour ago   Up About an hour                        
> mesos-d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0.00dc4166-5824-4722-949d-104001e7dc17
> 7adfd654fc34        r4um/msg                             "/msg.sh"            
>     About an hour ago   Up About an hour                        
> mesos-d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0.4835dab7-f7f5-4167-b0f9-a08cb0e7c688
> 8b4dd2ab340d        r4um/msg                             "/msg.sh"            
>     About an hour ago   Up About an hour                        
> mesos-d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0.665e86c8-ef36-4be3-b56e-3ba7edc81182
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)