[jira] [Commented] (MESOS-4565) slave recovers and attempts to destroy executor's child containers, then begins rejecting task status updates
[ https://issues.apache.org/jira/browse/MESOS-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297752#comment-15297752 ]

haosdent commented on MESOS-4565:
----------------------------------

[~giaosuddau] Do you encounter this error?

{code}
E0130 02:22:21.009094 12686 containerizer.cpp:553] Failed to clean up an isolator when destroying orphan container kube-proxy: Failed to remove cgroup '/sys/fs/cgroup/memory/mesos/1d965a20-849c-40d8-9446-27cb723220a9/kube-proxy': Device or resource busy
{code}

A quick workaround is to unmount it manually, which lets the Agent recover successfully.

> slave recovers and attempts to destroy executor's child containers, then
> begins rejecting task status updates
>
> Key: MESOS-4565
> URL: https://issues.apache.org/jira/browse/MESOS-4565
> Project: Mesos
> Issue Type: Bug
> Components: docker
> Affects Versions: 0.26.0
> Reporter: James DeFelice
> Labels: mesosphere
>
> AFAICT the slave is doing this:
> 1) recovering from some kind of failure
> 2) checking the containers that it pulled from its state store
> 3) complaining about cgroup children hanging off of executor containers
> 4) rejecting task status updates related to the executor container, the first of which in the logs is:
> {code}
> E0130 02:22:21.979852 12683 slave.cpp:2963] Failed to update resources for container 1d965a20-849c-40d8-9446-27cb723220a9 of executor 'd701ab48a0c0f13_k8sm-executor' running task pod.f2dc2c43-c6f7-11e5-ad28-0ad18c5e6c7f on status update for terminal task, destroying container: Container '1d965a20-849c-40d8-9446-27cb723220a9' not found
> {code}
> To be fair, I don't believe that my custom executor is re-registering properly with the slave prior to attempting to send these (failing) status updates. But the slave doesn't complain about that .. it complains that it can't find the **container**.
> slave log here: https://gist.github.com/jdef/265663461156b7a7ed4e

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
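A minimal sketch of the manual workaround described in the comment above, under two assumptions: that the "Device or resource busy" error comes from a leftover mount still living inside the stale cgroup, and that the hard-coded path (copied from the log in this issue) is replaced with whatever cgroup your agent actually reports. It is a hypothetical helper, not an official Mesos tool, and it needs root to run.

{code}
#!/usr/bin/env python
"""Hypothetical cleanup sketch for a stale, 'busy' Mesos cgroup (run as root)."""
import os
import subprocess

# Path copied from the error log in this issue; replace with the cgroup your agent reports.
STALE_CGROUP = "/sys/fs/cgroup/memory/mesos/1d965a20-849c-40d8-9446-27cb723220a9/kube-proxy"

def mounts_under(path):
    """Return mount points from /proc/mounts that sit at or below 'path'."""
    points = []
    with open("/proc/mounts") as mounts:
        for line in mounts:
            mount_point = line.split()[1]
            if mount_point == path or mount_point.startswith(path + "/"):
                points.append(mount_point)
    return points

def main():
    # Unmount the deepest paths first; a leftover mount here is what makes
    # removing the cgroup fail with "Device or resource busy".
    for mount_point in sorted(mounts_under(STALE_CGROUP), reverse=True):
        subprocess.check_call(["umount", mount_point])

    # With the mounts gone, the now-empty cgroup directory can be removed,
    # after which the agent should be able to finish recovery on restart.
    if os.path.isdir(STALE_CGROUP):
        os.rmdir(STALE_CGROUP)

if __name__ == "__main__":
    main()
{code}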
[jira] [Commented] (MESOS-4565) slave recovers and attempts to destroy executor's child containers, then begins rejecting task status updates
[ https://issues.apache.org/jira/browse/MESOS-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297666#comment-15297666 ]

Chanh Le commented on MESOS-4565:
----------------------------------

Any update on this? I still get the issue.
[jira] [Commented] (MESOS-4565) slave recovers and attempts to destroy executor's child containers, then begins rejecting task status updates
[ https://issues.apache.org/jira/browse/MESOS-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131778#comment-15131778 ]

James DeFelice commented on MESOS-4565:
---------------------------------------

To be clear, the custom executor in this case is using the native containerizer, not the Docker one.