[jira] [Commented] (MESOS-4565) slave recovers and attempt to destroy executor's child containers, then begins rejecting task status updates

2016-05-23 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297752#comment-15297752
 ] 

haosdent commented on MESOS-4565:
-

[~giaosuddau] Do you encounter the following error?
{code}
E0130 02:22:21.009094 12686 containerizer.cpp:553] Failed to clean up an 
isolator when destroying orphan container kube-proxy: Failed to remove cgroup 
'/sys/fs/cgroup/memory/mesos/1d965a20-849c-40d8-9446-27cb723220a9/kube-proxy': 
Device or resource busy
{code}

A quick workaround is to unmount it manually, which lets the Agent recover successfully.
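
Before cleaning anything up by hand, it may help to see what is still nested under the container's cgroup. The sketch below is only an illustration and not part of Mesos: the helper name `nested_cgroups` is hypothetical, and it assumes the "Device or resource busy" error comes from child cgroups left under the container's directory, which is one plausible cause but is not confirmed in this thread. It lists the nested directories deepest first, so that children appear before their parents.

```python
import os


def nested_cgroups(cgroup_path):
    """Return the directories nested under cgroup_path, deepest first.

    Hypothetical helper for inspection only; it does not modify anything.
    """
    found = []
    # topdown=False makes os.walk yield child directories before their parents.
    for dirpath, _dirnames, _filenames in os.walk(cgroup_path, topdown=False):
        if dirpath != cgroup_path:
            found.append(dirpath)
    return found


if __name__ == "__main__":
    # Path taken from the error message above, minus the trailing 'kube-proxy'.
    root = "/sys/fs/cgroup/memory/mesos/1d965a20-849c-40d8-9446-27cb723220a9"
    for path in nested_cgroups(root):
        print(path)
```

On a cgroup v1 host, an empty cgroup directory with no remaining tasks can usually be removed with `rmdir`, so the deepest-first order is the order in which a manual cleanup would have to proceed.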

> slave recovers and attempt to destroy executor's child containers, then 
> begins rejecting task status updates
> 
>
> Key: MESOS-4565
> URL: https://issues.apache.org/jira/browse/MESOS-4565
> Project: Mesos
> Issue Type: Bug
> Components: docker
> Affects Versions: 0.26.0
> Reporter: James DeFelice
> Labels: mesosphere
>
> AFAICT the slave is doing this:
> 1) recovering from some kind of failure
> 2) checking the containers that it pulled from its state store
> 3) complaining about cgroup children hanging off of executor containers
> 4) rejecting task status updates related to the executor container, the first 
> of which in the logs is:
> {code}
> E0130 02:22:21.979852 12683 slave.cpp:2963] Failed to update resources for 
> container 1d965a20-849c-40d8-9446-27cb723220a9 of executor 
> 'd701ab48a0c0f13_k8sm-executor' running task 
> pod.f2dc2c43-c6f7-11e5-ad28-0ad18c5e6c7f on status update for terminal task, 
> destroying container: Container '1d965a20-849c-40d8-9446-27cb723220a9' not 
> found
> {code}
> To be fair, I don't believe that my custom executor is re-registering 
> properly with the slave prior to attempting to send these (failing) status 
> updates. But the slave doesn't complain about that .. it complains that it 
> can't find the **container**.
> slave log here:
> https://gist.github.com/jdef/265663461156b7a7ed4e
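
To see which containers the slave reports as missing, the log can be scanned for the "not found" message quoted in the description. This is a minimal sketch, not a Mesos tool: the function name `missing_container_ids` is hypothetical, and it assumes each log entry sits on a single line in the raw slave log (the entry quoted above is wrapped by the mail client).

```python
import re

# Matches the tail of the error quoted in the issue description.
NOT_FOUND = re.compile(r"Container '([0-9a-f-]{36})' not found")


def missing_container_ids(log_lines):
    """Return container IDs reported as not found, in first-seen order."""
    ids = []
    for line in log_lines:
        match = NOT_FOUND.search(line)
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids
```

Calling it with an open log file, for example `missing_container_ids(open(path))` where `path` points at the slave log, yields each affected container ID once, which can then be compared against the directories under the Mesos cgroup root.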



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4565) slave recovers and attempt to destroy executor's child containers, then begins rejecting task status updates

2016-05-23 Thread Chanh Le (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297666#comment-15297666
 ] 

Chanh Le commented on MESOS-4565:
-

Is there any update on this?
I am still hitting this issue.






[jira] [Commented] (MESOS-4565) slave recovers and attempt to destroy executor's child containers, then begins rejecting task status updates

2016-02-03 Thread James DeFelice (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131778#comment-15131778
 ] 

James DeFelice commented on MESOS-4565:
---

To be clear, the custom executor in this case is using the native
containerizer, not the docker one.





