[jira] [Commented] (MESOS-7587) Launching tasks with the Mesos Containerizer after a long time without launching new tasks fails

2017-09-14 Thread Sebastian Gerlach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165967#comment-16165967
 ] 

Sebastian Gerlach commented on MESOS-7587:
--

So, we finally found the issue. We run Mesos on CentOS 7 with the default value 
for "--docker_store_dir", which is a directory under "/tmp". CentOS cleans the 
"/tmp" directory based on file age (10 days), so after 10 days parts of our 
images disappear. 

[~arojas] So I think this issue can be closed. It was a problem between 
keyboard and chair. 
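
To check a host for the same condition, something like the following could be 
used. It is only a sketch: the store path "/tmp/mesos/store/docker" is assumed 
to be the default on your Mesos version, the 10-day limit is the one mentioned 
above, and the helper names are made up for illustration. It reports whether 
the store directory sits below "/tmp" and which files are old enough to be 
removed by an age-based cleaner.

```python
import os
import time
from pathlib import Path


def is_under(path, root):
    """Return True if `path` is `root` itself or located below it."""
    path, root = Path(path).resolve(), Path(root).resolve()
    return path == root or root in path.parents


def stale_files(root, max_age_days, now=None):
    """List files under `root` whose newest timestamp is older than the limit."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            # Assumption: a file is only at risk when atime, mtime and ctime
            # are all older than the cutoff.
            if max(st.st_atime, st.st_mtime, st.st_ctime) < cutoff:
                stale.append(full)
    return sorted(stale)


if __name__ == "__main__":
    store_dir = "/tmp/mesos/store/docker"  # assumed default, adjust as needed
    if is_under(store_dir, "/tmp"):
        print(f"{store_dir} is below /tmp and may be cleaned by age")
    if os.path.isdir(store_dir):
        for path in stale_files(store_dir, max_age_days=10):
            print("at risk:", path)
```

Pointing "--docker_store_dir" at a location outside "/tmp" avoids the problem 
altogether.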

> Launching tasks with the Mesos Containerizer after a long time without 
> launching new tasks fails
> 
>
>                 Key: MESOS-7587
>                 URL: https://issues.apache.org/jira/browse/MESOS-7587
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.2.0
>            Reporter: Alexander Rojas
>            Priority: Critical
>              Labels: mesos-containerizer, mesosphere
>
> After a cluster has been running without launching new tasks for an extended 
> period of time (~1 week), launching a new task using the Mesos Containerizer 
> fails with the error:
> [{{Failed to execute command: No such file or 
> directory}}|https://github.com/apache/mesos/blob/8245981b889ec3725cc0be4150b15d1fe9d64b86/src/slave/containerizer/mesos/launch.cpp#L778]
> The task is launched from Marathon with the app definition:
> {code}
> {
>   "container": {
>     "type": "MESOS",
>     "docker": {
>       "forcePullImage": true,
>       "image": "private.repository.local/updated:fixed",
>       "privileged": false
>     }
>   },
>   "cpus": 0.1,
>   "id": "/20150530/mesos9",
>   "instances": 1,
>   "minimumHealthCapacity": 1,
>   "acceptedResourceRoles": ["*"],
>   "constraints": [["hostname", "UNIQUE"]],
>   "mem": 128
> }
> {code}
> and {{Dockerfile}}
> {code}
> FROM private.repository.local/centos:stable
> MAINTAINER Sebastian Gerlach "s...@boreus.de"
> CMD python -m SimpleHTTPServer 80
> {code}
> The obtained stdout is:
> {noformat}
> Executing pre-exec command 
> '{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-containerizer"}'
> Executing pre-exec command 
> '{"arguments":["mount","-n","--rbind","\/var\/lib\/mesos\/slaves\/562f1892-41c6-4c55-955e-662c63ab3ddf-S8\/frameworks\/562f1892-41c6-4c55-955e-662c63ab3ddf-\/executors\/20150530_mesos13.5f8c3db1-4517-11e7-8d95-024276be29e0\/runs\/da1cb066-1354-42a6-82bc-eee688d61b94","\/var\/lib\/mesos\/provisioner\/containers\/da1cb066-1354-42a6-82bc-eee688d61b94\/backends\/overlay\/rootfses\/f74816b3-c115-4a5e-a51c-fbd502efd1fe\/mnt\/mesos\/sandbox"],"shell":false,"value":"mount"}'
> Received SUBSCRIBED event
> Subscribed executor on bp-mesos8.private.local
> Received LAUNCH event
> Starting task 20150530_mesos13.5f8c3db1-4517-11e7-8d95-024276be29e0
> /usr/libexec/mesos/mesos-containerizer launch --help="false" 
> --launch_info="{"command":{"arguments":["\/bin\/sh","-c","python -m 
> SimpleHTTPServer 
> 

[jira] [Commented] (MESOS-7587) Launching tasks with the Mesos Containerizer after a long time without launching new tasks fails

2017-09-12 Thread Sebastian Gerlach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162996#comment-16162996
 ] 

Sebastian Gerlach commented on MESOS-7587:
--

We have done some more research and debugging on this issue and could isolate 
the problem a little further. If we switch from the overlay backend to the copy 
backend, we are able to start processes with the Mesos Containerizer without 
any problem.

Using strace, we tried to analyze why we get the message “No such file or 
directory”. It looks like the path for “python” was not correct: even when we 
used the full path, it did not show up and we could not start the container. We 
did not follow this lead too deeply, so treat the strace results with care.

So we think there is a problem with overlayfs. If we remove the fetcher tmp 
folder, we can start the container again, but if we restore the fetcher folder, 
we can reproduce the behavior.

We are not sure when and why the cache or the overlay breaks. 
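
One way to cross-check the strace observation without launching a task is to 
look for the command directly in the provisioned root filesystem on the agent. 
The sketch below is an illustration only: the function name and the default 
search directories are assumptions, and it does not follow absolute symlinks 
the way a chroot would, so treat its result as a hint.

```python
import os


def resolve_in_rootfs(rootfs, command,
                      path_dirs=("/usr/local/bin", "/usr/bin", "/bin")):
    """Return the host-side path of `command` inside `rootfs`, or None."""
    if command.startswith("/"):
        # Absolute command: only that exact location is checked.
        candidates = [command]
    else:
        # Bare command name: try each assumed PATH directory in order.
        candidates = [os.path.join(d, command) for d in path_dirs]
    for candidate in candidates:
        host_path = os.path.join(rootfs, candidate.lstrip("/"))
        # lexists() also accepts dangling symlinks; absolute symlinks inside
        # the rootfs would be resolved against the host, not the rootfs.
        if os.path.lexists(host_path):
            return host_path
    return None
```

Calling it with a directory below 
"/var/lib/mesos/provisioner/containers/.../backends/overlay/rootfses/" and the 
command "python" returns None if the binary is missing from the assembled 
image, which would match the “No such file or directory” error.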



[jira] [Commented] (MESOS-7587) Launching tasks with the Mesos Containerizer after a long time without launching new tasks fails

2017-06-01 Thread Sebastian Gerlach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032600#comment-16032600
 ] 

Sebastian Gerlach commented on MESOS-7587:
--

If I can provide any more information or support you with tests to reproduce 
and/or analyze the problem, please let me know. 


[jira] [Commented] (MESOS-7587) Launching tasks with the Mesos Containerizer after a long time without launching new tasks fails

2017-05-31 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031590#comment-16031590
 ] 

Anand Mazumdar commented on MESOS-7587:
---

[~jieyu] [~gilbert] Can you folks help take a look?


[jira] [Commented] (MESOS-7587) Launching tasks with the Mesos Containerizer after a long time without launching new tasks fails

2017-05-31 Thread Sebastian Gerlach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030827#comment-16030827
 ] 

Sebastian Gerlach commented on MESOS-7587:
--

Environment:
- CentOS Linux release 7.3.1611 (Core)
- mesos.x86_64 1.2.0-2.0.6
- mesosphere-el-repo.noarch 7-3
- marathon 1.4.3
- calico v1.1.3
- 3 masters
- 6 slaves

slave configuration:
{code}
# attributes
hostname:bp-mesos7;manufacturer:VMware, Inc.
# containerizers
mesos,docker
# image_providers
docker
# isolation
docker/runtime,filesystem/linux,cgroups/cpu,cgroups/mem,cgroups/devices,cgroups/net_cls,disk/du
# network_cni_config_dir
/etc/calico/mesos
# network_cni_plugins_dir
/usr/share/calico
# resources
file:///etc/mesos-resources/resources.json
# work_dir
/var/lib/mesos
{code}

master configuration:
{code}
# ip
10.XXX.XXX.XXX
# quorum
2
# work_dir
/var/lib/mesos
{code}
