[jira] [Commented] (MESOS-7587) Launching tasks with the Mesos Containerizer after a long time without launching new tasks fails
[ https://issues.apache.org/jira/browse/MESOS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165967#comment-16165967 ]

Sebastian Gerlach commented on MESOS-7587:
------------------------------------------

So, we finally found the issue. We run Mesos on CentOS 7 with the default value for "--docker_store_dir", which is located under "/tmp". CentOS cleans the "/tmp" directory based on file age (10 days), so after 10 days parts of our images disappear.

[~arojas] I think this issue can be closed. It was a problem between keyboard and chair.

> Launching tasks with the Mesos Containerizer after a long time without launching new tasks fails
>
>                 Key: MESOS-7587
>                 URL: https://issues.apache.org/jira/browse/MESOS-7587
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.2.0
>            Reporter: Alexander Rojas
>            Priority: Critical
>              Labels: mesos-containerizer, mesosphere
>
> After a cluster has been running without launching new tasks for an extended period of time (~1 week), launching a new task using the Mesos Containerizer fails with the error:
> [{{Failed to execute command: No such file or directory}}|https://github.com/apache/mesos/blob/8245981b889ec3725cc0be4150b15d1fe9d64b86/src/slave/containerizer/mesos/launch.cpp#L778]
> The task is launched from Marathon with the app definition:
> {code}
> {
>   "container": {
>     "type": "MESOS",
>     "docker": {
>       "forcePullImage": true,
>       "image": "private.repository.local/updated:fixed",
>       "privileged": false
>     }
>   },
>   "cpus": 0.1,
>   "id": "/20150530/mesos9",
>   "instances": 1,
>   "minimumHealthCapacity": 1,
>   "acceptedResourceRoles": ["*"],
>   "constraints": [["hostname", "UNIQUE"]],
>   "mem": 128
> }
> {code}
> and {{Dockerfile}}
> {code}
> FROM private.repository.local/centos:stable
> MAINTAINER Sebastian Gerlach "s...@boreus.de"
> CMD python -m SimpleHTTPServer 80
> {code}
> The obtained stdout is:
> {noformat}
> Executing pre-exec command '{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-containerizer"}'
> Executing pre-exec command '{"arguments":["mount","-n","--rbind","\/var\/lib\/mesos\/slaves\/562f1892-41c6-4c55-955e-662c63ab3ddf-S8\/frameworks\/562f1892-41c6-4c55-955e-662c63ab3ddf-\/executors\/20150530_mesos13.5f8c3db1-4517-11e7-8d95-024276be29e0\/runs\/da1cb066-1354-42a6-82bc-eee688d61b94","\/var\/lib\/mesos\/provisioner\/containers\/da1cb066-1354-42a6-82bc-eee688d61b94\/backends\/overlay\/rootfses\/f74816b3-c115-4a5e-a51c-fbd502efd1fe\/mnt\/mesos\/sandbox"],"shell":false,"value":"mount"}'
> Received SUBSCRIBED event
> Subscribed executor on bp-mesos8.private.local
> Received LAUNCH event
> Starting task 20150530_mesos13.5f8c3db1-4517-11e7-8d95-024276be29e0
> /usr/libexec/mesos/mesos-containerizer launch --help="false" --launch_info="{"command":{"arguments":["\/bin\/sh","-c","python -m SimpleHTTPServer
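The root cause named in the comment at the top of this message (an age-based cleanup of "/tmp" removing parts of the image store) can be checked for with a small script. The following is a minimal sketch, not part of Mesos: the function names are hypothetical, the store path `/tmp/mesos/store/docker` is assumed to be the default for `--docker_store_dir`, the 10-day threshold is taken from the comment, and judging age by the newer of atime and mtime is a simplification of what a real cleaner does.

```python
import os
import time

# Assumption: default value of --docker_store_dir on the affected agents.
DEFAULT_STORE_DIR = "/tmp/mesos/store/docker"
# Taken from the comment: CentOS removes /tmp entries older than 10 days.
CLEANUP_AGE_DAYS = 10


def is_under(path, parent):
    """Return True if `path` is `parent` itself or lies inside it."""
    path = os.path.realpath(path)
    parent = os.path.realpath(parent)
    return os.path.commonpath([path, parent]) == parent


def stale_entries(store_dir, max_age_days, now=None):
    """Return the sorted list of files in `store_dir` that look old enough
    to be removed by an age-based cleaner.

    Simplification: a file counts as stale when both its access time and
    its modification time are older than `max_age_days`.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for root, _dirs, files in os.walk(store_dir):
        for name in files:
            full = os.path.join(root, name)
            st = os.lstat(full)
            if max(st.st_atime, st.st_mtime) < cutoff:
                stale.append(full)
    return sorted(stale)


if __name__ == "__main__":
    if is_under(DEFAULT_STORE_DIR, "/tmp"):
        print("store dir is under /tmp and may be cleaned by age")
    for entry in stale_entries(DEFAULT_STORE_DIR, CLEANUP_AGE_DAYS):
        print("at risk:", entry)
```

If the store turns out to live under "/tmp", pointing `--docker_store_dir` at a persistent location (for example a directory below the agent's `work_dir`) is one way to avoid the problem; excluding the store path from the cleanup is another.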
[jira] [Commented] (MESOS-7587) Launching tasks with the Mesos Containerizer after a long time without launching new tasks fails
[ https://issues.apache.org/jira/browse/MESOS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162996#comment-16162996 ]

Sebastian Gerlach commented on MESOS-7587:
------------------------------------------

We have done some more research and debugging on this issue and could isolate the problem a little further.

If we switch from the overlay backend to the copy backend, we are able to start processes with the Mesos Containerizer without any problem.

With strace we tried to analyze why we get the message "No such file or directory". It looks like the path for "python" was not correct. Even when we used the full path, it was not found and we could not start the container. We did not follow this path too deeply, so treat the strace results with care.

So we think there is a problem with overlayfs. If we remove the fetcher tmp folder, we can start the container again, but if we restore the fetcher folder, we can reproduce the behavior. We are not sure when and why the cache or the overlay breaks.
[jira] [Commented] (MESOS-7587) Launching tasks with the Mesos Containerizer after a long time without launching new tasks fails
[ https://issues.apache.org/jira/browse/MESOS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032600#comment-16032600 ]

Sebastian Gerlach commented on MESOS-7587:
------------------------------------------

If I can provide any more information or support you with tests to reproduce and/or analyze the problem, please let me know.
[jira] [Commented] (MESOS-7587) Launching tasks with the Mesos Containerizer after a long time without launching new tasks fails
[ https://issues.apache.org/jira/browse/MESOS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031590#comment-16031590 ]

Anand Mazumdar commented on MESOS-7587:
---------------------------------------

[~jieyu] [~gilbert] Can you folks help take a look?
[jira] [Commented] (MESOS-7587) Launching tasks with the Mesos Containerizer after a long time without launching new tasks fails
[ https://issues.apache.org/jira/browse/MESOS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030827#comment-16030827 ]

Sebastian Gerlach commented on MESOS-7587:
------------------------------------------

Environment:
- CentOS Linux release 7.3.1611 (Core)
- mesos.x86_64 1.2.0-2.0.6
- mesosphere-el-repo.noarch 7-3
- marathon 1.4.3
- calico v1.1.3
- 3 masters
- 6 slaves

slave configuration:
{code}
# attributes
hostname:bp-mesos7;manufacturer:VMware, Inc.
# containerizers
mesos,docker
# image_providers
docker
# isolation
docker/runtime,filesystem/linux,cgroups/cpu,cgroups/mem,cgroups/devices,cgroups/net_cls,disk/du
# network_cni_config_dir
/etc/calico/mesos
# network_cni_plugins_dir
/usr/share/calico
# resources
file:///etc/mesos-resources/resources.json
# work_dir
/var/lib/mesos
{code}

master configuration:
{code}
# ip
10.XXX.XXX.XXX
# quorum
2
# work_dir
/var/lib/mesos
{code}
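The slave configuration listed in this message does not set "--docker_store_dir", so the default applies. As a rough illustration, the sketch below parses such a listing and reports whether a given flag is set. It assumes each "# name" line is a flag name followed by its value on the next line, which is an interpretation of the listing rather than a documented format; `parse_flag_listing` is a hypothetical helper, not a Mesos API.

```python
# Slave configuration as posted in the comment, one flag name per
# "# name" line followed by its value (assumed layout).
SLAVE_FLAGS = """\
# attributes
hostname:bp-mesos7;manufacturer:VMware, Inc.
# containerizers
mesos,docker
# image_providers
docker
# isolation
docker/runtime,filesystem/linux,cgroups/cpu,cgroups/mem,cgroups/devices,cgroups/net_cls,disk/du
# network_cni_config_dir
/etc/calico/mesos
# network_cni_plugins_dir
/usr/share/calico
# resources
file:///etc/mesos-resources/resources.json
# work_dir
/var/lib/mesos
"""


def parse_flag_listing(text):
    """Parse a '# flag' / 'value' listing into a dict of flag -> value."""
    flags = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# "):
            # A new flag name; its value is expected on the next line.
            current = line[2:].strip()
            flags[current] = ""
        elif current is not None:
            flags[current] = line
    return flags


flags = parse_flag_listing(SLAVE_FLAGS)
if "docker_store_dir" not in flags:
    print("docker_store_dir is not set; the agent falls back to its default")
```

With this listing the check reports that the flag is missing, which matches the later finding that the image store was left in its default location.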