[ https://issues.apache.org/jira/browse/MESOS-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Rukletsov updated MESOS-6327:
---------------------------------------
    Summary: Large docker images causes container launch failures: Too many levels of symbolic links.  (was: Large docker images causes container launch failures: Too many levels of symbolic links)

> Large docker images causes container launch failures: Too many levels of symbolic links.
> ----------------------------------------------------------------------------------------
>
>                 Key: MESOS-6327
>                 URL: https://issues.apache.org/jira/browse/MESOS-6327
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 1.0.0, 1.0.1
>         Environment: centos 7.2 (1511), ubuntu 14.04 (trusty). Replicated in the Apache Aurora vagrant image
>            Reporter: Rogier Dikkes
>            Priority: Critical
>
> When deploying Mesos containers with large (6G+, 60+ layers) Docker images, the task crashes with the error below.
>
> Mesos agent logs:
> E1007 08:40:12.954227 8117 slave.cpp:3976] Container 'a1d759ae-5bc6-4c4e-ac03-717fbb8e5da4' for executor 'thermos-www-data-devel-hello_docker_image-0-d42d2af6-6b44-4b2b-be95-e1ba93a6b365' of framework dfc91a86-84b9-4539-a7be-4ace7b7b44a1-0000 failed to start: Collect failed: Collect failed: Failed to copy layer: cp: cannot stat ‘/var/lib/mesos/provisioner/containers/a1d759ae-5bc6-4c4e-ac03-717fbb8e5da4/backends/copy/rootfses/5f328f72-25d4-4a26-ac83-8d30bbc44e97/usr/share/zoneinfo/right/Asia/Urumqi’: Too many levels of symbolic links
> ... (complete pastebin: http://pastebin.com/umZ4Q5d1 )
>
> How to replicate:
> Start the Aurora vagrant image. Set /etc/mesos-slave/executor_registration_timeout to 5 mins. Edit /vagrant/examples/jobs/hello_docker_image.aurora to start a large Docker image instead of the example. (You can use anldisr/jupyter:0.4, which I created as a test image; it is based on the jupyter notebook stacks.) Create the job and watch it fail after x number of minutes.
> The Mesos sandbox is empty.
>
> Aurora errors I see:
> 28 minutes ago - FAILED : Failed to launch container: Collect failed: Collect failed: Failed to copy layer: cp: cannot stat ‘/var/lib/mesos/provisioner/containers/93420a36-0e0c-4f04-b401-74c426c25686/backends/copy/rootfses/6e185a51-7174-4b0d-a305-42b634eb91bb/usr/share/zoneinfo/right/Asia/Urumqi’: Too many levels of symbolic links cp: cannot stat ... Too many levels of symbolic links ; Container destroyed while provisioning images
> (complete pastebin: http://pastebin.com/uecHYD5J )
>
> To rule out the image, I started this and other images as normal Docker containers. This works without issues.
>
> Related Mesos flags configured:
> -appc_store_dir /tmp/mesos/images/appc
> -containerizers docker,mesos
> -executor_registration_timeout 5mins
> -image_providers appc,docker
> -image_provisioner_backend copy
> -isolation filesystem/linux,docker/runtime
>
> Affected Mesos versions tested: 1.0.1 & 1.0.0
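
For anyone triaging this, the "Too many levels of symbolic links" message is the text for the ELOOP error code, which the kernel returns when resolving a path runs into a symlink cycle or too many chained links. The sketch below is a hypothetical diagnostic, not part of Mesos or Aurora: it walks a directory tree (for example a provisioned rootfs under the copy backend) and lists symlinks that fail to resolve with ELOOP. The function names `find_eloop_paths` and `demo` are illustrative assumptions, and the sketch does not claim that a symlink cycle is the root cause of this issue.

```python
import errno
import os
import tempfile


def find_eloop_paths(root):
    """Return sorted paths of symlinks under root whose resolution fails with ELOOP."""
    bad = []
    # followlinks=False: never descend through symlinks, only inspect them.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            try:
                os.stat(path)  # os.stat follows symlinks
            except OSError as exc:
                if exc.errno == errno.ELOOP:
                    bad.append(path)
    return sorted(bad)


def demo():
    """Build a tiny tree with a two-link cycle and report what the scan finds."""
    with tempfile.TemporaryDirectory() as root:
        os.symlink("b", os.path.join(root, "a"))  # a -> b
        os.symlink("a", os.path.join(root, "b"))  # b -> a, closing the cycle
        open(os.path.join(root, "ok.txt"), "w").close()  # ordinary file, ignored
        return [os.path.relpath(p, root) for p in find_eloop_paths(root)]


if __name__ == "__main__":
    print(demo())
```

Running the same scan against the rootfs directory named in the agent log, while it still exists, would show whether the copied layers actually contain unresolvable links at that point or whether the error only appears during the copy itself.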