[ https://issues.apache.org/jira/browse/MESOS-6810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756585#comment-15756585 ]
Yu Yang commented on MESOS-6810:
--------------------------------

This is the output of {{curl -V}}:
{quote}
curl 7.47.0 (x86_64-pc-linux-gnu) libcurl/7.47.0 GnuTLS/3.4.10 zlib/1.2.8 libidn/1.32 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP UnixSockets
{quote}
I should clarify that this problem is also reproducible on a Mesos cluster with nearly 30 machines; the error message there is: {{Failed to launch container: Collect failed: Failed to perform 'curl': curl: (56) SSL read: error:00000000:lib(0):func(0):reason(0), errno 104; Container destroyed while provisioning images}}

> Tasks getting stuck in STAGING state when using unified containerizer
> ---------------------------------------------------------------------
>
>                 Key: MESOS-6810
>                 URL: https://issues.apache.org/jira/browse/MESOS-6810
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 1.0.0, 1.0.1, 1.1.0
>         Environment: *OS*: ubuntu16.04 64bit
> *mesos*: 1.1.0, one master and one agent on same machine
> *Agent flag*: {{sudo ./bin/mesos-agent.sh --master=192.168.1.192:5050 --work_dir=/tmp/mesos_slave --image_providers=docker --isolation=docker/runtime,filesystem/linux,cgroups/devices,gpu/nvidia --containerizers=mesos,docker --executor_environment_variables="{}"}}
>            Reporter: Yu Yang
>
> When a task is submitted with container settings like:
> {code}
> {
>   "container": {
>     "mesos": {
>       "image": {
>         "docker": {
>           "name": "nvidia/cuda"
>         },
>         "type": "DOCKER"
>       }
>     },
>     "type": "MESOS"
>   }
> }
> {code}
> the task gets stuck in the STAGING state and finally fails with the message {{Failed to launch container: Collect failed: Failed to perform 'curl': curl: (56) GnuTLS recv error (-54): Error in pull function}}.
> This is the related log on the agent:
> {code}
> I1217 13:05:35.406365 20780 slave.cpp:1539] Got assigned task 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' for framework 02083c57-b2d9-4054-babe-90e962816813-0001
> I1217 13:05:35.406749 20780 slave.cpp:1701] Launching task 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' for framework 02083c57-b2d9-4054-babe-90e962816813-0001
> I1217 13:05:35.406970 20780 paths.cpp:536] Trying to chown '/tmp/mesos_slave/slaves/02083c57-b2d9-4054-babe-90e962816813-S0/frameworks/02083c57-b2d9-4054-babe-90e962816813-0001/executors/mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591/runs/8be3b5cd-afa3-4189-aa2a-f09d73529f8c' to user 'root'
> I1217 13:05:35.409272 20780 slave.cpp:6179] Launching executor 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' of framework 02083c57-b2d9-4054-babe-90e962816813-0001 with resources cpus(*):0.1; mem(*):32 in work directory '/tmp/mesos_slave/slaves/02083c57-b2d9-4054-babe-90e962816813-S0/frameworks/02083c57-b2d9-4054-babe-90e962816813-0001/executors/mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591/runs/8be3b5cd-afa3-4189-aa2a-f09d73529f8c'
> I1217 13:05:35.409958 20780 slave.cpp:1987] Queued task 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' for executor 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' of framework 02083c57-b2d9-4054-babe-90e962816813-0001
> I1217 13:05:35.410163 20779 docker.cpp:1000] Skipping non-docker container
> I1217 13:05:35.410636 20776 containerizer.cpp:938] Starting container 8be3b5cd-afa3-4189-aa2a-f09d73529f8c for executor 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' of framework 02083c57-b2d9-4054-babe-90e962816813-0001
> I1217 13:05:44.459362 20778 slave.cpp:4992] Terminating executor ''cuda_mesos_nvidia_tf.72e9b9cf-8220-49bd-86fe-1667ee5e7a02' of framework 02083c57-b2d9-4054-babe-90e962816813-0001' because it did not register within 1mins
> I1217 13:05:53.586819 20780 slave.cpp:5044] Current disk usage 63.59%. Max allowed age: 1.848503351525151days
> I1217 13:06:35.410905 20777 slave.cpp:4992] Terminating executor ''mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' of framework 02083c57-b2d9-4054-babe-90e962816813-0001' because it did not register within 1mins
> I1217 13:06:35.411175 20780 containerizer.cpp:1950] Destroying container 8be3b5cd-afa3-4189-aa2a-f09d73529f8c in PROVISIONING state
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
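For context on the error above: {{curl: (56)}} is CURLE_RECV_ERROR ("Failure in receiving network data"), and errno 104 is ECONNRESET, i.e. the registry reset the connection mid-read, which suggests a transient network or TLS-layer failure rather than a bad image. A wrapper like the following is only a sketch of how such transient exit-56 resets could be retried before giving up; it is not Mesos fetcher code, and the attempt count and delay are assumptions:

```shell
# Sketch (not actual Mesos fetcher code): retry curl when it exits with
# code 56 (CURLE_RECV_ERROR, "Failure in receiving network data"), the
# exit code behind the errno-104 connection resets reported above.
# Attempt count and delay between attempts are assumptions.
fetch_with_retry() {
  url=$1
  attempts=${2:-3}
  n=1
  while :; do
    rc=0
    curl -sSfL -o /dev/null "$url" || rc=$?
    if [ "$rc" -eq 0 ]; then
      return 0
    fi
    # Give up on non-transient errors, or once attempts are exhausted.
    if [ "$rc" -ne 56 ] || [ "$n" -ge "$attempts" ]; then
      return "$rc"
    fi
    n=$((n + 1))
    sleep 1
  done
}
```

Non-transient failures (DNS errors, HTTP errors surfaced by {{-f}}, TLS handshake failures with other exit codes) still fail on the first attempt.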
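Separately, the agent log shows the executor terminated for not registering within 1mins and the container destroyed while still in the PROVISIONING state, so the image pull never got to finish. If slow pulls (rather than only the TLS resets) contribute, the agent's {{--executor_registration_timeout}} flag can be raised. The sketch below reuses the agent invocation from the Environment section above; the 5-minute value is an assumption, not a recommendation from this issue:

```shell
# Same agent flags as in the Environment section, plus a larger (assumed)
# executor registration timeout so image provisioning gets more than the
# default 1mins before the container is destroyed.
sudo ./bin/mesos-agent.sh --master=192.168.1.192:5050 \
  --work_dir=/tmp/mesos_slave \
  --image_providers=docker \
  --isolation=docker/runtime,filesystem/linux,cgroups/devices,gpu/nvidia \
  --containerizers=mesos,docker \
  --executor_environment_variables="{}" \
  --executor_registration_timeout=5mins
```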