barry lee created MESOS-10227:
---------------------------------
Summary: After mesos-agent starts, mesos-execute fails to be
executed using the GPU
Key: MESOS-10227
URL: https://issues.apache.org/jira/browse/MESOS-10227
Project: Mesos
Issue Type: Task
Components: agent
Affects Versions: 1.11.0
Environment: mesos-agent \
--master=zk://192.168.10.191:2181,192.168.10.192:2181,192.168.10.193:2181/mesos \
--log_dir=/var/log/mesos \
--containerizers=docker,mesos \
--executor_registration_timeout=5mins \
--hostname=192.168.10.19 \
--ip=192.168.10.19 \
--port=5051 \
--work_dir=/var/lib/mesos \
--image_providers=docker \
--executor_environment_variables="{}" \
--isolation="docker/runtime,filesystem/linux,cgroups/devices,gpu/nvidia"

mesos-execute \
--master=zk://192.168.10.191:2181,192.168.10.192:2181,192.168.10.193:2181/mesos \
--name=gpu-test \
--docker_image=nvidia/cuda \
--command="nvidia-smi" \
--framework_capabilities="GPU_RESOURCES" \
--resources="gpus:1"
Reporter: barry lee
Fix For: 1.11.0
I0819 18:14:26.088129 9337 containerizer.cpp:3414] Transitioning the state of
container fab468e6-bcbd-499c-9c24-ccd572c8317b from PROVISIONING to DESTROYING
after 2.207289088secs
I0819 18:14:26.089609 9339 slave.cpp:7100] Executor 'gpu-test' of framework
d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 has terminated with unknown status
I0819 18:14:26.091435 9339 slave.cpp:5981] Handling status update TASK_FAILED
(Status UUID: 0abd4e4b-59a6-4610-b624-05762ab9fc17) for task gpu-test of
framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 from @0.0.0.0:0
E0819 18:14:26.092530 9346 slave.cpp:6357] Failed to update resources for
container fab468e6-bcbd-499c-9c24-ccd572c8317b of executor 'gpu-test' running
task gpu-test on status update for terminal task, destroying container:
Container not found
W0819 18:14:26.092737 9341 composing.cpp:614] Attempted to destroy unknown
container fab468e6-bcbd-499c-9c24-ccd572c8317b
I0819 18:14:26.092895 9331 task_status_update_manager.cpp:328] Received task
status update TASK_FAILED (Status UUID: 0abd4e4b-59a6-4610-b624-05762ab9fc17)
for task gpu-test of framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027
I0819 18:14:26.093626 9333 slave.cpp:6527] Forwarding the update TASK_FAILED
(Status UUID: 0abd4e4b-59a6-4610-b624-05762ab9fc17) for task gpu-test of
framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 to
[email protected]:5050
I0819 18:14:26.102195 9342 slave.cpp:4310] Shutting down framework
d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027
I0819 18:14:26.102257 9342 slave.cpp:7218] Cleaning up executor 'gpu-test' of
framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027
I0819 18:14:26.102448 9332 gc.cpp:95] Scheduling
'/var/lib/mesos/slaves/d5cb56f3-1f2f-49e6-b63b-a401e445104d-S125/frameworks/d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027/executors/gpu-test/runs/fab468e6-bcbd-499c-9c24-ccd572c8317b'
for gc 6.9999988156days in the future
I0819 18:14:26.102600 9332 gc.cpp:95] Scheduling
'/var/lib/mesos/slaves/d5cb56f3-1f2f-49e6-b63b-a401e445104d-S125/frameworks/d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027/executors/gpu-test'
for gc 6.99999881303111days in the future
I0819 18:14:26.102725 9342 slave.cpp:7347] Cleaning up framework
d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027
I0819 18:14:26.102805 9335 task_status_update_manager.cpp:289] Closing task
status update streams for framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027
I0819 18:14:26.102901 9342 gc.cpp:95] Scheduling
'/var/lib/mesos/slaves/d5cb56f3-1f2f-49e6-b63b-a401e445104d-S125/frameworks/d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027'
for gc 6.99999881020741days in the future
I0819 18:14:34.385221 9334 http.cpp:1436] HTTP GET for
/files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._67
from 192.168.110.142:11640 with User-Agent='Mozilla/5.0 (Windows NT 10.0;
Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159
Safari/537.36'
I0819 18:14:45.385519 9344 http.cpp:1436] HTTP GET for
/files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._6a
from 192.168.110.142:11690 with User-Agent='Mozilla/5.0 (Windows NT 10.0;
Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159
Safari/537.36'
I0819 18:14:56.381196 9334 http.cpp:1436] HTTP GET for
/files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._6d
from 192.168.110.142:11716 with User-Agent='Mozilla/5.0 (Windows NT 10.0;
Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159
Safari/537.36'
I0819 18:15:07.385897 9340 http.cpp:1436] HTTP GET for
/files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._6g
from 192.168.110.142:11745 with User-Agent='Mozilla/5.0 (Windows NT 10.0;
Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159
Safari/537.36'
I0819 18:15:18.397059 9343 http.cpp:1436] HTTP GET for
/files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._6j
from 192.168.110.142:11774 with User-Agent='Mozilla/5.0 (Windows NT 10.0;
Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159
Safari/537.36'
I0819 18:15:20.797320 9331 slave.cpp:7657] Current disk usage 3.77%. Max
allowed age: 6.036056697613576days
I0819 18:15:29.377502 9341 http.cpp:1436] HTTP GET for
/files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._6m
from 192.168.110.142:13466 with User-Agent='Mozilla/5.0 (Windows NT 10.0;
Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159
Safari/537.36'
I0819 18:15:40.386363 9335 http.cpp:1436] HTTP GET for
/files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._6p
from 192.168.110.142:13490 with User-Agent='Mozilla/5.0 (Windows NT 10.0;
Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159
Safari/537.36'
I0819 18:15:51.388419 9341 http.cpp:1436] HTTP GET for
/files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._6s
from 192.168.110.142:13515 with User-Agent='Mozilla/5.0 (Windows NT 10.0;
Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159
Safari/537.36'
I0819 18:16:02.377324 9336 http.cpp:1436] HTTP GET for
/files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._6v
from 192.168.110.142:13543 with User-Agent='Mozilla/5.0 (Windows NT 10.0;
Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159
Safari/537.36'
I0819 18:16:13.391608 9346 http.cpp:1436] HTTP GET for
/files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._6y
from 192.168.110.142:13571 with User-Agent='Mozilla/5.0 (Windows NT 10.0;
Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159
Safari/537.36'
I0819 18:16:20.798060 9340 slave.cpp:7657] Current disk usage 3.77%. Max
allowed age: 6.036056697613576days
I0819 18:16:24.390466 9345 http.cpp:1436] HTTP GET for
/files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._71
from 192.168.110.142:13593 with User-Agent='Mozilla/5.0 (Windows NT 10.0;
Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159
Safari/537.36'
I0819 18:16:35.390462 9337 http.cpp:1436] HTTP GET for
/files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._74
from 192.168.110.142:13612 with User-Agent='Mozilla/5.0 (Windows NT 10.0;
Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159
Safari/537.36'
I0819 18:16:46.374727 9345 http.cpp:1436] HTTP GET for
/files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._77
from 192.168.110.142:13631 with User-Agent='Mozilla/5.0 (Windows NT 10.0;
Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159
Safari/537.36'
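
Most of the excerpt above is web UI polling (http.cpp); the lines that bear on the failure are the PROVISIONING to DESTROYING transition, the TASK_FAILED updates, and the E/W entries. As a rough aid for reading a longer agent log, the sketch below rejoins the mail-wrapped lines and keeps only those entries. The function names (parse_entries, interesting) are hypothetical helpers, and the prefix pattern is only inferred from the lines in this report, so treat it as an assumption rather than a complete glog parser.

```python
import re

# Assumed prefix shape, taken from the lines in this report:
# <severity><MMDD> <HH:MM:SS.micros> <thread id> <file>:<line>] <message>
LINE_RE = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d+)\s+(\d+) ([\w.]+):(\d+)\] ?(.*)$"
)


def parse_entries(text):
    """Return one dict per log entry, rejoining wrapped continuation lines."""
    entries = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw)
        if match:
            severity, _date, time, _tid, src, lineno, message = match.groups()
            entries.append({
                "severity": severity,
                "time": time,
                "source": f"{src}:{lineno}",
                "message": message,
            })
        elif entries and raw.strip():
            # A line without a prefix is treated as the tail of the previous entry.
            entries[-1]["message"] += " " + raw.strip()
    return entries


def interesting(entries):
    """Drop http.cpp polling; keep W/E/F entries, state transitions, task updates."""
    kept = []
    for entry in entries:
        if entry["source"].startswith("http.cpp"):
            continue
        message = entry["message"]
        if (entry["severity"] in ("W", "E", "F")
                or "Transitioning the state" in message
                or "TASK_" in message):
            kept.append(entry)
    return kept
```

To use it, read the agent log from the directory given by --log_dir (/var/log/mesos here) into a string, pass it to parse_entries, and print the entries returned by interesting.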
--
This message was sent by Atlassian Jira
(v8.3.4#803005)