[ https://issues.apache.org/jira/browse/SINGA-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800572#comment-16800572 ]
Ngin Yun Chuan commented on SINGA-435:
--------------------------------------

Hi Liu Hui,

Have you followed the instructions on https://nginyc.github.io/rafiki/docs/latest/docs/src/dev/setup.html#scaling-rafiki? Specifically, remember to do step 7, which adds a "GPU" tag to a node.

Let me know if there are any other issues!

Yun Chuan

> Rafiki--Can't create a train job with 'ENABLE_GPU'
> --------------------------------------------------
>
>                 Key: SINGA-435
>                 URL: https://issues.apache.org/jira/browse/SINGA-435
>             Project: Singa
>          Issue Type: Bug
>            Reporter: Liu Hui
>            Priority: Major
>         Attachments: rafiki_admin001.png
>
>
> >>https://nginyc.github.io/rafiki/docs/latest/docs/src/user/quickstart.html
> I followed the quickstart and tried to create a train job that uses the GPU.
> So I changed the parameters to "budget={'ENABLE_GPU': 1, 'MODEL_TRIAL_COUNT': 2}" when creating the train job.
> But the rafiki_worker container didn't start.
> I entered the rafiki_admin container and found an error in the log file.
> Finally I found that, in rafiki/rafiki/container/docker_swarm.py, the
> function _if_any_node_has_gpu always returns False.
> What should I do to run training with a GPU in a container? Which
> steps have I missed: setting up Docker's environment, or something else?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
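
For illustration only, here is a minimal sketch of how a check like _if_any_node_has_gpu might decide whether any swarm node carries a GPU tag. It is not Rafiki's actual implementation. The function name any_node_has_gpu and the label key "gpu" are hypothetical; the real key is whatever step 7 of the setup guide specifies. The sketch assumes each node is described by the attribute dictionary that Docker reports for a swarm node, where labels sit under Spec.Labels.

```python
from typing import Any, Iterable, Mapping

# Hypothetical label key; substitute the one the setup guide tells you to add.
GPU_LABEL = "gpu"


def any_node_has_gpu(nodes: Iterable[Mapping[str, Any]], label: str = GPU_LABEL) -> bool:
    """Return True if at least one node has `label` among its Spec.Labels."""
    for attrs in nodes:
        # Spec or Labels may be missing or None on an untagged node.
        labels = (attrs.get("Spec") or {}).get("Labels") or {}
        if label in labels:
            return True
    return False


# Possible usage with the Docker SDK for Python, run on a swarm manager:
#   import docker
#   nodes = [node.attrs for node in docker.from_env().nodes.list()]
#   print(any_node_has_gpu(nodes))
```

If no node has been tagged, a check of this shape returns False for every node, which would match the behaviour described in the report.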