[ https://issues.apache.org/jira/browse/SINGA-406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16691196#comment-16691196 ]

Ngin Yun Chuan edited comment on SINGA-406 at 11/19/18 2:39 AM:
----------------------------------------------------------------

The `nvidia/cuda:9.0-runtime-ubuntu16.04` image seems to run workers correctly on my 
Mac machine without a GPU, and in combination with setting `CUDA_VISIBLE_DEVICES` 
dynamically during worker deployment, we can stay with a single worker image that 
works on both CPU-only machines and machines with GPUs. Would there be any 
problems with this setup?
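As a sketch of what I mean by the single-image approach (the `resolve_device` 
helper and its behaviour are my assumptions for illustration, not existing Rafiki 
code): a worker built on the CUDA runtime image could decide at startup whether to 
use the GPU, purely from the `CUDA_VISIBLE_DEVICES` value set at deployment time:

```python
import os

def resolve_device():
    # Hypothetical startup check for a worker built on the single
    # nvidia/cuda runtime image: an empty (or unset) CUDA_VISIBLE_DEVICES
    # hides all GPUs, so the worker falls back to CPU.
    visible = os.environ.get('CUDA_VISIBLE_DEVICES', '').strip()
    if visible in ('', '-1'):
        return 'cpu'
    return 'gpu:' + visible

# CPU-only machine: deploy the worker with CUDA_VISIBLE_DEVICES=''
os.environ['CUDA_VISIBLE_DEVICES'] = ''
print(resolve_device())  # cpu

# Machine with GPUs: expose devices 0 and 1 to this worker
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'
print(resolve_device())  # gpu:0,1
```

The same image then serves both kinds of machines, with the deployment layer 
choosing the value of `CUDA_VISIBLE_DEVICES` per worker.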

If we have another worker image for CPU-only deployments, e.g. `rafiki_worker_cpu`, 
does that mean model developers who want to provide a custom Docker image need to 
extend from *both* worker Docker images to support model training on both CPU and 
GPU? Or should we drop this configurable option?

If we let app developers configure the Docker container at runtime, does that mean 
they will now have to know which models would be trained on their dataset and 
understand the dependencies of each model (which model developers might need to 
document)? If they are allowed to provide any Docker container, they must extend 
Rafiki's worker image, build the image themselves, push it to DockerHub, and 
account for the dependencies of each model during training. I feel that doing it 
this way makes things complex for the app developer.



> [Rafiki] Add POS tagging task & add GPU support (0.0.7)
> -------------------------------------------------------
>
>                 Key: SINGA-406
>                 URL: https://issues.apache.org/jira/browse/SINGA-406
>             Project: Singa
>          Issue Type: New Feature
>            Reporter: Ngin Yun Chuan
>            Priority: Major
>
> Refer to https://github.com/nginyc/rafiki/pull/71 for details



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
