[
https://issues.apache.org/jira/browse/SINGA-406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16691196#comment-16691196
]
Ngin Yun Chuan edited comment on SINGA-406 at 11/19/18 2:39 AM:
----------------------------------------------------------------
The `nvidia/cuda:9.0-runtime-ubuntu16.04` image seems to run workers correctly on my
Mac machine without a GPU, and in combination with setting `CUDA_VISIBLE_DEVICES`
dynamically during worker deployment, we can stay with a single worker image that
works on both CPU-only machines and machines with GPUs. Would there be any
problems with this setup?
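To make the single-image idea concrete, here is a minimal sketch of setting `CUDA_VISIBLE_DEVICES` per worker at deployment time. The `worker_env` / `docker_run_args` helpers are hypothetical illustrations, not Rafiki's actual API; only the env-var convention (an empty `CUDA_VISIBLE_DEVICES` hides all GPUs, so frameworks fall back to CPU) is standard CUDA behaviour:

```python
# One worker image for both CPU-only and GPU machines: the deployer
# decides which GPUs (if any) each worker sees via CUDA_VISIBLE_DEVICES.

WORKER_IMAGE = 'nvidia/cuda:9.0-runtime-ubuntu16.04'

def worker_env(gpu_ids):
    """Build the worker's environment. An empty CUDA_VISIBLE_DEVICES
    hides all GPUs, so the worker runs CPU-only."""
    visible = ','.join(str(i) for i in gpu_ids)  # '' when gpu_ids is empty
    return {'CUDA_VISIBLE_DEVICES': visible}

def docker_run_args(gpu_ids):
    """Assemble `docker run` arguments for one worker (illustrative only)."""
    args = ['docker', 'run', '-d']
    for key, value in worker_env(gpu_ids).items():
        args += ['-e', f'{key}={value}']
    return args + [WORKER_IMAGE]

# CPU-only machine: no GPUs exposed to the worker
print(docker_run_args([]))
# GPU machine: expose GPU 0 to this worker
print(docker_run_args([0]))
```

The point is that the CPU/GPU decision moves out of the image and into the deployment step, so model developers only ever extend one base image.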
If we have another worker image for CPU-only machines, e.g. `rafiki_worker_cpu`, does
that mean model developers would need to extend from *both* worker Docker images to
support model training on both CPU and GPU, if they want to provide their own
custom Docker image? Or should we drop this configurable option?
If we let app developers configure the Docker container at runtime, does that
mean they would now have to know about the models that would be trained on
their dataset and understand the dependencies of each model (which model
developers might need to document)? If they are allowed to provide any Docker
container, they must extend Rafiki's worker image, build the image themselves,
submit it to DockerHub, and account for the dependencies of each model during
training. I feel like doing it this way makes things complex for the app developer?
> [Rafiki] Add POS tagging task & add GPU support (0.0.7)
> -------------------------------------------------------
>
> Key: SINGA-406
> URL: https://issues.apache.org/jira/browse/SINGA-406
> Project: Singa
> Issue Type: New Feature
> Reporter: Ngin Yun Chuan
> Priority: Major
>
> Refer to https://github.com/nginyc/rafiki/pull/71 for details
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)