[ https://issues.apache.org/jira/browse/SINGA-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389481#comment-15389481 ]
ASF subversion and git services commented on SINGA-226:
-------------------------------------------------------
Commit 0184fac30b9c4a62925d5b15138ed8658b5e1e38 in incubator-singa's branch
refs/heads/dev from WANG Ji
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=0184fac ]
SINGA-226 Add parallel training on a single machine for singa v1.0
Move cifar-10 parallel training from a separate folder into example/cifar10.
Retain the former Compile() method in feed_forward_net, which receives an
Optimizer argument, so that the previous single-card version alexnet.cc can
remain unchanged.
Add an updater folder under the src/model folder.
> Add parallel training on a single machine for singa v1.0
> --------------------------------------------------------
>
> Key: SINGA-226
> URL: https://issues.apache.org/jira/browse/SINGA-226
> Project: Singa
> Issue Type: New Feature
> Reporter: Wang Ji
> Assignee: Wang Ji
>
> In this ticket, we implement parallel training using multiple devices on a
> single machine.
> To support parallel training, an Updater class needs to be implemented to
> aggregate partial gradients from parallel workers and use an Optimizer to
> update the parameters. An Updater can be designed for different kinds of
> topological structure, e.g., *local-cpu*, *local-dev*, *local-allreduce*.
> *local-cpu:* Aggregate gradients and update parameters on the CPU. In this
> mode, the host CPU needs to copy the gradient and parameter tensors from the
> GPU workers, perform the update, and copy the result back.
> *local-gpu:* Aggregate gradients and update parameters on a chosen GPU. In
> this mode, the updater GPU needs to copy the gradient and parameter tensors
> from the other GPU workers, perform the update, and copy the result back.
> *local-allreduce:* In this mode, each parameter is sliced among all GPU
> workers. In each iteration, gradients are aggregated and parameters are
> updated in the style of an MPI Allreduce.
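
The modes described in the ticket can be illustrated with a minimal CPU-only
sketch. It is not SINGA's Updater: Vec, SgdSketch, CentralUpdate and
SlicedUpdate are hypothetical names, plain std::vector stands in for tensors,
plain SGD stands in for the Optimizer, and averaging the partial gradients is
an assumed aggregation rule. CentralUpdate mirrors the local-cpu / local-gpu
idea (one device aggregates and updates everything); SlicedUpdate mirrors the
local-allreduce idea (each worker owns one slice of the parameter).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a gradient or parameter tensor (not SINGA's Tensor).
using Vec = std::vector<float>;

// Hypothetical plain-SGD step standing in for an Optimizer.
struct SgdSketch {
  float lr;
  // Update param[begin, end) in place using the matching gradient entries.
  void Apply(const Vec& grad, Vec* param, std::size_t begin,
             std::size_t end) const {
    for (std::size_t i = begin; i < end; ++i) (*param)[i] -= lr * grad[i];
  }
};

// Central style: a single device averages all partial gradients (assumed
// aggregation rule), updates the whole parameter, and returns the result
// that would be copied back to the workers.
Vec CentralUpdate(const std::vector<Vec>& partial_grads, Vec param,
                  const SgdSketch& opt) {
  assert(!partial_grads.empty());
  Vec avg(param.size(), 0.0f);
  for (const Vec& g : partial_grads) {
    assert(g.size() == param.size());
    for (std::size_t i = 0; i < g.size(); ++i) avg[i] += g[i];
  }
  for (float& a : avg) a /= static_cast<float>(partial_grads.size());
  opt.Apply(avg, &param, 0, param.size());
  return param;
}

// Sliced style: worker w owns the index range [w*n/W, (w+1)*n/W). It averages
// only that range of every worker's gradient and updates only that range.
// The loop over w runs sequentially here; on real hardware each iteration
// would run on its own device.
Vec SlicedUpdate(const std::vector<Vec>& partial_grads, Vec param,
                 const SgdSketch& opt) {
  const std::size_t workers = partial_grads.size();
  assert(workers > 0);
  const std::size_t n = param.size();
  Vec avg(n, 0.0f);
  for (std::size_t w = 0; w < workers; ++w) {
    const std::size_t begin = w * n / workers;
    const std::size_t end = (w + 1) * n / workers;
    for (std::size_t i = begin; i < end; ++i) {
      for (const Vec& g : partial_grads) {
        assert(g.size() == n);
        avg[i] += g[i];
      }
      avg[i] /= static_cast<float>(workers);
    }
    opt.Apply(avg, &param, begin, end);
  }
  return param;
}
```

Both functions take the per-worker gradients and the current parameter and
return the updated parameter, so they should agree on the same input; the
difference is only in which worker touches which indices, which is what
determines the copy traffic in each mode.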
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)