[ 
https://issues.apache.org/jira/browse/SINGA-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389481#comment-15389481
 ] 

ASF subversion and git services commented on SINGA-226:
-------------------------------------------------------

Commit 0184fac30b9c4a62925d5b15138ed8658b5e1e38 in incubator-singa's branch 
refs/heads/dev from WANG Ji
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=0184fac ]

SINGA-226 Add parallel training on a single machine for singa v1.0

Move cifar-10 parallel training from a separate folder into example/cifar10.
Retain the former Compile() method in feed_forward_net, which receives an 
Optimizer argument, so that
the previous single-card version of alexnet.cc can remain unchanged.
Add an updater folder under src/model.


> Add parallel training on a single machine for singa v1.0
> --------------------------------------------------------
>
>                 Key: SINGA-226
>                 URL: https://issues.apache.org/jira/browse/SINGA-226
>             Project: Singa
>          Issue Type: New Feature
>            Reporter: Wang Ji
>            Assignee: Wang Ji
>
> In this ticket, we implement parallel training using multiple devices on a 
> single machine. 
> To support parallel training, an Updater class needs to be implemented to 
> aggregate partial gradients from parallel workers and use an Optimizer to 
> update the parameters. Updaters can be designed for different kinds of 
> topological structure, i.e., *local-cpu*, *local-gpu*, *local-allreduce*. 
> *local-cpu:* Aggregate gradients and update parameters on the host CPU. In 
> this mode, the host CPU needs to copy gradient and parameter tensors from the 
> GPU workers, perform the update, and copy the results back.
> *local-gpu:* Aggregate gradients and update parameters on a chosen GPU. In 
> this mode, the updater GPU needs to copy gradient and parameter tensors from 
> the other GPU workers, perform the update, and copy the results back.
> *local-allreduce:* In this mode, each parameter is sliced among all GPU 
> workers. In each iteration, gradients are aggregated and parameters updated 
> in an MPI Allreduce style.
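
The *local-cpu* mode described above can be sketched roughly as follows. This is a hypothetical, minimal illustration only, not the actual SINGA Updater API: the class name LocalCpuUpdater, its Update() signature, and the plain SGD step are all assumptions made here for clarity. It shows the core idea: the host sums the partial gradients copied from each worker, averages them, applies one optimizer step, after which the updated parameters would be copied back to every GPU worker.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical "local-cpu" style updater (not the real SINGA class):
// aggregates partial gradients from parallel workers on the host CPU
// and applies a vanilla SGD update in place of a full Optimizer.
class LocalCpuUpdater {
 public:
  explicit LocalCpuUpdater(float lr) : lr_(lr) {}

  // grads[i] holds the partial gradient from worker i, already copied
  // to host memory; param is updated in place.
  void Update(std::vector<float>* param,
              const std::vector<std::vector<float>>& grads) {
    const std::size_t n = param->size();
    std::vector<float> sum(n, 0.0f);
    for (const auto& g : grads)            // aggregate across workers
      for (std::size_t j = 0; j < n; ++j) sum[j] += g[j];
    const float scale = 1.0f / static_cast<float>(grads.size());
    for (std::size_t j = 0; j < n; ++j)    // average, then SGD step
      (*param)[j] -= lr_ * scale * sum[j];
  }

 private:
  float lr_;  // learning rate
};
```

The *local-gpu* variant would run the same aggregation on a designated GPU device, and *local-allreduce* would slice each parameter across workers so every GPU aggregates and updates only its own slice.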



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)