[ 
https://issues.apache.org/jira/browse/SINGA-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389479#comment-15389479
 ] 

ASF subversion and git services commented on SINGA-226:
-------------------------------------------------------

Commit d45715da07a65e38e5e8f437461c37da4092a9c3 in incubator-singa's branch 
refs/heads/dev from WANG Ji
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=d45715d ]

SINGA-226 Add parallel training on a single machine for singa v1.0

This commit implements an Updater class for parallel training
in local-cpu and local-gpu modes. (The mode specification is described
in https://issues.apache.org/jira/browse/SINGA-226)

The Updater class is a wrapper around the Optimizer class. It controls the
communication pattern among workers. When initializing an Updater,
the user needs to provide the total number of workers and specify where
the Updater performs aggregation and computation.
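To make the wrapper idea concrete, here is a minimal hedged sketch of how such an Updater could aggregate partial gradients from several workers and delegate the actual parameter update to a wrapped optimizer. All names (SgdOptimizer, Updater::Update) are illustrative assumptions, not SINGA's real API, and plain std::vector<float> stands in for SINGA's tensor type:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: the Updater wraps an optimizer (plain SGD here)
// and applies one update only after every worker has contributed its
// partial gradient. Names are illustrative, not SINGA's actual API.
class SgdOptimizer {
 public:
  explicit SgdOptimizer(float lr) : lr_(lr) {}
  // In-place SGD step: param -= lr * grad
  void Apply(std::vector<float>* param, const std::vector<float>& grad) const {
    for (std::size_t i = 0; i < param->size(); ++i)
      (*param)[i] -= lr_ * grad[i];
  }
 private:
  float lr_;
};

class Updater {
 public:
  Updater(int num_workers, SgdOptimizer opt)
      : num_workers_(num_workers), opt_(opt) {}

  // Each worker submits its partial gradient; once all workers have
  // reported, the averaged gradient is applied via the optimizer.
  void Update(std::vector<float>* param, const std::vector<float>& grad) {
    if (sum_.empty()) sum_.assign(grad.size(), 0.0f);
    for (std::size_t i = 0; i < grad.size(); ++i) sum_[i] += grad[i];
    if (++arrived_ == num_workers_) {
      for (float& g : sum_) g /= static_cast<float>(num_workers_);
      opt_.Apply(param, sum_);   // single update with the averaged gradient
      sum_.clear();
      arrived_ = 0;
    }
  }
 private:
  int num_workers_;
  int arrived_ = 0;
  SgdOptimizer opt_;
  std::vector<float> sum_;
};
```

In a real multi-threaded setting the Update call would also need synchronization (e.g. a mutex or barrier); that is omitted here to keep the aggregation logic visible.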

Files changed are described as follows:
* Add a new folder named cifar-parallel under example, containing
  the single-machine multi-GPU parallel training example.
* Replace the Optimizer pointer in the feed_forward_net class with an
  Updater pointer, since Updater is a wrapper of Optimizer; the
  compile() method is changed accordingly.
* Add a helper method TrainThread() in feed_forward_net to launch a
  training thread.
* Adapt alexnet.cc, the original cifar10 example, to support the new
  compile().
* Fix a bug in memory.cc that occurred while initializing the cnmem
  memory pool.
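The per-thread training helper mentioned above can be pictured roughly as follows. This is a hedged sketch only: TrainThread and RunParallel here are stand-ins I made up to show the launch-and-join pattern with one thread per device, not the signatures in feed_forward_net:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical sketch of per-device training threads: launch one
// thread per device, each running its training loop, then join all.
// The atomic counter stands in for real forward/backward work.
std::atomic<int> steps_done{0};

void TrainThread(int device_id, int num_steps) {
  for (int s = 0; s < num_steps; ++s) {
    // ... run one forward/backward pass on device `device_id` ...
    steps_done.fetch_add(1);
  }
}

int RunParallel(int num_devices, int num_steps) {
  std::vector<std::thread> workers;
  for (int d = 0; d < num_devices; ++d)
    workers.emplace_back(TrainThread, d, num_steps);
  for (auto& t : workers) t.join();  // wait for every device to finish
  return steps_done.load();          // total steps across all devices
}
```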


> Add parallel training on a single machine for singa v1.0
> --------------------------------------------------------
>
>                 Key: SINGA-226
>                 URL: https://issues.apache.org/jira/browse/SINGA-226
>             Project: Singa
>          Issue Type: New Feature
>            Reporter: Wang Ji
>            Assignee: Wang Ji
>
> In this ticket, we implement parallel training using multiple devices on a 
> single machine. 
> To support parallel training, an Updater class needs to be implemented to 
> aggregate the partial gradients from parallel workers and use an Optimizer 
> to update the parameters. The Updater can be designed for different kinds 
> of topology, e.g., *local-cpu*, *local-gpu*, *local-allreduce*. 
> *local-cpu:* Aggregate and update parameters on the CPU. In this mode, the 
> host CPU needs to copy gradient and parameter tensors from the GPU workers, 
> perform the update, and copy the results back.
> *local-gpu:* Aggregate and update parameters on a chosen GPU. In this 
> mode, the updater GPU needs to copy gradient and parameter tensors from the 
> other GPU workers, perform the update, and copy the results back.
> *local-allreduce:* In this mode, each parameter is sliced among all GPU 
> workers. In each iteration, gradients are aggregated and parameters updated 
> in an MPI Allreduce style.
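The local-allreduce mode described in the ticket can be illustrated with a small sketch. Here each of k workers conceptually owns one contiguous slice of the parameter, aggregates the gradients for that slice (reduce-scatter), updates it, and the updated slices are then visible to everyone (allgather). AllreduceUpdate is a name invented for this illustration; it simulates the pattern sequentially and is not SINGA code:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical illustration of local-allreduce: worker w owns slice w
// of the parameter, sums the gradients for its slice across all
// workers (reduce-scatter), applies the update, and the result is
// conceptually shared with everyone (allgather).
std::vector<float> AllreduceUpdate(std::vector<float> param,
                                   const std::vector<std::vector<float>>& grads,
                                   float lr) {
  const std::size_t k = grads.size();         // number of workers
  const std::size_t n = param.size();
  const std::size_t slice = (n + k - 1) / k;  // slice length per worker
  for (std::size_t w = 0; w < k; ++w) {       // worker w owns slice w
    const std::size_t begin = w * slice;
    const std::size_t end = std::min(n, begin + slice);
    for (std::size_t i = begin; i < end; ++i) {
      float sum = 0.0f;                       // reduce-scatter for index i
      for (const auto& g : grads) sum += g[i];
      param[i] -= lr * (sum / static_cast<float>(k));  // update owned slice
    }
  }
  return param;  // conceptually the allgathered result seen by all workers
}
```

Compared with local-cpu and local-gpu, no single device holds the whole aggregation workload; each worker updates only its own slice, which is what an MPI-style Allreduce buys.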



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
