wangwei created SINGA-32:
----------------------------
Summary: Implement AllReduce training framework
Key: SINGA-32
URL: https://issues.apache.org/jira/browse/SINGA-32
Project: Singa
Issue Type: New Feature
Reporter: wangwei
Assignee: wangwei
The AllReduce training framework runs in synchronous mode, where one worker
starts the next iteration after all workers have finished the previous
iteration. Baidu's deepimage system uses this training framework.
To implement it in SINGA, we launch one worker group and one server group. The
model is partitioned (e.g., on dimension 0) among all workers. Params are
sliced and partitioned among all servers.
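The slicing of Params among servers could be sketched as follows; a minimal illustration, assuming each Param is a flat vector split into even contiguous ranges (the function name and even-split policy are hypothetical, not SINGA's actual partitioning code):

```python
# Hypothetical sketch: slice a flat parameter vector evenly among servers.
def partition_param(param_len, num_servers):
    """Return (start, end) index ranges, one slice per server."""
    base, rem = divmod(param_len, num_servers)
    slices, start = [], 0
    for i in range(num_servers):
        # The first `rem` servers take one extra element each.
        end = start + base + (1 if i < rem else 0)
        slices.append((start, end))
        start = end
    return slices

print(partition_param(10, 3))  # -> [(0, 4), (4, 7), (7, 10)]
```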
At the beginning, each Param slice is put into the server shard together with
the number of workers that compute gradients for it.
In each iteration, the local stub aggregates all gradients for the same Param
and sends them to the corresponding server, together with the number of local
workers that contributed to them. The server buffers update requests for a
Param slice until it has received gradients from all workers, then applies the
update and sends the updated Param slices back to the corresponding process
(stub).
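The server-side buffering described above could look roughly like this; a simplified sketch, assuming plain SGD averaging and list-valued gradients (the class name, message shape, and learning-rate handling are illustrative assumptions, not SINGA's actual server code):

```python
class ParamSliceBuffer:
    """Hypothetical server-side buffer for one Param slice: gradients are
    buffered until all workers have reported, then one update is applied."""

    def __init__(self, param, num_workers, lr=0.1):
        self.param = list(param)       # current values of this slice
        self.num_workers = num_workers # total workers expected per iteration
        self.lr = lr
        self.pending = []              # buffered (gradient, worker_count) pairs

    def receive(self, grad, num_local_workers):
        # Each request carries locally aggregated gradients plus the number
        # of local workers that contributed to them.
        self.pending.append((list(grad), num_local_workers))
        if sum(n for _, n in self.pending) >= self.num_workers:
            return self._update()
        return None                    # still waiting for more workers

    def _update(self):
        # Sum the buffered gradients, average over all workers, apply SGD.
        total = [sum(g[i] for g, _ in self.pending)
                 for i in range(len(self.param))]
        self.param = [p - self.lr * t / self.num_workers
                      for p, t in zip(self.param, total)]
        self.pending.clear()
        return self.param
```

For example, with three workers a request covering two of them returns None (buffered), and the request from the remaining worker triggers the update and returns the new slice.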
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)