wangwei created SINGA-32:
----------------------------

             Summary: Implement AllReduce training framework
                 Key: SINGA-32
                 URL: https://issues.apache.org/jira/browse/SINGA-32
             Project: Singa
          Issue Type: New Feature
            Reporter: wangwei
            Assignee: wangwei


The AllReduce training framework runs in synchronous mode: a worker starts the 
next iteration only after all workers have finished the previous one. Baidu's 
Deep Image system uses this training framework.

To implement it in SINGA, we launch one worker group and one server group. The 
model is partitioned (e.g., on dimension 0) among all workers, and Params are 
sliced and partitioned among all servers.
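The slicing step above could be sketched as follows. This is an illustrative Python sketch only, not SINGA's actual code; the function name, the round-robin assignment, and the equal-sized chunking are all assumptions.

```python
def slice_params(param_sizes, num_servers):
    """Slice each Param (given by its element count) into roughly equal
    pieces, then assign slice i to server i % num_servers (round-robin).

    Returns (slices, assignment) where slices is a list of
    (param_id, offset, length) and assignment[i] is the server for slice i.
    """
    slices = []
    for pid, size in enumerate(param_sizes):
        # Ceiling division so every element lands in exactly one slice.
        chunk = (size + num_servers - 1) // num_servers
        for offset in range(0, size, chunk):
            slices.append((pid, offset, min(chunk, size - offset)))
    assignment = [i % num_servers for i in range(len(slices))]
    return slices, assignment
```

For example, two Params of sizes 10 and 4 on two servers yield four slices, alternating between the servers.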

At the beginning, each Param slice is put into the server shard together with 
the number of workers that compute gradients for it.
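The shard described above can be pictured as a map from slice ID to the slice value plus its expected worker count. A minimal sketch, with hypothetical names (SINGA's real shard is not shown in this issue):

```python
class ServerShard:
    """Hypothetical server-side shard: stores each Param slice together
    with the number of workers that will send gradients for it."""

    def __init__(self):
        # slice_id -> {"value": list of floats, "num_workers": int}
        self.entries = {}

    def put(self, slice_id, value, num_workers):
        # Record the initial slice value and remember how many workers'
        # gradients must arrive before an update may be applied.
        self.entries[slice_id] = {"value": value, "num_workers": num_workers}
```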

For each iteration, the local stub aggregates all gradients for the same Param 
and sends them to the corresponding server, together with the number of local 
workers that computed them. The server buffers update requests and does not 
update a Param slice until it has received gradients from all workers. It then 
sends the updated Param slices back to the corresponding processes (stubs).




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
