[ 
https://issues.apache.org/jira/browse/SINGA-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597696#comment-14597696
 ] 

ASF subversion and git services commented on SINGA-19:
------------------------------------------------------

Commit e0a52a62577cc9845130b9d2c664007ec354804c in incubator-singa's branch 
refs/heads/master from wang wei
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=e0a52a6 ]

SINGA-19 Slice large Param objects for load-balance
Tested with a single worker, a two-worker group, and two worker groups.
TODO: test with multiple servers and server groups for distributed Hogwild and 
allreduce.


> Slice large Param objects for load-balance
> ------------------------------------------
>
>                 Key: SINGA-19
>                 URL: https://issues.apache.org/jira/browse/SINGA-19
>             Project: Singa
>          Issue Type: New Feature
>            Reporter: wangwei
>            Assignee: wangwei
>
> Some Param objects in deep learning models are much larger than other Param 
> objects. For example, a weight matrix is usually 100 times larger than a bias 
> vector. The difference in Param size causes two problems:
> 1. If there are multiple servers in one server group, the servers may be 
> assigned different numbers of parameters to update.
> 2. If there are multiple server groups, e.g., in the distributed Hogwild 
> framework, these server groups may be assigned different numbers of 
> parameters to maintain.
> This ticket is to slice large Param objects to solve the load-balance 
> problem. The slicing operations are done in the stub thread to make them 
> transparent to both workers and servers.
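
The slicing idea described in the ticket can be illustrated with a minimal sketch. This is not SINGA's actual implementation (which is C++ and runs in the stub thread); the function names slice_params and assign_slices, the (name, offset, length) slice tuple, and the greedy least-loaded assignment are all assumptions made for illustration. It splits each Param into contiguous slices no larger than a target size, then hands the slices to servers so that each server holds roughly the same number of parameter values.

```python
import heapq


def slice_params(param_sizes, num_slices):
    """Split each Param into contiguous slices of at most `target` values.

    param_sizes: dict mapping a (hypothetical) Param name to its element count.
    num_slices:  desired number of slices; the result may contain a few more,
                 because a slice never spans two Params.
    Returns a list of (param_name, offset, length) tuples.
    """
    total = sum(param_sizes.values())
    target = -(-total // num_slices)  # ceiling division
    slices = []
    for name, size in param_sizes.items():
        offset = 0
        while offset < size:
            length = min(target, size - offset)
            slices.append((name, offset, length))
            offset += length
    return slices


def assign_slices(slices, num_servers):
    """Greedy assignment (an assumed policy): largest slice first,
    always to the server that currently holds the fewest values."""
    heap = [(0, server) for server in range(num_servers)]
    heapq.heapify(heap)
    assignment = {server: [] for server in range(num_servers)}
    for sl in sorted(slices, key=lambda s: s[2], reverse=True):
        load, server = heapq.heappop(heap)
        assignment[server].append(sl)
        heapq.heappush(heap, (load + sl[2], server))
    return assignment


if __name__ == "__main__":
    # A weight matrix 100 times larger than a bias vector, as in the ticket.
    sizes = {"weight": 10000, "bias": 100}
    slices = slice_params(sizes, num_slices=4)
    for server, owned in assign_slices(slices, num_servers=2).items():
        print(server, sum(length for _, _, length in owned), owned)
```

Without slicing, one of two servers would hold the whole weight matrix and the other only the bias vector; with slicing, the per-server loads differ by at most one slice.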



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)