[
https://issues.apache.org/jira/browse/SINGA-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597696#comment-14597696
]
ASF subversion and git services commented on SINGA-19:
------------------------------------------------------
Commit e0a52a62577cc9845130b9d2c664007ec354804c in incubator-singa's branch
refs/heads/master from wang wei
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=e0a52a6 ]
SINGA-19 Slice large Param objects for load-balance
Tested with a single worker, a two-worker group, and two worker groups.
TODO: test with multiple servers and server groups for distributed Hogwild and
allreduce.
> Slice large Param objects for load-balance
> ------------------------------------------
>
> Key: SINGA-19
> URL: https://issues.apache.org/jira/browse/SINGA-19
> Project: Singa
> Issue Type: New Feature
> Reporter: wangwei
> Assignee: wangwei
>
> Some Param objects in deep learning models are much larger than other Param
> objects. For example, a weight matrix is usually 100 times larger than a bias
> vector. The difference in Param size causes two problems:
> 1. If there are multiple servers in one server group, the servers may be
> assigned different numbers of parameters to update.
> 2. If there are multiple server groups, e.g., in the distributed Hogwild
> framework, these server groups may be assigned different numbers of
> parameters to maintain.
> This ticket is to slice large Param objects to solve the load-balance
> problem. The slicing operations are done in the stub thread to make them
> transparent to both workers and servers.
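
To illustrate the load-balance problem described above, here is a minimal
Python sketch. It is not the code from the commit: the names slice_param and
assign, the slice size, and the greedy least-loaded placement are all
assumptions made for illustration. It uses the 100:1 weight-to-bias ratio
from the description and two servers.

```python
import heapq

def slice_param(name, size, max_slice):
    """Split one Param of `size` elements into (name, offset, length) slices.

    Hypothetical helper: each slice holds at most `max_slice` elements.
    """
    return [(name, off, min(max_slice, size - off))
            for off in range(0, size, max_slice)]

def assign(slices, num_servers):
    """Greedily place slices (largest first) on the least-loaded server.

    Returns the number of elements each server ends up holding.
    This placement policy is an assumption, not SINGA's actual policy.
    """
    heap = [(0, s) for s in range(num_servers)]  # (load, server id)
    loads = [0] * num_servers
    for _name, _off, length in sorted(slices, key=lambda s: -s[2]):
        load, server = heapq.heappop(heap)
        loads[server] = load + length
        heapq.heappush(heap, (loads[server], server))
    return loads

# A weight matrix 100 times larger than a bias vector.
params = {"weight": 1000, "bias": 10}

# Without slicing, each Param is placed whole.
unsliced = [(name, 0, size) for name, size in params.items()]
print(assign(unsliced, 2))  # [1000, 10]

# With slicing, the large Param is cut into pieces before placement.
sliced = [s for name, size in params.items()
          for s in slice_param(name, size, 250)]
print(assign(sliced, 2))    # [510, 500]
```

In this sketch, placing whole Params leaves one server with 1000 elements and
the other with 10, while slicing first brings the loads to 510 and 500. In the
ticket, the slicing is done in the stub thread, so workers and servers would
not need to know about it.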