[ 
https://issues.apache.org/jira/browse/SINGA-132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192203#comment-15192203
 ] 

wangwei commented on SINGA-132:
-------------------------------

SINGA-126 will resolve the first case.
For the second case, if the workers are launched in different processes, then
we need Zookeeper to coordinate them (e.g., for stopping). If all workers are
in the same process, we do not need Zookeeper. Instead, we can use the Stub
thread to monitor the number of live workers and send a message to the servers
once all workers have finished.
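
The single-process idea above can be sketched roughly as follows. This is only
an illustration and not SINGA code: WorkerMonitor and RunJob are hypothetical
names, the workers do no real training, and "sending a stop message" is
replaced by counting one message per server.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not a SINGA class): tracks how many workers in this
// process are still running.
class WorkerMonitor {
 public:
  explicit WorkerMonitor(int num_workers) : alive_(num_workers) {}

  // Called by a worker thread when it has finished training.
  void WorkerDone() {
    std::lock_guard<std::mutex> lock(mu_);
    if (--alive_ == 0) cv_.notify_all();
  }

  // Called by the stub thread; blocks until no worker is alive.
  void WaitForAll() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return alive_ <= 0; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int alive_;
};

// Starts dummy workers and a stub thread in one process. Returns the number
// of stop messages the stub would have sent (one per server).
int RunJob(int num_workers, int num_servers) {
  WorkerMonitor monitor(num_workers);
  int stop_msgs_sent = 0;

  std::thread stub([&] {
    monitor.WaitForAll();           // no Zookeeper involved
    stop_msgs_sent = num_servers;   // stand-in for messaging each server
  });

  std::vector<std::thread> workers;
  for (int i = 0; i < num_workers; ++i) {
    workers.emplace_back([&monitor] {
      // ... training iterations would run here ...
      monitor.WorkerDone();
    });
  }

  for (auto& w : workers) w.join();
  stub.join();  // after join, reading stop_msgs_sent is safe
  return stop_msgs_sent;
}
```

For example, RunJob(4, 2) starts four workers and returns 2 once all of them
have reported completion.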

> Optimize training on a single node with GPUs
> --------------------------------------------
>
>                 Key: SINGA-132
>                 URL: https://issues.apache.org/jira/browse/SINGA-132
>             Project: Singa
>          Issue Type: Improvement
>            Reporter: wangwei
>            Assignee: wangwei
>
> There are two training situations.
> 1. A single worker. For this case, there is no need to launch a separate
> server thread, because it would incur communication cost between the worker
> and the server. Instead, we can create an Updater inside the Worker and call
> it to update the parameters locally. The driver's workflow should be changed
> for this case, i.e., there is no need for a stub thread or a server thread.
> The worker should run in the main thread, and the program terminates once
> the worker finishes.
> 2. Multiple workers. For this case, we need both workers and servers. First,
> we can make Zookeeper an optional dependency, as it is used for Job ID
> generation and the termination condition check. If no Job ID is available,
> we can always use the default Job ID (0). Since there is only one process,
> we don't need Zookeeper to know the status of workers in other processes.
> Second, the communication between worker-stub-server should be optimized,
> e.g., using GPU-Direct.
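
A minimal sketch of the single-worker case described in the issue, where the
worker owns its updater and applies updates locally instead of talking to a
server thread. SgdUpdater and LocalWorker are hypothetical names rather than
SINGA's actual classes, plain SGD is assumed as the update rule, and the toy
loss 0.5 * sum(p^2) is used only so that the gradient is easy to compute.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical updater (not SINGA's Updater): plain SGD, p <- p - lr * g.
struct SgdUpdater {
  float lr;

  void Update(std::vector<float>& param,
              const std::vector<float>& grad) const {
    for (std::size_t i = 0; i < param.size(); ++i) {
      param[i] -= lr * grad[i];
    }
  }
};

// Hypothetical worker that holds its parameters and its own updater, so no
// stub or server thread is needed.
struct LocalWorker {
  std::vector<float> param;
  SgdUpdater updater;

  // One training step on the toy loss 0.5 * sum(p^2); its gradient is p.
  void TrainOneStep() {
    std::vector<float> grad = param;
    updater.Update(param, grad);  // local update, no communication
  }

  // Runs in the calling (main) thread; the program can exit when it returns.
  void Run(int steps) {
    for (int s = 0; s < steps; ++s) TrainOneStep();
  }
};
```

With a learning rate of 0.5 and initial parameters {1, -2}, one call to
Run(1) halves each parameter.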



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
