[jira] [Commented] (SYSTEMML-1563) Add a distributed synchronous SGD MNIST LeNet example

2017-04-26 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15985694#comment-15985694
 ] 

Mike Dusenberry commented on SYSTEMML-1563:
---

[PR 442 | https://github.com/apache/incubator-systemml/pull/442] submitted.

> Add a distributed synchronous SGD MNIST LeNet example
> -
>
> Key: SYSTEMML-1563
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1563
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>
> This aims to add a distributed synchronous SGD MNIST LeNet example.  In 
> distributed synchronous SGD, multiple mini-batches are run forward & backward 
> simultaneously, and the gradients are aggregated by addition before the 
> model parameters are updated.  This is mathematically equivalent to simply 
> using a larger mini-batch size, i.e. {{new_mini_batch_size = 
> mini_batch_size * number_of_parallel_mini_batches}} (see the short sketch 
> after this quoted description).  The benefit is that distributed synchronous 
> SGD can make use of multiple devices, e.g. multiple GPUs or multiple CPU 
> machines, and thus can speed up training.  More specifically, using an 
> effectively larger mini-batch size can yield a more stable gradient in 
> expectation, and a larger number of epochs can be run in the same amount of 
> time, both of which lead to faster convergence.  Alternatives include various 
> forms of distributed *asynchronous* SGD, such as Downpour, Hogwild, etc.  
> However, a recent paper \[1] from Google Brain / OpenAI has found evidence 
> supporting the claim that distributed synchronous SGD can lead to faster 
> convergence, particularly if it is extended with the notion of "backup 
> workers" as described in the paper.
> We will first aim for distributed synchronous SGD with no backup workers, and 
> then extend this to include backup workers.  The MNIST LeNet model will 
> simply serve as an example, and this same approach can be extended to more 
> recent models, such as ResNets.
> \[1]: https://arxiv.org/abs/1604.00981
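
To make the equivalence above concrete, here is a minimal NumPy sketch.  It is 
illustrative only: it is plain Python rather than SystemML DML, it is not the 
code in PR 442, and the toy linear model, worker count, and mini-batch size are 
hypothetical choices.  Each "worker" computes the gradient of its own 
mini-batch, the gradients are aggregated by addition, and a single parameter 
update matches the update computed on the concatenated large mini-batch:

{code:python}
import numpy as np

rng = np.random.default_rng(0)

# Toy linear least-squares model so the gradient math is easy to check by hand.
w = rng.normal(size=(5, 1))   # model parameters
lr = 0.1                      # learning rate
K = 4                         # number of parallel workers
n = 8                         # per-worker mini-batch size

# One mini-batch of (features, labels) per worker.
Xs = [rng.normal(size=(n, 5)) for _ in range(K)]
ys = [X @ np.ones((5, 1)) + 0.01 * rng.normal(size=(n, 1)) for X in Xs]

def grad(w, X, y):
    # Gradient of the *summed* squared error over the mini-batch (no averaging),
    # so plain addition of per-worker gradients equals the large-batch gradient.
    return 2.0 * X.T @ (X @ w - y)

# Distributed synchronous SGD step: every worker computes its gradient in
# parallel, the gradients are aggregated by addition, and the parameters are
# updated once.
g_sync = sum(grad(w, X, y) for X, y in zip(Xs, ys))
w_sync = w - lr * g_sync

# Single-device step with the equivalent large mini-batch of size n * K.
X_big, y_big = np.vstack(Xs), np.vstack(ys)
w_big = w - lr * grad(w, X_big, y_big)

print(np.allclose(w_sync, w_big))   # True: the two updates are identical
{code}

If each worker instead computes a mean gradient over its mini-batch (the more 
common convention), the aggregation step averages rather than sums; with equal 
per-worker batch sizes the result is still exactly the mean gradient of the 
large mini-batch.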



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1563) Add a distributed synchronous SGD MNIST LeNet example

2017-04-26 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15985693#comment-15985693
 ] 

Mike Dusenberry commented on SYSTEMML-1563:
---

cc [~nakul02], [~niketanpansare], [~prithvi_r_s], [~reinwald]



