[ https://issues.apache.org/jira/browse/SYSTEMML-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fei Hu updated SYSTEMML-1760:
-----------------------------
    Attachment: Runtime_Table.png

> Improve engine robustness of distributed SGD training
> -----------------------------------------------------
>
>                 Key: SYSTEMML-1760
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1760
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Algorithms, Compiler, ParFor
>            Reporter: Mike Dusenberry
>            Assignee: Fei Hu
>         Attachments: Runtime_Table.png
>
>
> Currently, we have a mathematical framework in place for training with distributed SGD in a [distributed MNIST LeNet example | https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml]. This task aims to push this at scale to determine (1) the current behavior of the engine (i.e., does the optimizer actually run this in a distributed fashion?), and (2) ways to improve the robustness and performance for this scenario. The distributed SGD framework from this example has already been ported into Caffe2DML, and thus improvements made for this task will directly benefit our efforts towards distributed training of Caffe models (and Keras models in the future).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
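For context, the linked DML example expresses synchronous data-parallel mini-batch SGD as a parfor loop over per-worker gradient computations, which the SystemML optimizer may compile to a local or distributed (Spark) plan. The following is only a minimal NumPy sketch of the underlying gradient-averaging idea, not the example's actual code; the linear-regression model and all function names here are hypothetical illustrations.

```python
import numpy as np

def sgd_grad(w, X, y):
    # Gradient of mean squared error for a linear model: loss = mean((Xw - y)^2).
    return 2.0 * X.T @ (X @ w - y) / len(y)

def parallel_sgd_step(w, X, y, lr=0.05, num_workers=4):
    # Split the mini-batch across workers (in the DML example this split is a
    # parfor loop that the optimizer may execute as a distributed job),
    # compute per-worker gradients, then average them. With equal-sized
    # shards this is exactly the gradient of the full mini-batch.
    shards = zip(np.array_split(X, num_workers), np.array_split(y, num_workers))
    grads = [sgd_grad(w, Xs, ys) for Xs, ys in shards]
    return w - lr * np.mean(grads, axis=0)
```

Whether the engine actually parallelizes the analogous parfor loop, and at what cost, is precisely what this task's experiments (cf. the attached Runtime_Table.png) aim to measure.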