[jira] [Closed] (MADLIB-1431) DL - improve speed of evaluate for multiple model training

Frank McQuillan (Jira) Thu, 02 Jul 2020 13:39:12 -0700


     [ 
https://issues.apache.org/jira/browse/MADLIB-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Frank McQuillan closed MADLIB-1431.
-----------------------------------
    Resolution: Fixed

https://github.com/apache/madlib/pull/502

> DL - improve speed of evaluate for multiple model training
> ----------------------------------------------------------
>
>                 Key: MADLIB-1431
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1431
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Deep Learning
>            Reporter: Frank McQuillan
>            Priority: Minor
>             Fix For: v1.18.0
>
>
> Right now we only have evaluate() for a single model. There is no 
> evaluate_multiple_model() that can run in parallel the way 
> fit_multiple_model() does.
> Currently, the evaluate stage of fit_multiple_model() looks like this:
> - foreach mst_key:
>        1. Send weights from segments to master for model corresponding to 
> mst_key (this step alone takes about 7s for each model).
>        2. Run keras.evaluate() on this model, while all other models wait 
> their turn.
> Two things stand out:
> 1. We should not be transferring weights from segments to master. This is 
> slow and unnecessary; let's run evaluate on the segment host where the 
> weights already reside.
> 2. We should not be running evaluate sequentially on master's GPU one model 
> at a time, while all GPUs on all segments sit idle.
> For a test environment with 20 segments, creating an 
> evaluate_multiple_model() feature which evaluates models on the segments in 
> parallel will easily give us a > 20x speedup (20x just for fixing #2, plus a 
> significant additional speedup from fixing #1).
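
The speedup estimate in the description can be checked with a rough cost model. The sketch below is not MADlib code: the function names and the 10 s per-model evaluate time are hypothetical, chosen only for illustration. The 7 s weight transfer and the 20 segments come from the description, and one model per segment is assumed.

```python
import math


def sequential_eval_seconds(num_models, transfer_s, eval_s):
    """Current flow: for each mst_key, copy weights to master, then evaluate."""
    return num_models * (transfer_s + eval_s)


def parallel_eval_seconds(num_models, num_segments, eval_s):
    """Proposed flow: each segment evaluates the model whose weights it holds.

    No weight transfer is needed. Models are processed in rounds of up to
    num_segments models at a time.
    """
    if num_segments <= 0:
        raise ValueError("num_segments must be positive")
    rounds = math.ceil(num_models / num_segments)
    return rounds * eval_s


NUM_SEGMENTS = 20   # from the issue description
NUM_MODELS = 20     # assumption: one model per segment
TRANSFER_S = 7.0    # from the issue description (about 7 s per model)
EVAL_S = 10.0       # assumption: hypothetical evaluate time per model

sequential = sequential_eval_seconds(NUM_MODELS, TRANSFER_S, EVAL_S)
parallel = parallel_eval_seconds(NUM_MODELS, NUM_SEGMENTS, EVAL_S)
speedup = sequential / parallel
print(f"sequential={sequential}s parallel={parallel}s speedup={speedup}x")
```

Under this model, parallelism alone gives exactly 20x when the transfer time is zero, and any nonzero transfer time pushes the ratio above 20x, which matches the reasoning in the description. Real numbers would depend on GPU contention, model sizes, and how models are distributed across segments.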



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
