[jira] [Created] (MADLIB-1431) DL - improve speed of evaluate for multiple model training

Frank McQuillan (Jira) Wed, 20 May 2020 11:21:20 -0700

Frank McQuillan created MADLIB-1431:
---------------------------------------


             Summary: DL - improve speed of evaluate for multiple model training
                 Key: MADLIB-1431
                 URL: https://issues.apache.org/jira/browse/MADLIB-1431
             Project: Apache MADlib
          Issue Type: Improvement
          Components: Deep Learning
            Reporter: Frank McQuillan
             Fix For: v1.18.0


Right now we only have evaluate() for a single model. There is no 
evaluate_multiple_model() that can run in parallel the way 
fit_multiple_model() does.

Currently, the evaluate stage of fit_multiple_model() looks like this:

- foreach mst_key:
       1. Send weights from segments to master for the model corresponding to
          mst_key (this step alone takes about 7s for each model).
       2. Run keras.evaluate() on this model, while all other models wait
          their turn.
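
The loop above can be sketched in Python roughly as follows. This is only a 
toy model of the control flow: every function and variable name in it is a 
hypothetical stand-in, not MADlib or Keras code, and the "weights" are plain 
lists of numbers.

```python
# Toy model of the current evaluate stage. All names are hypothetical
# stand-ins; none of this is MADlib code.

def fetch_weights_to_master(mst_key, segment_weights):
    """Stand-in for step 1: copy one model's weights from its segment to master."""
    return segment_weights[mst_key]

def evaluate_on_master(weights):
    """Stand-in for step 2: an evaluate call on master. Returns a fake loss."""
    return sum(weights) / len(weights)

def evaluate_stage_sequential(segment_weights):
    """Evaluate every model one after another, in mst_key order."""
    results = {}
    for mst_key in sorted(segment_weights):
        weights = fetch_weights_to_master(mst_key, segment_weights)  # step 1
        results[mst_key] = evaluate_on_master(weights)               # step 2
    return results

# Two fake models keyed by mst_key.
segment_weights = {1: [0.5, 1.5], 2: [2.0, 4.0]}
losses = evaluate_stage_sequential(segment_weights)
```

Each iteration pays for both the transfer and the evaluate before the next 
model can start, so the total time grows linearly with the number of models.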

Two things stand out:

1. We should not be transferring weights from segments to master. This is 
slow and unnecessary; let's run evaluate on the segment host where the 
weights already reside.
2. We should not be running evaluate sequentially on master's GPU one model 
at a time, while all GPUs on all segments sit idle.

For a test environment with 20 segments, creating an 
evaluate_multiple_model() feature which evaluates models on the segments in 
parallel should easily give us a > 20x speedup (20x just for fixing #2, plus 
a significant additional speedup from fixing #1).
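
As a rough back-of-the-envelope check of that estimate, here is a small 
Python sketch. It assumes one model per segment at a time, identical evaluate 
times for every model, and no weight transfer at all in the parallel case. 
The 7s transfer time is taken from the description above; the 10s evaluate 
time and all function names are made up for illustration.

```python
import math

def sequential_time(n_models, transfer_s, eval_s):
    """Current behaviour: each model pays transfer + evaluate, one at a time."""
    return n_models * (transfer_s + eval_s)

def parallel_time(n_models, n_segments, eval_s):
    """Proposed behaviour: models are evaluated in place, one per segment,
    so the work finishes in ceil(n_models / n_segments) rounds."""
    rounds = math.ceil(n_models / n_segments)
    return rounds * eval_s

def speedup(n_models, n_segments, transfer_s, eval_s):
    """Ratio of sequential time to parallel time."""
    return (sequential_time(n_models, transfer_s, eval_s)
            / parallel_time(n_models, n_segments, eval_s))

# 20 models on 20 segments, with an assumed 10s evaluate per model.
parallelism_only = speedup(20, 20, 0.0, 10.0)  # ignores the transfer cost
with_transfer = speedup(20, 20, 7.0, 10.0)     # also removes the 7s transfer
```

Under these assumptions parallelism alone accounts for the 20x figure, and 
the extra gain from dropping the transfer depends on how the transfer time 
compares to the evaluate time.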



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
