[ https://issues.apache.org/jira/browse/MADLIB-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan closed MADLIB-1431.
-----------------------------------
    Resolution: Fixed

https://github.com/apache/madlib/pull/502

> DL - improve speed of evaluate for multiple model training
> ----------------------------------------------------------
>
>                 Key: MADLIB-1431
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1431
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Deep Learning
>            Reporter: Frank McQuillan
>            Priority: Minor
>             Fix For: v1.18.0
>
>
> All we have right now is evaluate() for a single model; we have no
> evaluate_multiple_model() that can run in parallel the way
> fit_multiple_model() does.
>
> Currently, the evaluate stage of fit_multiple_model() looks like this:
>
> - foreach mst_key:
>   1. Send the weights for the model corresponding to mst_key from the
>      segments to master (this step alone takes about 7 s per model).
>   2. Run keras.evaluate() on this model on master, while all other
>      models wait their turn.
>
> Two things stand out:
>
> 1. We should not be transferring weights from the segments to master.
>    This is slow and unnecessary; let's run evaluate on the segment host
>    where the weights already reside.
> 2. We should not be running evaluate sequentially on master's GPU, one
>    model at a time, while the GPUs on all segments sit idle.
>
> For a test environment with 20 segments, an evaluate_multiple_model()
> feature that evaluates models on the segments in parallel will easily
> give us a >20x speedup (20x just from fixing #2, plus a significant
> additional speedup from fixing #1).
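A back-of-the-envelope way to see where the >20x estimate comes from is a simple cost model. The sketch below is not MADlib code: the function names are hypothetical, and it assumes that models are spread evenly across segments, that one evaluate takes the same time on a segment GPU as on master's GPU (1 s here, an illustrative number), and that data-distribution and result-aggregation overheads can be ignored. Only the roughly 7 s per-model weight transfer and the 20-segment setup come from the issue.

```python
import math


def sequential_eval_time(n_models: int, transfer_s: float, eval_s: float) -> float:
    """Current scheme: for each mst_key, copy the weights to master, then
    evaluate there, one model at a time."""
    return n_models * (transfer_s + eval_s)


def parallel_eval_time(n_models: int, n_segments: int, eval_s: float) -> float:
    """Idealized proposed scheme (assumption): every segment evaluates the
    models whose weights it already holds, all segments at once, with no
    weight transfer to master."""
    return math.ceil(n_models / n_segments) * eval_s


def speedup(n_models: int, n_segments: int, transfer_s: float, eval_s: float) -> float:
    """Ratio of sequential to parallel evaluate time under this cost model."""
    return sequential_eval_time(n_models, transfer_s, eval_s) / parallel_eval_time(
        n_models, n_segments, eval_s
    )


# Fixing #2 alone (transfer cost not counted): 20 models on 20 segments.
print(speedup(20, 20, 0.0, 1.0))  # 20.0
# Also removing the ~7 s transfer (#1), with a hypothetical 1 s evaluate.
print(speedup(20, 20, 7.0, 1.0))  # 160.0
```

Under these assumptions the parallelism alone yields a factor equal to the number of segments, and removing the transfer multiplies that further; real gains depend on how models and data are actually laid out across the cluster.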