Frank McQuillan created MADLIB-1431:
---------------------------------------

             Summary: DL - improve speed of evaluate for multiple model training
                 Key: MADLIB-1431
                 URL: https://issues.apache.org/jira/browse/MADLIB-1431
             Project: Apache MADlib
          Issue Type: Improvement
          Components: Deep Learning
            Reporter: Frank McQuillan
             Fix For: v1.18.0


Right now we only have evaluate() for a single model. There is no evaluate_multiple_model() that can run in parallel the way fit_multiple_model() does.

Currently, the evaluate stage of fit_multiple_model() looks like this:

- foreach mst_key:
  1. Send weights from segments to master for the model corresponding to mst_key (this step alone takes about 7s for each model).
  2. Run keras.evaluate() on this model, while all other models wait their turn.

Two things stand out:

1. We should not be transferring weights from segments to master. This is slow and unnecessary; let's run evaluate on the segment host where the weights already reside.
2. We should not be running evaluate sequentially on master's GPU, one model at a time, while all GPUs on all segments sit idle.

For a test environment with 20 segments, creating an evaluate_multiple_model() feature that evaluates models on the segments in parallel should easily give us a > 20x speedup (20x just for fixing #2, plus a significant additional speedup from fixing #1).
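
As a rough illustration of the estimate above, here is a back-of-envelope cost model in Python. It is only a sketch, not MADlib code: the function names are hypothetical, the 10s per-model evaluate time is an assumed placeholder (only the 7s transfer figure comes from this issue), and it assumes one GPU per segment, identical evaluate time for every model, and as many models as segments.

```python
import math


def sequential_eval_time(n_models, transfer_s, eval_s):
    """Current flow: for each mst_key, ship weights to master, then evaluate there."""
    return n_models * (transfer_s + eval_s)


def parallel_eval_time(n_models, n_segments, eval_s):
    """Proposed flow: every segment evaluates a model it already holds.

    No weight transfer is charged; if there are more models than segments,
    the models are processed in ceil(n_models / n_segments) rounds.
    """
    rounds = math.ceil(n_models / n_segments)
    return rounds * eval_s


n_models = 20      # assumption: one model per segment
n_segments = 20    # test environment size quoted in the issue
transfer_s = 7.0   # per-model weight transfer time quoted in the issue
eval_s = 10.0      # ASSUMED per-model keras.evaluate() time, for illustration only

seq = sequential_eval_time(n_models, transfer_s, eval_s)
par = parallel_eval_time(n_models, n_segments, eval_s)

# Speedup from parallelism alone (fix #2): ignore the transfer cost on both sides.
speedup_fix2_only = sequential_eval_time(n_models, 0.0, eval_s) / par
# Speedup with the weight transfer removed as well (fixes #1 and #2).
speedup_both = seq / par

print(seq, par, speedup_fix2_only, speedup_both)  # 340.0 10.0 20.0 34.0
```

Under these assumptions the model reproduces the 20x figure for fix #2 on its own, and the extra gain from fix #1 grows as the transfer time becomes large relative to the evaluate time.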