[ https://issues.apache.org/jira/browse/SYSTEMML-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mike Dusenberry updated SYSTEMML-1554:
--------------------------------------
    Fix Version/s: SystemML 1.0

> IPA Scalar Transient Read Replacement
> -------------------------------------
>
>                 Key: SYSTEMML-1554
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1554
>             Project: SystemML
>          Issue Type: Improvement
>            Reporter: Mike Dusenberry
>            Assignee: Mike Dusenberry
>             Fix For: SystemML 1.0
>
>         Attachments: convnet_distrib_sgd.dml, parfor_oom_convnet_plan.txt, parfor_oom_convnet.py, parfor_oom_plan.txt, parfor_oom.py
>
>
> Currently, during IPA we collect all variables (scalars & matrices) that are eligible for propagation across blocks (i.e., not updated within a block), and then propagate only the matrix sizes across the blocks. It seems plausible that we could also replace all eligible scalar transient reads with literals, based on the variables that have already been collected. The benefit is that many ops would be able to determine their output sizes during regular compilation, rather than having to wait until dynamic recompilation, thus reducing the pressure on dynamic recompilation.
> Are there drawbacks to this approach? The motivating use case is that I was seeing a large number of memory warnings while training a convolutional net because the sizes were unknown during regular compilation, yet the engine only had CP versions of the ops. Additionally, I was running into actual heap-space OOM errors in situations that should not run out of memory, and thus I started exploring.
> I've attached an example script and the explain plans (HOPs & runtime) with and without the IPA scalar replacement.
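> As a quick illustration of the idea (a minimal, hypothetical DML sketch, not taken from the attached scripts), consider a scalar that is assigned once and never updated:
> {code}
> # Hypothetical sketch: `n` is assigned once and never updated, so it is
> # eligible for propagation across statement blocks during IPA.
> n = 784
> for (i in 1:10) {
>   # The read of `n` here is a transient read. If IPA replaces it with the
>   # literal 784, the size of X is known during regular compilation, and the
>   # ops consuming X can be sized without waiting for dynamic recompilation.
>   X = rand(rows=1000, cols=n)
>   s = sum(X)
> }
> {code}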