[ https://issues.apache.org/jira/browse/SYSTEMML-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Dusenberry updated SYSTEMML-1554:
--------------------------------------
    Fix Version/s: SystemML 1.0

> IPA Scalar Transient Read Replacement
> -------------------------------------
>
>                 Key: SYSTEMML-1554
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1554
>             Project: SystemML
>          Issue Type: Improvement
>            Reporter: Mike Dusenberry
>            Assignee: Mike Dusenberry
>             Fix For: SystemML 1.0
>
>         Attachments: convnet_distrib_sgd.dml, parfor_oom_convnet_plan.txt, 
> parfor_oom_convnet.py, parfor_oom_plan.txt, parfor_oom.py
>
>
> Currently, during IPA we collect all variables (scalars & matrices) eligible 
> for propagation across blocks (i.e., not updated in the block), but then 
> propagate only the matrix sizes across the blocks.  It seems plausible that 
> we could also replace all eligible scalar transient reads with literals, 
> based on the variables that have already been collected.  The benefit is 
> that many ops would be able to determine their respective output sizes 
> during regular compilation, instead of having to wait until dynamic 
> recompilation, thus reducing the pressure on dynamic recompilation.
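>
> As a minimal DML sketch of the scenario (hypothetical script, not one of 
> the attachments): the if statement forces a statement-block boundary, so 
> the read of n in the last block is a transient read.  Without the proposed 
> replacement, the dimensions of Y are unknown during initial compilation; 
> with it, the read of n becomes the literal 100 and the size of Y is known 
> immediately.
>
>     n = 100                         # scalar, never updated afterwards
>     if (1 == 1) {                   # creates a statement-block boundary
>         print("boundary")
>     }
>     Y = matrix(0, rows=n, cols=n)   # dims known at compile time only if
>                                     # the read of n is replaced by a literal
>     print(sum(Y))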
>
> Are there drawbacks to this approach?  The motivating use case: while 
> training a convolutional net, I was seeing a large number of memory 
> warnings due to the sizes being unknown during regular compilation, even 
> though the engine only has CP versions of the ops.  Additionally, I was 
> running into actual heap-space OOM errors in situations that should not 
> run out of memory, and thus I started exploring.
>
> I've attached an example script and the explain plans (hops & runtime) 
> with and without the IPA scalar replacement.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
