[ https://issues.apache.org/jira/browse/SYSTEMML-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Boehm resolved SYSTEMML-1623.
--------------------------------------
    Resolution: Fixed
      Assignee: Matthias Boehm
 Fix Version/s: SystemML 1.0

> Memory efficiency JMLC matrix and frame conversions
> ---------------------------------------------------
>
>                 Key: SYSTEMML-1623
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1623
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>             Fix For: SystemML 1.0
>
>
> The current JMLC conversion functions cause a very inefficient and
> memory-intensive code path, which leads to unnecessary OOMs that can be
> easily avoided. This task aims to add and improve these primitives to allow
> convenient data conversions with much better memory efficiency.
> For example, consider a scenario of a 500k x 90 input model available as a
> csv file in the classpath, whose string representation requires 1GB. The
> typical code path currently used looks as follows:
> {code}
> ResourceStream(model_file)
> -> prep
> ---> StringBuilder -> String [3GB tmp, 1GB]
> -> convertToDoubleMatrix
> ---> byte[] -> ByteInputStream [2GB]
> ---> MatrixBlock [360MB]
> ---> double[][] [400MB]
> -> setMatrix
> ---> MatrixBlock [360MB]
> {code}
> This path requires at least 4GB of memory due to strong references to all
> intermediates. The goal of this task is to reduce this to the following,
> which only requires 360MB of memory:
> {code}
> ResourceStream(model_file)
> -> convertToMatrix
> ---> MatrixBlock [360MB]
> -> setMatrix
> ---> by reference
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
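To make the streaming idea in the description concrete, here is a minimal Java sketch of the target code path under stated assumptions: the input is a headerless, comma-separated numeric file with known dimensions, and it is parsed line by line straight into a preallocated row-major double[] (the layout of a dense block) instead of first being materialized as a full-file String and byte[]. The class StreamingCsvToDense and its methods are hypothetical illustrations, not the JMLC API added by this fix. The helper denseSizeInBytes reproduces the 360MB figure: 500,000 x 90 x 8 bytes = 360,000,000 bytes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; not part of SystemML/JMLC.
final class StreamingCsvToDense {

    private StreamingCsvToDense() {}

    /** Size in bytes of a dense row-major double array with rows x cols cells. */
    static long denseSizeInBytes(long rows, long cols) {
        return rows * cols * Double.BYTES;
    }

    /**
     * Parses a headerless, comma-separated stream of numbers directly into a
     * preallocated row-major double[] of length rows*cols. Only the current
     * line is held as a String, so no full-file String or byte[] copy exists.
     */
    static double[] readDense(InputStream in, int rows, int cols) throws IOException {
        // Allocate the output once; fails fast if rows*cols overflows an int.
        double[] values = new double[Math.multiplyExact(rows, cols)];
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            int row = 0;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // tolerate blank lines
                }
                if (row >= rows) {
                    throw new IOException("more than " + rows + " rows in input");
                }
                String[] cells = line.split(",", -1);
                if (cells.length != cols) {
                    throw new IOException("row " + row + ": expected " + cols
                            + " columns, got " + cells.length);
                }
                // Write the parsed row directly into its slot in the dense array.
                int offset = row * cols;
                for (int j = 0; j < cols; j++) {
                    values[offset + j] = Double.parseDouble(cells[j].trim());
                }
                row++;
            }
            if (row != rows) {
                throw new IOException("expected " + rows + " rows, got " + row);
            }
        }
        return values;
    }
}
```

For the scenario above, one would call readDense(getClass().getResourceAsStream("model.csv"), 500_000, 90), where "model.csv" is a hypothetical resource name. Peak memory is then roughly the 360MB output array plus a small per-line working set. The actual fix additionally passes the resulting MatrixBlock to setMatrix by reference, which this sketch does not model.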