[ https://issues.apache.org/jira/browse/SYSTEMML-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Glenn Weidner updated SYSTEMML-1837:
------------------------------------
    Fix Version/s:     (was: SystemML 1.0)
                   SystemML 0.15

> Unary aggregate w/ corrections output to large physical blocks
> --------------------------------------------------------------
>
>                 Key: SYSTEMML-1837
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1837
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>             Fix For: SystemML 0.15
>
>
> Many unary aggregate operations store corrections in additional columns or
> rows. For example, {{rowSums(X)}} uses a two-column output to store sums and
> corrections. In CP, we drop these corrections immediately after the
> operation, while in MR and Spark they are dropped after the final
> aggregation. The issue is that {{MatrixBlock::dropLastRowsOrColums}} does
> not actually drop the correction but simply shifts all values into the right
> starting positions. Hence, the physical output is larger than what the
> memory estimates represent. This leads to unnecessarily large memory
> consumption during subsequent operations and in the buffer pool, which can
> lead to OOMs. This task aims to fix {{MatrixBlock::dropLastRowsOrColums}}.
> In a subsequent task, we could also modify all unary aggregates to never
> allocate the multi-column/row output when executed in CP. However, this
> requires custom code paths for the different backends.
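
The difference between shifting values and actually dropping the correction can be illustrated with a minimal sketch. This is not the SystemML code: the class DenseBlock and the methods dropLastColumnByShift and dropLastColumnByCopy are hypothetical names, and the sketch assumes a dense row-major array with the correction stored in the last column.

```java
public class CorrectionDropSketch {
    // Hypothetical stand-in for a dense block (not SystemML's MatrixBlock):
    // row-major values plus the logical dimensions.
    static final class DenseBlock {
        double[] values;
        int rows;
        int cols;

        DenseBlock(double[] values, int rows, int cols) {
            this.values = values;
            this.rows = rows;
            this.cols = cols;
        }
    }

    // Shift-only variant: compacts the kept columns to the front of the
    // existing array. The logical size shrinks, the physical array does not.
    static void dropLastColumnByShift(DenseBlock b) {
        int newCols = b.cols - 1;
        for (int r = 0; r < b.rows; r++) {
            for (int c = 0; c < newCols; c++) {
                // Destination index never exceeds the source index,
                // so no unread value is overwritten.
                b.values[r * newCols + c] = b.values[r * b.cols + c];
            }
        }
        b.cols = newCols;
    }

    // Copy variant: allocates a right-sized array so that the physical size
    // matches the logical size after the correction column is removed.
    static void dropLastColumnByCopy(DenseBlock b) {
        int newCols = b.cols - 1;
        double[] out = new double[b.rows * newCols];
        for (int r = 0; r < b.rows; r++) {
            System.arraycopy(b.values, r * b.cols, out, r * newCols, newCols);
        }
        b.values = out;
        b.cols = newCols;
    }

    public static void main(String[] args) {
        // Three rows of (sum, correction), as in a two-column rowSums output.
        double[] data = {1.0, 0.1, 2.0, 0.2, 3.0, 0.3};

        DenseBlock shifted = new DenseBlock(data.clone(), 3, 2);
        dropLastColumnByShift(shifted);

        DenseBlock copied = new DenseBlock(data.clone(), 3, 2);
        dropLastColumnByCopy(copied);

        System.out.println("shift: physical length = " + shifted.values.length);
        System.out.println("copy:  physical length = " + copied.values.length);
    }
}
```

Both variants yield the same logical 3x1 result, but only the copy variant releases the space held by the corrections. The extra allocation and copy is the price paid for a physical size that agrees with the memory estimate.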