[ 
https://issues.apache.org/jira/browse/SYSTEMML-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-1548:
-------------------------------------
    Description: 
For certain data sizes and memory configurations, reading ultra-sparse 
matrices performs poorly due to garbage collection overhead.

In detail, this task covers two scenarios that will be addressed independently:

1) Large heap: With large heaps, the problem is temporarily deserialized 
sparse blocks that are not reused due to inefficient reset, which produces a 
lot of garbage and hence a high cost for full garbage collection. This will be 
addressed by using our CSR sparse blocks for ultra-sparse blocks because CSR 
has a smaller memory footprint and allows efficient reset.

2) Small heap: With small heaps, the bottleneck is not the temporary blocks 
but the memory overhead of the target sparse matrix. This is due to a 
relatively large memory overhead per sparse row, which is not amortized if a 
row has just one or very few non-zeros. This will be addressed via a 
modification of the MCSR representation for ultra-sparse matrices. Note that we 
cannot use CSR or COO here because we want to support efficient multi-threaded 
incremental construction.
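
To illustrate both scenarios, the following is a minimal sketch of a CSR block 
whose reset keeps its arrays, plus a rough per-format memory estimate. It is 
not SystemML's SparseBlockCSR or SparseBlockMCSR: the class CsrSketch, its 
methods, and the byte constants (a 64-bit JVM with 16-byte headers and 8-byte 
references) are assumptions made for illustration only.

```java
import java.util.Arrays;

/** Hypothetical minimal CSR block; not SystemML's actual implementation. */
final class CsrSketch {
    // Assumed JVM sizes in bytes; real values depend on the JVM and its flags.
    static final long OBJ_HEADER = 16, ARRAY_HEADER = 16, REF = 8;

    private final int[] rowPtr;  // row r occupies [rowPtr[r], rowPtr[r+1])
    private int[] colIdx;        // column index per non-zero
    private double[] values;     // value per non-zero
    private int nnz;             // number of non-zeros currently stored
    private int lastRow;         // last row appended to (for the order check)

    CsrSketch(int rows, int capacity) {
        rowPtr = new int[rows + 1];
        colIdx = new int[Math.max(capacity, 1)];
        values = new double[Math.max(capacity, 1)];
    }

    /** Appends a non-zero; rows must arrive in non-decreasing order. */
    void append(int row, int col, double val) {
        if (row < lastRow)
            throw new IllegalArgumentException("rows must be appended in order");
        if (nnz == values.length) {  // grow both arrays together
            colIdx = Arrays.copyOf(colIdx, 2 * nnz);
            values = Arrays.copyOf(values, 2 * nnz);
        }
        colIdx[nnz] = col;
        values[nnz] = val;
        nnz++;
        // O(rows) per append, kept simple for the sketch: all later rows
        // start (and, for now, end) at the new end of the value array.
        Arrays.fill(rowPtr, row + 1, rowPtr.length, nnz);
        lastRow = row;
    }

    /** Returns the value at (row, col), or 0 if no non-zero is stored. */
    double get(int row, int col) {
        for (int k = rowPtr[row]; k < rowPtr[row + 1]; k++)
            if (colIdx[k] == col)
                return values[k];
        return 0.0;
    }

    int size() { return nnz; }

    /** Reset without reallocation: the three arrays are kept for reuse. */
    void reset() {
        Arrays.fill(rowPtr, 0);
        nnz = 0;
        lastRow = 0;
    }

    /** Rough CSR footprint: three arrays, 4 bytes per row, 12 per non-zero. */
    static long csrBytes(long rows, long nnz) {
        return 3 * ARRAY_HEADER + 4 * (rows + 1) + 12 * nnz;
    }

    /** Rough MCSR footprint: one row object and two arrays per non-empty row. */
    static long mcsrBytes(long rows, long nonEmptyRows, long nnz) {
        long perRow = OBJ_HEADER + 2 * REF + 4 + 2 * ARRAY_HEADER;
        return ARRAY_HEADER + REF * rows + perRow * nonEmptyRows + 12 * nnz;
    }
}
```

Under these assumed sizes, a matrix with one non-zero per row pays the per-row 
object and array headers once per non-zero in the row-wise layout, while the 
CSR layout pays only one 4-byte row pointer per row. The append method above 
also shows why CSR fits incremental multi-threaded construction poorly: all 
rows share one value array, so writers to different rows are not independent.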

  was: For certain data sizes and memory configurations, reading ultra-sparse 
matrices performs poorly due to garbage collection overhead.


> Performance ultra-sparse matrix read
> ------------------------------------
>
>                 Key: SYSTEMML-1548
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1548
>             Project: SystemML
>          Issue Type: Task
>            Reporter: Matthias Boehm
>
> For certain data sizes and memory configurations, reading ultra-sparse 
> matrices performs poorly due to garbage collection overhead.
> In detail, this task covers two scenarios that will be addressed 
> independently:
> 1) Large heap: With large heaps, the problem is temporarily deserialized 
> sparse blocks that are not reused due to inefficient reset, which produces a 
> lot of garbage and hence a high cost for full garbage collection. This will 
> be addressed by using our CSR sparse blocks for ultra-sparse blocks because 
> CSR has a smaller memory footprint and allows efficient reset.
> 2) Small heap: With small heaps, the bottleneck is not the temporary blocks 
> but the memory overhead of the target sparse matrix. This is due to a 
> relatively large memory overhead per sparse row, which is not amortized if a 
> row has just one or very few non-zeros. This will be addressed via a 
> modification of the MCSR representation for ultra-sparse matrices. Note that 
> we cannot use CSR or COO here because we want to support efficient 
> multi-threaded incremental construction.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
