[ https://issues.apache.org/jira/browse/SYSTEMML-552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Boehm resolved SYSTEMML-552.
-------------------------------------
    Resolution: Fixed
      Assignee: Matthias Boehm
 Fix Version/s: SystemML 0.10

> Performance features ALS-CG
> ---------------------------
>
>                 Key: SYSTEMML-552
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-552
>             Project: SystemML
>          Issue Type: Task
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>             Fix For: SystemML 0.10
>
>
> Over a spectrum of data sizes, ALS-CG does not always perform as well as we would expect, due to unnecessary overheads. This task captures the related performance features:
> 1) Cache-conscious sparse wdivmm left/right: For large factors, the approach of iterating through the non-zeros in W and computing dot products leads to repeated (unnecessary) scans of the factors from main memory.
> 2) Preparation of sparse W = (X!=0) w/ intrinsics: For scalar operations with !=0, there is already a special case, which is, however, unnecessarily conservative. We should realize this with a plain memcpy of the indices and a memset of 1 for the values.
> 3) Flop-aware operator selection for QuaternaryOp: For large ranks, all quaternary operators become very compute-intensive. In these situations, our heuristic of choosing ExecType.CP whenever the operation fits in driver memory does not work well. Hence, we should take the number of floating-point operations and the local/cluster degree of parallelism into account when deciding on the execution type.
> 4) Improved parallel read of sparse binary blocks: Reading sparse binary block matrices with clen>bclen requires a global lock on append and a final sequential sort of the sparse rows. We should use a more fine-grained locking scheme and sort the sparse rows in parallel.
> 5) Cache-conscious sparse wsloss, all patterns: Similar to wdivmm (see 1), but less common, since it is only executed once per outer iteration.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
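To illustrate item (1), the core of sparse wdivmm is a dot product dot(U[i,:], V[j,:]) per non-zero W(i,j); a cache-conscious variant processes the non-zeros in column blocks so the touched rows of V stay in cache. The following minimal Java sketch uses hypothetical names and a CSR-like layout, not SystemML's internal API; a production version would also pre-partition the non-zeros per block instead of rescanning them.

```java
public class WdivmmSketch {
    // Illustrative core of sparse wdivmm: for each non-zero W(i,j),
    // compute dot(U[i,:], V[j,:]) and divide. Processing the non-zeros
    // by column blocks keeps the accessed rows of V cache-resident,
    // instead of re-streaming the factor from main memory per non-zero.
    // W is given in CSR-like form (rowPtr, colIdx, wval).
    static double[] wdivmmDotProducts(int m, int[] rowPtr, int[] colIdx,
                                      double[] wval, double[][] U, double[][] V,
                                      int blockSize) {
        double[] out = new double[wval.length];
        int n = V.length;
        for (int cl = 0; cl < n; cl += blockSize) {   // column block [cl, cu)
            int cu = Math.min(cl + blockSize, n);
            for (int i = 0; i < m; i++) {             // V-rows in [cl,cu) stay cached
                for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++) {
                    int j = colIdx[k];
                    if (j < cl || j >= cu)            // only this block's non-zeros
                        continue;
                    double dot = 0;
                    double[] ui = U[i], vj = V[j];
                    for (int r = 0; r < ui.length; r++)
                        dot += ui[r] * vj[r];
                    out[k] = wval[k] / dot;           // wdivmm: W(i,j) / dot
                }
            }
        }
        return out;
    }
}
```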
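Item (2) is very direct to sketch: since W = (X!=0) preserves the non-zero structure of a sparse X exactly, the result row can reuse the input's column indices via a plain copy and fill all values with 1. The Java sketch below uses a hypothetical minimal sparse-row class, not SystemML's actual SparseRow implementation.

```java
import java.util.Arrays;

public class NotEqualsZeroSketch {
    // Hypothetical minimal sparse row: sorted column indexes plus values.
    static final class SparseRow {
        final int[] indexes;
        final double[] values;
        SparseRow(int[] indexes, double[] values) {
            this.indexes = indexes;
            this.values = values;
        }
    }

    // Specialized W = (X != 0) for a sparse row: the output has the same
    // non-zero structure, so copy the indices verbatim (memcpy) and
    // fill the values with 1 (memset-style), with no per-entry comparison.
    static SparseRow notEqualsZero(SparseRow in) {
        int nnz = in.indexes.length;
        int[] ix = new int[nnz];
        double[] vals = new double[nnz];
        System.arraycopy(in.indexes, 0, ix, 0, nnz); // plain memcpy of indices
        Arrays.fill(vals, 1.0);                      // memset 1 for values
        return new SparseRow(ix, vals);
    }

    public static void main(String[] args) {
        SparseRow x = new SparseRow(new int[]{1, 4, 7},
                                    new double[]{2.5, -3.0, 0.5});
        SparseRow w = notEqualsZero(x);
        System.out.println(Arrays.toString(w.indexes)); // [1, 4, 7]
        System.out.println(Arrays.toString(w.values));  // [1.0, 1.0, 1.0]
    }
}
```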
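The flop-aware selection in item (3) can be sketched as a cost comparison: quaternary operators cost roughly 2*nnz*rank floating-point operations, and CP is only chosen if the operation both fits in driver memory and is competitive under local parallelism. All constants and names below (the launch-overhead value, the 2*nnz*rank cost model, method names) are illustrative assumptions, not SystemML's actual optimizer values.

```java
public class ExecTypeSketch {
    enum ExecType { CP, SPARK }

    // Hypothetical flop-aware heuristic: a memory-only rule would pick CP
    // whenever the estimate fits in the driver; the flop-aware variant
    // additionally requires local execution to beat the estimated
    // distributed cost (including a fixed job-launch overhead).
    static ExecType chooseExecType(long nnz, long rank,
                                   double memEstimateMB, double driverMemMB,
                                   int localPar, int clusterPar) {
        double flops = 2.0 * nnz * rank;   // quaternary ops: ~2*nnz*rank flops
        double LAUNCH_OVERHEAD = 5e9;      // assumed distributed-job fixed cost,
                                           // in flop-equivalents (illustrative)
        double localCost = flops / localPar;
        double clusterCost = flops / clusterPar + LAUNCH_OVERHEAD;
        boolean fitsInDriver = memEstimateMB < driverMemMB;
        return (fitsInDriver && localCost <= clusterCost)
            ? ExecType.CP : ExecType.SPARK;
    }

    public static void main(String[] args) {
        // small op fits and is cheap locally -> CP
        System.out.println(chooseExecType(1_000_000, 10, 100, 4096, 8, 800));
        // large rank: compute-bound even though it fits in memory -> SPARK
        System.out.println(chooseExecType(1_000_000_000L, 500, 2048, 4096, 8, 800));
    }
}
```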
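For item (4), one way to replace the global append lock is lock striping: one lock per range of rows, so reader threads appending to disjoint row stripes never contend, followed by a parallel per-row sort. This Java sketch is a hand-rolled illustration of the locking scheme only (all class and field names are hypothetical), not SystemML's block-reader code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.IntStream;

public class ParallelAppendSketch {
    // One sparse cell: column index and value.
    static final class Cell {
        final int col; final double val;
        Cell(int col, double val) { this.col = col; this.val = val; }
    }

    static final int STRIPE = 64; // rows per lock stripe (assumed granularity)

    final List<List<Cell>> rows;  // unsorted append targets, one list per row
    final ReentrantLock[] locks;  // one lock per stripe of STRIPE rows

    ParallelAppendSketch(int numRows) {
        rows = new ArrayList<>(numRows);
        for (int i = 0; i < numRows; i++)
            rows.add(new ArrayList<>());
        locks = new ReentrantLock[(numRows + STRIPE - 1) / STRIPE];
        for (int i = 0; i < locks.length; i++)
            locks[i] = new ReentrantLock();
    }

    // Fine-grained append: lock only the stripe containing row r,
    // so appends to different stripes proceed concurrently.
    void append(int r, int c, double v) {
        ReentrantLock l = locks[r / STRIPE];
        l.lock();
        try { rows.get(r).add(new Cell(c, v)); }
        finally { l.unlock(); }
    }

    // Final pass: sort each row by column index, rows in parallel,
    // replacing the single-threaded global sort.
    void sortRowsInParallel() {
        IntStream.range(0, rows.size()).parallel()
            .forEach(r -> rows.get(r)
                .sort(Comparator.comparingInt((Cell cell) -> cell.col)));
    }
}
```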