[ https://issues.apache.org/jira/browse/SYSTEMML-552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Boehm resolved SYSTEMML-552.
-------------------------------------
    Resolution: Fixed
      Assignee: Matthias Boehm
 Fix Version/s: SystemML 0.10

> Performance features ALS-CG
> ---------------------------
>
>                 Key: SYSTEMML-552
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-552
>             Project: SystemML
>          Issue Type: Task
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>             Fix For: SystemML 0.10
>
>
> Over a spectrum of data sizes, ALS-CG does not always perform as well as we would expect, due to unnecessary overheads. This task captures the related performance features:
> 1) Cache-conscious sparse wdivmm left/right: For large factors, the approach of iterating through the non-zeros in W and computing dot products leads to repeated (unnecessary) scans of the factors from main memory.
> 2) Preparation of sparse W = (X!=0) w/ intrinsics: For scalar operations with !=0, there is already a special case, which is, however, unnecessarily conservative. We should realize this with a plain memcpy of the indices and a memset of 1 for the values.
> 3) Flop-aware operator selection for QuaternaryOp: For large ranks, all quaternary operators become very compute-intensive. In these situations, our heuristic of choosing ExecType.CP whenever the operation fits in driver memory does not work well. Hence, we should take the number of floating-point operations and the local/cluster degree of parallelism into account when deciding on the execution type.
> 4) Improved parallel read of sparse binary blocks: Reading sparse binary block matrices with clen>bclen requires a global lock on append and a final sequential sort of the sparse rows. We should use a more fine-grained locking scheme and sort the sparse rows in parallel.
> 5) Cache-conscious sparse wsloss, all patterns: Similar to wdivmm (see 1), but less common, since it is only executed once per outer iteration.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
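To illustrate item (1), the core of sparse wdivmm is a dot product dot(U[i,:], V[j,:]) per non-zero W(i,j); a cache-conscious variant processes the non-zeros in column blocks so the touched rows of V stay in cache. The following minimal Java sketch uses hypothetical names and a CSR-like layout, not SystemML's internal API; a production version would also pre-partition the non-zeros per block instead of rescanning them.

```java
public class WdivmmSketch {
    // Illustrative core of sparse wdivmm: for each non-zero W(i,j),
    // compute dot(U[i,:], V[j,:]) and divide. Processing the non-zeros
    // by column blocks keeps the accessed rows of V cache-resident,
    // instead of re-streaming the factor from main memory per non-zero.
    // W is given in CSR-like form (rowPtr, colIdx, wval).
    static double[] wdivmmDotProducts(int m, int[] rowPtr, int[] colIdx,
                                      double[] wval, double[][] U, double[][] V,
                                      int blockSize) {
        double[] out = new double[wval.length];
        int n = V.length;
        for (int cl = 0; cl < n; cl += blockSize) {   // column block [cl, cu)
            int cu = Math.min(cl + blockSize, n);
            for (int i = 0; i < m; i++) {             // V-rows in [cl,cu) stay cached
                for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++) {
                    int j = colIdx[k];
                    if (j < cl || j >= cu)            // only this block's non-zeros
                        continue;
                    double dot = 0;
                    double[] ui = U[i], vj = V[j];
                    for (int r = 0; r < ui.length; r++)
                        dot += ui[r] * vj[r];
                    out[k] = wval[k] / dot;           // wdivmm: W(i,j) / dot
                }
            }
        }
        return out;
    }
}
```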
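Item (2) is very direct to sketch: since W = (X!=0) preserves the non-zero structure of a sparse X exactly, the result row can reuse the input's column indices via a plain copy and fill all values with 1. The Java sketch below uses a hypothetical minimal sparse-row class, not SystemML's actual SparseRow implementation.

```java
import java.util.Arrays;

public class NotEqualsZeroSketch {
    // Hypothetical minimal sparse row: sorted column indexes plus values.
    static final class SparseRow {
        final int[] indexes;
        final double[] values;
        SparseRow(int[] indexes, double[] values) {
            this.indexes = indexes;
            this.values = values;
        }
    }

    // Specialized W = (X != 0) for a sparse row: the output has the same
    // non-zero structure, so copy the indices verbatim (memcpy) and
    // fill the values with 1 (memset-style), with no per-entry comparison.
    static SparseRow notEqualsZero(SparseRow in) {
        int nnz = in.indexes.length;
        int[] ix = new int[nnz];
        double[] vals = new double[nnz];
        System.arraycopy(in.indexes, 0, ix, 0, nnz); // plain memcpy of indices
        Arrays.fill(vals, 1.0);                      // memset 1 for values
        return new SparseRow(ix, vals);
    }

    public static void main(String[] args) {
        SparseRow x = new SparseRow(new int[]{1, 4, 7},
                                    new double[]{2.5, -3.0, 0.5});
        SparseRow w = notEqualsZero(x);
        System.out.println(Arrays.toString(w.indexes)); // [1, 4, 7]
        System.out.println(Arrays.toString(w.values));  // [1.0, 1.0, 1.0]
    }
}
```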
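The flop-aware selection in item (3) can be sketched as a cost comparison: quaternary operators cost roughly 2*nnz*rank floating-point operations, and CP is only chosen if the operation both fits in driver memory and is competitive under local parallelism. All constants and names below (the launch-overhead value, the 2*nnz*rank cost model, method names) are illustrative assumptions, not SystemML's actual optimizer values.

```java
public class ExecTypeSketch {
    enum ExecType { CP, SPARK }

    // Hypothetical flop-aware heuristic: a memory-only rule would pick CP
    // whenever the estimate fits in the driver; the flop-aware variant
    // additionally requires local execution to beat the estimated
    // distributed cost (including a fixed job-launch overhead).
    static ExecType chooseExecType(long nnz, long rank,
                                   double memEstimateMB, double driverMemMB,
                                   int localPar, int clusterPar) {
        double flops = 2.0 * nnz * rank;   // quaternary ops: ~2*nnz*rank flops
        double LAUNCH_OVERHEAD = 5e9;      // assumed distributed-job fixed cost,
                                           // in flop-equivalents (illustrative)
        double localCost = flops / localPar;
        double clusterCost = flops / clusterPar + LAUNCH_OVERHEAD;
        boolean fitsInDriver = memEstimateMB < driverMemMB;
        return (fitsInDriver && localCost <= clusterCost)
            ? ExecType.CP : ExecType.SPARK;
    }

    public static void main(String[] args) {
        // small op fits and is cheap locally -> CP
        System.out.println(chooseExecType(1_000_000, 10, 100, 4096, 8, 800));
        // large rank: compute-bound even though it fits in memory -> SPARK
        System.out.println(chooseExecType(1_000_000_000L, 500, 2048, 4096, 8, 800));
    }
}
```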
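For item (4), one way to replace the global append lock is lock striping: one lock per range of rows, so reader threads appending to disjoint row stripes never contend, followed by a parallel per-row sort. This Java sketch is a hand-rolled illustration of the locking scheme only (all class and field names are hypothetical), not SystemML's block-reader code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.IntStream;

public class ParallelAppendSketch {
    // One sparse cell: column index and value.
    static final class Cell {
        final int col; final double val;
        Cell(int col, double val) { this.col = col; this.val = val; }
    }

    static final int STRIPE = 64; // rows per lock stripe (assumed granularity)

    final List<List<Cell>> rows;  // unsorted append targets, one list per row
    final ReentrantLock[] locks;  // one lock per stripe of STRIPE rows

    ParallelAppendSketch(int numRows) {
        rows = new ArrayList<>(numRows);
        for (int i = 0; i < numRows; i++)
            rows.add(new ArrayList<>());
        locks = new ReentrantLock[(numRows + STRIPE - 1) / STRIPE];
        for (int i = 0; i < locks.length; i++)
            locks[i] = new ReentrantLock();
    }

    // Fine-grained append: lock only the stripe containing row r,
    // so appends to different stripes proceed concurrently.
    void append(int r, int c, double v) {
        ReentrantLock l = locks[r / STRIPE];
        l.lock();
        try { rows.get(r).add(new Cell(c, v)); }
        finally { l.unlock(); }
    }

    // Final pass: sort each row by column index, rows in parallel,
    // replacing the single-threaded global sort.
    void sortRowsInParallel() {
        IntStream.range(0, rows.size()).parallel()
            .forEach(r -> rows.get(r)
                .sort(Comparator.comparingInt((Cell cell) -> cell.col)));
    }
}
```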