This is an automated email from the ASF dual-hosted git repository.

mboehm7 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/systemds.git


The following commit(s) were added to refs/heads/main by this push:
     new c8bf90bf34 [MINOR] Improve sparse-dense matrix-vector multiplication
c8bf90bf34 is described below

commit c8bf90bf34c87a1bcdd89c3056a5af7f11bab2e9
Author: Matthias Boehm <[email protected]>
AuthorDate: Thu Apr 4 16:17:27 2024 +0200

    [MINOR] Improve sparse-dense matrix-vector multiplication
    
    For sparse-dense matrix-vector multiplications with large rhs vectors
    we use a dedicated kernel with cache blocking. The block sizes are
    derived from the sparsity in order to avoid expensive block handling
    for very sparse lhs matrices. For a recent case of ultra-sparse graphs
    (europe_osm with dims: 50912018-x-50912018, nnz: 108109320, which
    is a sparsity of 4.1e-8) the chosen blocksize was larger than the
    entire matrix. We now detect these cases, and fall back to the plain
    sparse-dense matrix-vector kernel. For above scenario, the runtime
    improved from ~170ms to ~155ms after tiered JIT compilation.
---
 .../org/apache/sysds/runtime/matrix/data/LibMatrixMult.java | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git 
a/src/main/java/org/apache/sysds/runtime/matrix/data/LibMatrixMult.java 
b/src/main/java/org/apache/sysds/runtime/matrix/data/LibMatrixMult.java
index d71b6d479f..9a7c01dee2 100644
--- a/src/main/java/org/apache/sysds/runtime/matrix/data/LibMatrixMult.java
+++ b/src/main/java/org/apache/sysds/runtime/matrix/data/LibMatrixMult.java
@@ -1400,11 +1400,18 @@ public class LibMatrixMult
        }
        
        private static void matrixMultSparseDenseMVTallRHS(SparseBlock a, 
DenseBlock b, DenseBlock c, int cd, long xsp, int rl, int ru) {
-               double[] bvals = b.valuesAt(0);
-               double[] cvals = c.valuesAt(0);
-               
                final int blocksizeI = 512; //8KB curk+cvals in L1
                final int blocksizeK = (int)Math.max(2048,2048*xsp/32); 
//~256KB bvals in L2
+               
+               //short-cut to kernel w/o cache blocking if no benefit
+               if( blocksizeK >= cd ) {
+                       matrixMultSparseDenseMVShortRHS(a, b, c, cd, rl, ru);
+                       return;
+               }
+               
+               //sparse matrix-vector w/ cache blocking (keep front of 
positions)
+               double[] bvals = b.valuesAt(0);
+               double[] cvals = c.valuesAt(0);
                int[] curk = new int[blocksizeI];
                
                for( int bi = rl; bi < ru; bi+=blocksizeI ) {

Reply via email to