This is an automated email from the ASF dual-hosted git repository.
mboehm7 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/systemds.git
The following commit(s) were added to refs/heads/main by this push:
new c8bf90bf34 [MINOR] Improve sparse-dense matrix-vector multiplication
c8bf90bf34 is described below
commit c8bf90bf34c87a1bcdd89c3056a5af7f11bab2e9
Author: Matthias Boehm <[email protected]>
AuthorDate: Thu Apr 4 16:17:27 2024 +0200
[MINOR] Improve sparse-dense matrix-vector multiplication
For sparse-dense matrix-vector multiplications with large rhs vectors
we use a dedicated kernel with cache blocking. The block sizes are
derived from the sparsity in order to avoid expensive block handling
for very sparse lhs matrices. For a recent case of ultra-sparse graphs
(europe_osm with dims: 50912018-x-50912018, nnz: 108109320, which
is a sparsity of 4.1e-8) the chosen blocksize was larger than the
entire matrix. We now detect these cases, and fall back to the plain
sparse-dense matrix-vector kernel. For above scenario, the runtime
improved from ~170ms to ~155ms after tiered JIT compilation.
---
.../org/apache/sysds/runtime/matrix/data/LibMatrixMult.java | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git
a/src/main/java/org/apache/sysds/runtime/matrix/data/LibMatrixMult.java
b/src/main/java/org/apache/sysds/runtime/matrix/data/LibMatrixMult.java
index d71b6d479f..9a7c01dee2 100644
--- a/src/main/java/org/apache/sysds/runtime/matrix/data/LibMatrixMult.java
+++ b/src/main/java/org/apache/sysds/runtime/matrix/data/LibMatrixMult.java
@@ -1400,11 +1400,18 @@ public class LibMatrixMult
}
private static void matrixMultSparseDenseMVTallRHS(SparseBlock a,
DenseBlock b, DenseBlock c, int cd, long xsp, int rl, int ru) {
- double[] bvals = b.valuesAt(0);
- double[] cvals = c.valuesAt(0);
-
final int blocksizeI = 512; //8KB curk+cvals in L1
final int blocksizeK = (int)Math.max(2048,2048*xsp/32);
//~256KB bvals in L2
+
+ //short-cut to kernel w/o cache blocking if no benefit
+ if( blocksizeK >= cd ) {
+ matrixMultSparseDenseMVShortRHS(a, b, c, cd, rl, ru);
+ return;
+ }
+
+ //sparse matrix-vector w/ cache blocking (keep front of
positions)
+ double[] bvals = b.valuesAt(0);
+ double[] cvals = c.valuesAt(0);
int[] curk = new int[blocksizeI];
for( int bi = rl; bi < ru; bi+=blocksizeI ) {