Alex Favaro created SPARK-31122: ----------------------------------- Summary: Add support for sparse matrix multiplication Key: SPARK-31122 URL: https://issues.apache.org/jira/browse/SPARK-31122 Project: Spark Issue Type: New Feature Components: MLlib Affects Versions: 2.4.5 Reporter: Alex Favaro
MLlib does not currently support multiplication of sparse matrices. When multiplying block matrices with sparse blocks, the sparse blocks are first converted to dense matrices. This leads to large increases in memory utilization for certain problems. I'd like to propose adding support for local sparse matrix multiplication to MLlib, as well as local dense-sparse matrix multiplication. With these changes, the case clause which converts sparse blocks to dense matrices in the block matrix multiply method could be removed. One question is whether the result of sparse-sparse matrix multiplication should be sparse or dense, since the product of two sparse matrices can be quite dense depending on the matrices. I propose returning a sparse matrix, however, and letting the application convert the result to a dense matrix if necessary. There is some precedent for this with the block matrix add method, which returns sparse matrix blocks even when adding a sparse matrix block to a dense matrix block. As for the implementation, one option would be to leverage Breeze's existing sparse matrix multiplication, as MLlib currently does for matrix addition. Another would be to add support for sparse-sparse multiplication to the BLAS wrapper, which would be consistent with the sparse-dense multiplication implementation and could support a more efficient routine for transposed matrices (as Breeze does not support transposed matrices). The exact algorithm would follow that laid out in ["Sparse Matrix Multiplication Package (SMMP)"|https://www.i2m.univ-amu.fr/perso/abdallah.bradji/multp_sparse.pdf]. This would likely not be a huge change but would take some time to test and benchmark properly, so before I put up a code diff I would be curious to know: * Is there any interest in supporting this functionality in MLlib? * Is there a preference for the return type of sparse-sparse multiplication? (i.e. sparse or dense) * Is there a preference for the implementation? (Breeze vs a built-in one) Some tickets which include related functionality or identified this particular issue but never solved it: SPARK-16820, SPARK-3418. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org