[ https://issues.apache.org/jira/browse/SPARK-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng updated SPARK-4409:
---------------------------------
    Assignee: Burak Yavuz

> Additional (but limited) Linear Algebra Utils
> ---------------------------------------------
>
>                 Key: SPARK-4409
>                 URL: https://issues.apache.org/jira/browse/SPARK-4409
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Burak Yavuz
>            Assignee: Burak Yavuz
>            Priority: Minor
>
> This ticket is to discuss the addition of a very limited number of local
> matrix manipulation and generation methods that would be helpful in the
> further development of algorithms on top of BlockMatrix (SPARK-3974), such
> as Randomized SVD, and Multi Model Training (SPARK-1486).
>
> The proposed methods for addition are:
>
> For `Matrix`:
>  - map: maps the values in the matrix with a given function. Produces a new
> matrix.
>  - update: the values in the matrix are updated with a given function.
> Occurs in place.
>
> Factory methods for `DenseMatrix`:
>  - *zeros: Generate a matrix consisting of zeros
>  - *ones: Generate a matrix consisting of ones
>  - *eye: Generate an identity matrix
>  - *rand: Generate a matrix consisting of i.i.d. uniform random numbers
>  - *randn: Generate a matrix consisting of i.i.d. Gaussian random numbers
>  - *diag: Generate a diagonal matrix from a supplied vector
>
> *These methods already exist in the factory methods for `Matrices`. However,
> in cases where we require a `DenseMatrix`, you constantly have to add
> `.asInstanceOf[DenseMatrix]` everywhere, which makes the code "dirtier". I
> propose moving these functions to factory methods for `DenseMatrix`, where the
> output will be a `DenseMatrix`; the factory methods for `Matrices` will
> call these functions directly and output a generic `Matrix`.
>
> Factory methods for `SparseMatrix`:
>  - speye: Identity matrix in sparse format. Saves a lot of memory when
> dimensions are large, especially in Multi Model Training, where each row
> must be multiplied by a scalar.
>  - sprand: Generate a sparse matrix with a given density consisting of
> i.i.d. uniform random numbers.
>  - sprandn: Generate a sparse matrix with a given density consisting of
> i.i.d. Gaussian random numbers.
>  - diag: Generate a diagonal matrix from a supplied vector. It is memory
> efficient because it stores only the diagonal. Again, very helpful in Multi
> Model Training.
>
> Factory methods for `Matrices`:
>  - Include all the factory methods given above, but return a generic
> `Matrix` rather than `SparseMatrix` or `DenseMatrix`.
>  - horzCat: Horizontally concatenate matrices to form one larger matrix.
> Very useful in both Multi Model Training and the repartitioning of
> BlockMatrix.
>  - vertCat: Vertically concatenate matrices to form one larger matrix. Very
> useful for the repartitioning of BlockMatrix.
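
The factory-method proposal in the ticket can be illustrated with a minimal, self-contained sketch. The `Matrix`, `DenseMatrix`, and `Matrices` types below are hypothetical, simplified stand-ins written for this example, not Spark's actual classes; column-major storage and the exact signatures are assumptions. The sketch shows `map` versus in-place `update`, typed `DenseMatrix` factories that `Matrices` delegates to, and a dense-only `horzCat`.

```scala
// Hypothetical, simplified stand-ins for local matrix types (NOT Spark's classes).
trait Matrix {
  def numRows: Int
  def numCols: Int
  def apply(i: Int, j: Int): Double
}

// Assumed column-major storage: entry (i, j) lives at values(i + numRows * j).
final class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double])
    extends Matrix {
  require(values.length == numRows * numCols, "values must hold numRows * numCols entries")

  def apply(i: Int, j: Int): Double = values(i + numRows * j)

  // map: builds a new matrix and leaves this one untouched.
  def map(f: Double => Double): DenseMatrix =
    new DenseMatrix(numRows, numCols, values.map(f))

  // update: overwrites the stored values in place and returns this matrix.
  def update(f: Double => Double): DenseMatrix = {
    var k = 0
    while (k < values.length) { values(k) = f(values(k)); k += 1 }
    this
  }
}

object DenseMatrix {
  // Typed factories: callers get a DenseMatrix without any cast.
  def zeros(numRows: Int, numCols: Int): DenseMatrix =
    new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))

  def eye(n: Int): DenseMatrix = {
    val m = zeros(n, n)
    var i = 0
    while (i < n) { m.values(i + n * i) = 1.0; i += 1 }
    m
  }
}

object Matrices {
  // Generic factories delegate to the typed ones and widen the result to Matrix.
  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)
  def eye(n: Int): Matrix = DenseMatrix.eye(n)

  // Dense-only horzCat: with column-major storage, placing matrices side by
  // side is a plain concatenation of their value arrays.
  def horzCat(ms: Seq[DenseMatrix]): Matrix = {
    require(ms.nonEmpty, "need at least one matrix")
    require(ms.forall(_.numRows == ms.head.numRows), "all matrices need the same row count")
    new DenseMatrix(ms.head.numRows, ms.map(_.numCols).sum, Array.concat(ms.map(_.values): _*))
  }
}
```

With this layout, `DenseMatrix.eye(3).map(_ * 2.0)` type-checks directly, whereas the same call on `Matrices.eye(3)` would need a cast because only the generic `Matrix` interface is visible. A sparse-aware `horzCat` and a `vertCat` would need more bookkeeping than this sketch shows.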