[ https://issues.apache.org/jira/browse/SPARK-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140160#comment-14140160 ]
Gaurav Mishra commented on SPARK-3434:
--------------------------------------

Representing a matrix as multiple RDDs of sub-matrices may be helpful when an
operation on the matrix requires computation over only a small set of its
sub-matrices. However, operations like matrix multiplication require
computation over all elements in the matrix (i.e. all elements need to be
read). Therefore, at least in the case of matrix multiplication, keeping a
single RDD seems to be the better idea. Keeping multiple RDDs in that case
would only add the burden of keeping track of all the sub-matrices.

> Distributed block matrix
> ------------------------
>
>                 Key: SPARK-3434
>                 URL: https://issues.apache.org/jira/browse/SPARK-3434
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Xiangrui Meng
>
> This JIRA is for discussing distributed matrices stored in block
> sub-matrices. The main challenge is the partitioning scheme to allow adding
> linear algebra operations in the future, e.g.:
> 1. matrix multiplication
> 2. matrix factorization (QR, LU, ...)
> Let's discuss the partitioning and storage and how they fit into the above
> use cases.
> Questions:
> 1. Should it be backed by a single RDD that contains all of the sub-matrices
> or many RDDs with each contains only one sub-matrix?
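
To illustrate the point in the comment that multiplication reads every
sub-matrix, here is a minimal sketch in plain Python (not Spark code, and not
a proposed MLlib API). It assumes each matrix is held as a single collection
of ((row_block, col_block), block) records, which stands in for the "single
RDD" option. The names block_multiply, matmul and matadd are hypothetical
helpers introduced only for this example.

```python
from collections import defaultdict


def matmul(a, b):
    """Multiply two small dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def matadd(a, b):
    """Add two dense matrices of the same shape element-wise."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block_multiply(a_blocks, b_blocks):
    """Multiply two block matrices stored as ((i, j), block) records.

    Assumption: block sizes are compatible, i.e. the column count of
    A's block (i, k) equals the row count of B's block (k, j).
    """
    # Group B's blocks by row-block index k so they can be matched
    # against A's blocks with the same column-block index.
    b_by_row = defaultdict(list)
    for (k, j), block in b_blocks:
        b_by_row[k].append((j, block))

    # Every block of A is visited once and paired with every block in
    # the matching block-row of B; partial products that share the
    # output key (i, j) are summed.
    result = {}
    for (i, k), a_block in a_blocks:
        for j, b_block in b_by_row[k]:
            partial = matmul(a_block, b_block)
            key = (i, j)
            if key in result:
                result[key] = matadd(result[key], partial)
            else:
                result[key] = partial
    return result


# A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], each split into 1x1 blocks.
a_blocks = [((0, 0), [[1]]), ((0, 1), [[2]]),
            ((1, 0), [[3]]), ((1, 1), [[4]])]
b_blocks = [((0, 0), [[5]]), ((0, 1), [[6]]),
            ((1, 0), [[7]]), ((1, 1), [[8]])]
product = block_multiply(a_blocks, b_blocks)
```

Both loops walk the full collections, so no sub-matrix can be skipped. With
one collection per sub-matrix, the same computation would first have to
gather all of them, which is the bookkeeping burden the comment describes.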