[ https://issues.apache.org/jira/browse/SPARK-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell resolved SPARK-1390.
------------------------------------
    Resolution: Fixed

> Refactor RDD backed matrices
> ----------------------------
>
>                 Key: SPARK-1390
>                 URL: https://issues.apache.org/jira/browse/SPARK-1390
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> The current interfaces of RDD backed matrices need refactoring for the v1.0
> release. It would be better to have a clear separation between local matrices
> and those backed by RDDs. Right now, we have
> 1. org.apache.spark.mllib.linalg.SparseMatrix, which is a wrapper over an RDD
> of matrix entries, i.e., coordinate list format.
> 2. org.apache.spark.mllib.linalg.TallSkinnyDenseMatrix, which is a wrapper
> over RDD[Array[Double]], i.e., row-oriented format.
> We will see a naming collision when we introduce a local SparseMatrix, and the
> name TallSkinnyDenseMatrix will no longer be exact if we switch to RDD[Vector]
> instead of RDD[Array[Double]]. It would be better to have "RDD" in the type
> name to suggest that operations will trigger a job.
> The proposed names (all under org.apache.spark.mllib.linalg.rdd):
> 1. RDDMatrix: trait for matrices backed by one or more RDDs
> 2. CoordinateRDDMatrix: wrapper of RDD[RDDMatrixEntry]
> 3. RowRDDMatrix: wrapper of RDD[Vector] whose rows do not have special
> ordering
> 4. IndexedRowRDDMatrix: wrapper of RDD[(Long, Vector)] whose rows are
> associated with indices
> The proposal is subject to change, but it would be nice to make the changes
> before v1.0.

--
This message was sent by Atlassian JIRA
(v6.2#6252)