[ https://issues.apache.org/jira/browse/SPARK-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-1390.
------------------------------------

    Resolution: Fixed

> Refactor RDD backed matrices
> ----------------------------
>
>                 Key: SPARK-1390
>                 URL: https://issues.apache.org/jira/browse/SPARK-1390
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> The current interfaces of RDD-backed matrices need refactoring for the v1.0
> release. It would be better to have a clear separation between local matrices
> and those backed by RDDs. Right now, we have:
> 1. org.apache.spark.mllib.linalg.SparseMatrix, which is a wrapper over an RDD 
> of matrix entries, i.e., coordinate list format.
> 2. org.apache.spark.mllib.linalg.TallSkinnyDenseMatrix, which is a wrapper
> over RDD[Array[Double]], i.e., row-oriented format.
> We will see a naming collision when we introduce a local SparseMatrix, and the
> name TallSkinnyDenseMatrix is no longer accurate if we switch to RDD[Vector]
> instead of RDD[Array[Double]]. It would be better to have "RDD" in the type
> name to signal that operations on it will trigger a job.
> The proposed names (all under org.apache.spark.mllib.linalg.rdd):
> 1. RDDMatrix: trait for matrices backed by one or more RDDs
> 2. CoordinateRDDMatrix: wrapper of RDD[RDDMatrixEntry]
> 3. RowRDDMatrix: wrapper of RDD[Vector] whose rows have no particular
> ordering
> 4. IndexedRowRDDMatrix: wrapper of RDD[(Long, Vector)] whose rows are 
> associated with indices
> The proposal is subject to change, but it would be nice to make the changes
> before v1.0.
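For illustration only, the proposed hierarchy might look roughly like the Scala sketch below. It is a local, hypothetical model, not the code that was committed: a plain `Seq` stands in for `RDD` so it runs without a cluster, `LocalVec` stands in for MLlib's `Vector`, and the methods `numRows`, `numCols`, `toIndexedRowMatrix`, and `toRowMatrix` are assumed names. It shows how the coordinate list format maps onto the indexed-row format, and how dropping the indices gives an unordered row matrix.

```scala
// Hypothetical local sketch of the proposed RDDMatrix hierarchy.
// ASSUMPTION: Seq stands in for RDD and LocalVec for MLlib's Vector;
// in the proposal each wrapper would hold an RDD, so operations would trigger a job.

// Dense local vector stand-in.
final case class LocalVec(values: Array[Double])

// One (row, column, value) entry in coordinate list format.
final case class RDDMatrixEntry(i: Long, j: Long, value: Double)

// Trait for matrices backed by one or more distributed collections.
trait RDDMatrix {
  def numRows(): Long
  def numCols(): Long
}

// Coordinate list format; dimensions assumed to be max index + 1.
final class CoordinateRDDMatrix(val entries: Seq[RDDMatrixEntry]) extends RDDMatrix {
  def numRows(): Long = if (entries.isEmpty) 0L else entries.map(_.i).max + 1
  def numCols(): Long = if (entries.isEmpty) 0L else entries.map(_.j).max + 1

  // Group entries by row index and densify each row.
  def toIndexedRowMatrix(): IndexedRowRDDMatrix = {
    val n = numCols().toInt
    val rows = entries.groupBy(_.i).toSeq.map { case (i, es) =>
      val arr = new Array[Double](n)
      es.foreach(e => arr(e.j.toInt) += e.value) // duplicate entries are summed
      (i, LocalVec(arr))
    }
    new IndexedRowRDDMatrix(rows.sortBy(_._1), n)
  }
}

// Rows with no particular ordering.
final class RowRDDMatrix(val rows: Seq[LocalVec], nCols: Int) extends RDDMatrix {
  def numRows(): Long = rows.size.toLong
  def numCols(): Long = nCols.toLong
}

// Rows associated with Long indices.
final class IndexedRowRDDMatrix(val rows: Seq[(Long, LocalVec)], nCols: Int) extends RDDMatrix {
  def numRows(): Long = if (rows.isEmpty) 0L else rows.map(_._1).max + 1
  def numCols(): Long = nCols.toLong

  // Dropping the indices yields an unordered row matrix.
  def toRowMatrix(): RowRDDMatrix = new RowRDDMatrix(rows.map(_._2), nCols)
}

// Usage: build a 3x3 matrix from coordinates and print its indexed rows.
val coo = new CoordinateRDDMatrix(Seq(
  RDDMatrixEntry(0L, 0L, 1.0),
  RDDMatrixEntry(1L, 2L, 3.0),
  RDDMatrixEntry(2L, 1L, 2.0)
))
val indexed = coo.toIndexedRowMatrix()
indexed.rows.foreach { case (i, v) => println(s"$i -> ${v.values.mkString("[", ", ", "]")}") }
println(indexed.toRowMatrix().numRows())
```

Keeping "RDD" in every class name, as the ticket suggests, makes it obvious at the call site that these types are distributed, while leaving plain names such as SparseMatrix free for local matrices.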



--
This message was sent by Atlassian JIRA
(v6.2#6252)
