[jira] [Assigned] (SPARK-13970) Add Non-Negative Matrix Factorization to MLlib

2016-03-19 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13970:


Assignee: Apache Spark

> Add Non-Negative Matrix Factorization to MLlib
> --
>
>                 Key: SPARK-13970
>                 URL: https://issues.apache.org/jira/browse/SPARK-13970
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: zhengruifeng
>            Assignee: Apache Spark
>            Priority: Minor
>
> NMF finds two non-negative matrices (W, H) whose product W * H.T
> approximates a non-negative matrix X. This factorization can be used, for
> example, for dimensionality reduction, source separation, or topic extraction.
> NMF is implemented in several packages:
> Scikit-Learn 
> (http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html#sklearn.decomposition.NMF)
> R-NMF (https://cran.r-project.org/web/packages/NMF/index.html)
> LibNMF (http://www.univie.ac.at/rlcta/software/)
> I have implemented it in MLlib following these papers:
> Distributed Nonnegative Matrix Factorization for Web-Scale Dyadic Data 
> Analysis on MapReduce (http://research.microsoft.com/pubs/119077/DNMF.pdf)
> Algorithms for Non-negative Matrix Factorization 
> (http://papers.nips.cc/paper/1861-algorithms-for-non-negative-matrix-factorization.pdf)
> It can be used like this:
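> import org.apache.spark.mllib.linalg.{DenseMatrix, Vectors}
> import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}
> val m = 4
> val n = 3
> // Row index 2 is absent, so row 2 of the m x n matrix is all zeros.
> val data = Seq(
>   (0L, Vectors.dense(0.0, 1.0, 2.0)),
>   (1L, Vectors.dense(3.0, 4.0, 5.0)),
>   (3L, Vectors.dense(9.0, 0.0, 1.0))
> ).map(x => IndexedRow(x._1, x._2))
> // Assumes an active SparkContext named sc, as in spark-shell.
> val indexedRows = sc.parallelize(data)
> val A = new IndexedRowMatrix(indexedRows, m, n).toCoordinateMatrix()
> val k = 2
> // Run the NMF algorithm with rank k.
> val r = NMF.solve(A, k, 10)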
> val rW = r.W.toBlockMatrix().toLocalMatrix().asInstanceOf[DenseMatrix]
> >>> org.apache.spark.mllib.linalg.DenseMatrix =
> 1.1349295096806706  1.4423101890626953E-5
> 3.453054133110303   0.46312492493865615
> 0.0                 0.0
> 0.3133764134585149  2.70684017255672
> val rH = r.H.toBlockMatrix().toLocalMatrix().asInstanceOf[DenseMatrix]
> >>> org.apache.spark.mllib.linalg.DenseMatrix =
> 0.4184163313845057  3.2719352525149286
> 1.12188012613645    0.002939823716977737
> 1.456499371939653   0.18992996116069297
> val R = rW.multiply(rH.transpose)
> >>> org.apache.spark.mllib.linalg.DenseMatrix =
> 0.4749202332761286  1.273254903877907    1.6530268574248572
> 2.9601290106732367  3.8752743120480346   5.117332475154927
> 0.0                 0.0                  0.0
> 8.987727592773672   0.35952840319637736  0.9705425982249293
> val AD = A.toBlockMatrix().toLocalMatrix()
> >>> org.apache.spark.mllib.linalg.Matrix =
> 0.0  1.0  2.0
> 3.0  4.0  5.0
> 0.0  0.0  0.0
> 9.0  0.0  1.0
> var loss = 0.0
> for (i <- 0 until AD.numRows; j <- 0 until AD.numCols) {
>   val diff = AD(i, j) - R(i, j)
>   loss += diff * diff
> }
> loss
> >>> Double = 0.5817999580912183
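
For readers who want to see the factorization itself, here is a minimal local sketch of the multiplicative update rules from the second paper cited in the description (Algorithms for Non-negative Matrix Factorization), written for the X ~ W * H.T convention used in this issue. It is not the code proposed for MLlib: it uses plain Scala arrays instead of distributed matrices, and the names NmfSketch, solve and squaredLoss, the random initialization, the seed and the small eps guard in the denominator are all assumptions made for illustration.

```scala
import scala.util.Random

// Hypothetical local sketch; not the MLlib implementation from this issue.
object NmfSketch {
  type Mat = Array[Array[Double]]

  // Plain dense matrix product a * b.
  def multiply(a: Mat, b: Mat): Mat = {
    val bt = b.transpose
    a.map(row => bt.map(col => row.zip(col).map { case (x, y) => x * y }.sum))
  }

  // Element-wise a * num / (den + eps); eps (an assumption) avoids 0 / 0.
  def scale(a: Mat, num: Mat, den: Mat, eps: Double = 1e-9): Mat =
    Array.tabulate(a.length, a(0).length) { (i, j) =>
      a(i)(j) * num(i)(j) / (den(i)(j) + eps)
    }

  // Factor an m x n non-negative matrix x into W (m x k) and H (n x k)
  // so that x is approximated by W * H.T.
  def solve(x: Mat, k: Int, iterations: Int, seed: Long = 42L): (Mat, Mat) = {
    val m = x.length
    val n = x(0).length
    val rng = new Random(seed)
    // Strictly positive random start (an assumption of this sketch).
    var w: Mat = Array.fill(m, k)(rng.nextDouble() + 0.1)
    var h: Mat = Array.fill(n, k)(rng.nextDouble() + 0.1)
    val xt = x.transpose
    for (_ <- 0 until iterations) {
      // W <- W .* (X H) ./ (W (H.T H))
      w = scale(w, multiply(x, h), multiply(w, multiply(h.transpose, h)))
      // H <- H .* (X.T W) ./ (H (W.T W))
      h = scale(h, multiply(xt, w), multiply(h, multiply(w.transpose, w)))
    }
    (w, h)
  }

  // Sum of squared differences between x and W * H.T.
  def squaredLoss(x: Mat, w: Mat, h: Mat): Double = {
    val r = multiply(w, h.transpose)
    var loss = 0.0
    for (i <- x.indices; j <- x(i).indices) {
      val diff = x(i)(j) - r(i)(j)
      loss += diff * diff
    }
    loss
  }

  def main(args: Array[String]): Unit = {
    // Same 4 x 3 matrix as in the issue description (row 2 is all zeros).
    val x: Mat = Array(
      Array(0.0, 1.0, 2.0),
      Array(3.0, 4.0, 5.0),
      Array(0.0, 0.0, 0.0),
      Array(9.0, 0.0, 1.0)
    )
    val (w, h) = solve(x, k = 2, iterations = 10)
    println(s"squared loss after 10 iterations: ${squaredLoss(x, w, h)}")
  }
}
```

Because the updates only multiply and divide non-negative quantities, W and H stay non-negative without any explicit projection step. The factors and loss will differ from the numbers quoted above, since the initialization here is not the one used by the proposed implementation.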



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13970) Add Non-Negative Matrix Factorization to MLlib

2016-03-19 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13970:


Assignee: (was: Apache Spark)

> Add Non-Negative Matrix Factorization to MLlib
> --
>
>                 Key: SPARK-13970
>                 URL: https://issues.apache.org/jira/browse/SPARK-13970
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: zhengruifeng
>            Priority: Minor


