GitHub user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22784
Test results comparing the existing PCA (which computes the covariance matrix) with PCA using SVD, which never computes the covariance matrix, on the following data:
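import org.apache.spark.mllib.linalg.Vectors  // assumed; org.apache.spark.ml.linalg.Vectors offers the same dense/sparse factories
val data = Array(
  Vectors.sparse(5, Seq((1, 1.0), (3, 7.0))),
  Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
  Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0))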
1) PCA using the covariance matrix:
explained variance = [0.7943932532, 0.2056067468, 1.26E-16]
Top 2 principal components:
[[-0.44859172075072673 -0.28423808214073987
0.13301985745398526 -0.05621155904253121
-0.1252315635978212 0.7636264774662965
0.21650756651919933 -0.5652958773533949
-0.8476512931126826 -0.11560340501314653]]
2) PCA using SVD, without computing the covariance matrix:
explained variance = [0.7943932532, 0.2056067468, 5.55E-17]
Top 2 principal components:
[[-0.44859172075072673 -0.2842380821407399
0.13301985745398529 -0.056211559042531424
-0.12523156359782125 0.7636264774662964
0.21650756651919945 -0.5652958773533953
-0.8476512931126826 -0.11560340501314664]]
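Why the SVD route gives the same answer: if C is the mean-centered n x d data matrix, the covariance matrix is CᵀC/(n−1), and with C = UΣVᵀ we get CᵀC = VΣ²Vᵀ. So the right singular vectors of C are the principal components, and the explained-variance ratios are σᵢ²/Σⱼσⱼ², with no d x d matrix ever formed. The sketch below illustrates this on the same three rows. It is not the code from the PR: a textbook one-sided (Hestenes) Jacobi SVD stands in for whatever SVD routine Spark uses, and the names `SvdPcaSketch`, `center`, `jacobiSvd` and `pca` are hypothetical.

```scala
// Hypothetical sketch (not Spark's or the PR's code): PCA from the SVD of the
// mean-centered data matrix, never forming the d x d covariance matrix.
// The SVD here is a plain one-sided (Hestenes) Jacobi iteration.
object SvdPcaSketch {

  /** Subtract each column's mean so every feature is centered at zero. */
  def center(data: Array[Array[Double]]): Array[Array[Double]] = {
    val n = data.length
    val d = data.head.length
    val means = Array.tabulate(d)(j => data.map(_(j)).sum / n)
    data.map(row => Array.tabulate(d)(j => row(j) - means(j)))
  }

  /**
   * One-sided Jacobi SVD of an n x d matrix: rotates pairs of columns until all
   * columns are mutually orthogonal. Returns the column norms (the singular
   * values, unsorted) and V, whose columns are the right singular vectors.
   */
  def jacobiSvd(input: Array[Array[Double]], tol: Double = 1e-15, maxSweeps: Int = 100): (Array[Double], Array[Array[Double]]) = {
    val a = input.map(_.clone())
    val n = a.length
    val d = a.head.length
    val v = Array.tabulate(d, d)((i, j) => if (i == j) 1.0 else 0.0)
    // Overlaps below this absolute floor are treated as round-off and skipped.
    val floor = tol * a.map(_.map(x => x * x).sum).sum
    var sweep = 0
    var rotated = true
    while (rotated && sweep < maxSweeps) {
      rotated = false
      sweep += 1
      for (p <- 0 until d - 1; q <- p + 1 until d) {
        var alpha = 0.0; var beta = 0.0; var gamma = 0.0
        for (i <- 0 until n) {
          alpha += a(i)(p) * a(i)(p)
          beta += a(i)(q) * a(i)(q)
          gamma += a(i)(p) * a(i)(q)
        }
        if (math.abs(gamma) > floor && math.abs(gamma) > tol * math.sqrt(alpha * beta)) {
          rotated = true
          // Rotation (c, s) that makes columns p and q orthogonal.
          val zeta = (beta - alpha) / (2 * gamma)
          val t = (if (zeta >= 0) 1.0 else -1.0) / (math.abs(zeta) + math.sqrt(1 + zeta * zeta))
          val c = 1 / math.sqrt(1 + t * t)
          val s = c * t
          // Apply the same rotation to A and accumulate it into V.
          for (m <- Seq(a, v); row <- m) {
            val xp = row(p); val xq = row(q)
            row(p) = c * xp - s * xq
            row(q) = s * xp + c * xq
          }
        }
      }
    }
    val sigma = Array.tabulate(d)(j => math.sqrt(a.map(r => r(j) * r(j)).sum))
    (sigma, v)
  }

  /** Top-k principal components (d x k, one per column) and their explained-variance ratios. */
  def pca(data: Array[Array[Double]], k: Int): (Array[Array[Double]], Array[Double]) = {
    val (sigma, v) = jacobiSvd(center(data))
    val order = sigma.indices.sortBy(j => -sigma(j)).take(k)
    val total = sigma.map(s => s * s).sum
    val components = Array.tabulate(v.length, k)((i, c) => v(i)(order(c)))
    val ratios = order.map(j => sigma(j) * sigma(j) / total).toArray
    (components, ratios)
  }

  def main(args: Array[String]): Unit = {
    // The three rows from the comment, with the sparse row written out densely.
    val data = Array(
      Array(0.0, 1.0, 0.0, 7.0, 0.0),
      Array(2.0, 0.0, 3.0, 4.0, 5.0),
      Array(4.0, 0.0, 0.0, 6.0, 7.0))
    val (components, explained) = pca(data, 3)
    println(explained.mkString("explained variance = [", ", ", "]"))
    components.foreach(row => println(row.take(2).mkString("  ")))
  }
}
```

Singular vectors are only defined up to sign, so a column from this sketch may come out negated relative to the output above; the explained-variance ratios do not depend on sign. The third ratio is round-off in any implementation, since three centered rows span at most two dimensions.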
**MSE between the two methods' leading eigenvalues = 0.0**
**MSE between the two methods' leading eigenvectors = 0.0**
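The comment does not show how the MSE figures were computed, so the following is only a hypothetical way to reproduce such a comparison from the printed values. `PcComparison`, `mse` and `componentMse` are illustrative names, and the sign alignment is an assumption added because eigenvectors are defined only up to sign.

```scala
// Hypothetical helper (the author's comparison harness is not shown in the comment):
// mean squared error between the two methods' outputs, with eigenvector signs aligned.
object PcComparison {

  /** Plain mean squared error between two equal-length vectors. */
  def mse(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "length mismatch")
    a.indices.map { i => val e = a(i) - b(i); e * e }.sum / a.length
  }

  /**
   * MSE between two d x k matrices whose columns are principal components.
   * Eigenvectors are only defined up to sign, so each column of `b` is flipped
   * when it points away from the matching column of `a`.
   */
  def componentMse(a: Array[Array[Double]], b: Array[Array[Double]]): Double = {
    val aligned = b.map(_.clone())
    for (c <- a.head.indices) {
      val dot = a.indices.map(i => a(i)(c) * b(i)(c)).sum
      if (dot < 0) aligned.foreach(row => row(c) = -row(c))
    }
    mse(a.flatten, aligned.flatten)
  }

  // Values copied from the two outputs in the comment (5 features x 2 components).
  val covarianceEigenvalues = Array(0.7943932532, 0.2056067468)
  val svdEigenvalues = Array(0.7943932532, 0.2056067468)
  val covariancePcs = Array(
    Array(-0.44859172075072673, -0.28423808214073987),
    Array(0.13301985745398526, -0.05621155904253121),
    Array(-0.1252315635978212, 0.7636264774662965),
    Array(0.21650756651919933, -0.5652958773533949),
    Array(-0.8476512931126826, -0.11560340501314653))
  val svdPcs = Array(
    Array(-0.44859172075072673, -0.2842380821407399),
    Array(0.13301985745398529, -0.056211559042531424),
    Array(-0.12523156359782125, 0.7636264774662964),
    Array(0.21650756651919945, -0.5652958773533953),
    Array(-0.8476512931126826, -0.11560340501314664))

  def main(args: Array[String]): Unit = {
    println(s"leading eigenvalues MSE  = ${mse(covarianceEigenvalues, svdEigenvalues)}")
    println(s"leading eigenvectors MSE = ${componentMse(covariancePcs, svdPcs)}")
  }
}
```

With the printed digits, the eigenvalue MSE is exactly 0.0 and the eigenvector MSE comes out far below 1e-30, so the two methods agree to double-precision round-off.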