Vincent created SPARK-21058:
-------------------------------

             Summary: potential SVD optimization
                 Key: SPARK-21058
                 URL: https://issues.apache.org/jira/browse/SPARK-21058
             Project: Spark
          Issue Type: Improvement
          Components: ML, MLlib
    Affects Versions: 2.1.1
            Reporter: Vincent

In the current implementation, computeSVD computes the SVD of a matrix A by first computing the Gramian matrix A^T*A and then running SVD on that Gramian. We found that the Gramian matrix computation is the hot spot of the overall SVD computation. Working on the Gramian benefits the SVD of a skinny matrix, but for a non-skinny matrix it can become a huge overhead.

So, is it possible to offer another option that computes the SVD on the original matrix instead of the Gramian matrix? We could decide which way to go by the ratio between numCols and numRows, or simply by a setting from the user.

We have observed a substantial gain on a toy dataset by running SVD on the original matrix instead of the Gramian matrix. If the proposal is acceptable, we will start working on the patch and gather more performance data.
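
To make the proposed choice concrete, here is a minimal sketch of how the decision between the two paths might look. It is not Spark code: `SvdPath`, `choose_svd_path`, the `mode` argument and the `max_col_row_ratio` cut-off of 0.1 are all hypothetical names and values chosen for illustration, and the sketch assumes "skinny" means numCols is much smaller than numRows.

```python
from enum import Enum


class SvdPath(Enum):
    GRAMIAN = "gramian"  # current behaviour: form A^T*A, then decompose it
    DIRECT = "direct"    # proposed alternative: decompose A itself


def choose_svd_path(num_rows, num_cols, mode="auto", max_col_row_ratio=0.1):
    """Pick an SVD strategy for a num_rows x num_cols matrix.

    mode: "auto" decides from the shape; "gramian" or "direct" forces a path
          (the "setting from the user" in the proposal).
    max_col_row_ratio: hypothetical cut-off, not a measured value.
    """
    if num_rows <= 0 or num_cols <= 0:
        raise ValueError("matrix dimensions must be positive")
    if mode != "auto":
        # Enum lookup by value; raises ValueError for an unknown mode.
        return SvdPath(mode)
    # Skinny matrix (few columns relative to rows): keep the Gramian path.
    ratio = num_cols / num_rows
    return SvdPath.GRAMIAN if ratio <= max_col_row_ratio else SvdPath.DIRECT
```

Under these assumptions, `choose_svd_path(1_000_000, 100)` keeps the Gramian path, `choose_svd_path(10_000, 8_000)` switches to the direct path, and passing `mode="gramian"` overrides the shape-based choice. A real cut-off would have to come from the performance data mentioned above.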