GitHub user vrilleup opened a pull request: https://github.com/apache/spark/pull/1378
use specialized axpy in RowMatrix for SVD

After running some more tests on large matrices, I found that the BV axpy (breeze/linalg/Vector.scala, axpy) is slower than the BSV axpy (breeze/linalg/operators/SparseVectorOps.scala, sv_dv_axpy): 8s vs. 2s for each multiplication. The BV axpy operates on an iterator, while the BSV axpy operates directly on the underlying array. I think the overhead comes from creating the iterator (with a zip) and advancing the pointers.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vrilleup/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1378.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1378

----
commit e1db950e91c7d9526519626aa252cd711307d857
Author: Li Pu <l...@twitter.com>
Date:   2014-06-04T01:05:18Z

    SPARK-1782: svd for sparse matrix using ARPACK

    copy ARPACK dsaupd/dseupd code from latest breeze
    change RowMatrix to use sparse SVD
    change tests for sparse SVD

commit 96d2ecb837843651db70d7505ddb73cfc0b0bf9a
Author: Li Pu <l...@twitter.com>
Date:   2014-06-04T06:03:35Z

    improve eigenvalue sorting

commit fe983b0e7d62359275a92c2adaae8a635d7dd5d8
Author: Li Pu <l...@twitter.com>
Date:   2014-06-04T07:01:29Z

    improve scala style

commit 9c8051594a88b53ce83b39b127a098b31bd89aad
Author: Li Pu <l...@twitter.com>
Date:   2014-06-04T08:25:58Z

    use non-sparse implementation when k = n

commit 827411b7a7c7a44ec9cf0a3a3439bba0a47575f7
Author: Li Pu <l...@twitter.com>
Date:   2014-06-04T08:29:12Z

    fix EOF new line

commit e7850ed465ceadd6a45132935013292a4845f8df
Author: Li Pu <l...@twitter.com>
Date:   2014-06-04T23:56:26Z

    use aggregate and axpy

commit 4c7aec3d1c5203b4825047c66bed718211f9446c
Author: Li Pu <l...@twitter.com>
Date:   2014-06-07T01:33:47Z

    improve comments

commit eb15100052aae878552aa437c41e548243a6a29e
Author: Li Pu <l...@twitter.com>
Date:   2014-06-13T06:36:18Z

    fix binary compatibility

commit 819824b85acfc8ace9c15e0a9c5ce317604e4f73
Author: Li Pu <l...@twitter.com>
Date:   2014-06-18T02:11:53Z

    add flag for dense svd or sparse svd

commit 5543cce3b7eba1bb3c4b5b8b43ca2c0399295044
Author: Li Pu <l...@twitter.com>
Date:   2014-06-23T23:27:27Z

    improve svd api

commit 71484263409c03669be825b50714731fa9c46f6c
Author: Li Pu <l...@twitter.com>
Date:   2014-06-26T07:09:48Z

    improve RowMatrix multiply

commit c2737714b696d3cfae3b1efd0bde6a8d44a47b95
Author: Li Pu <l...@twitter.com>
Date:   2014-07-07T20:49:29Z

    automatically determine SVD compute mode and parameters

commit 62969fa4e06a715025483ed282b29427075bbbf1
Author: Xiangrui Meng <m...@databricks.com>
Date:   2014-07-09T00:54:54Z

    use BDV directly in symmetricEigs

    change the computation mode to local-svd, local-eigs, and dist-eigs
    update tests and docs

commit 861ec48bc74616b47d45ad3b828097a35045050f
Author: Xiangrui Meng <m...@databricks.com>
Date:   2014-07-09T01:09:23Z

    simplify axpy

commit a461082d98828501eccfbb59c8813c5fbd2ef826
Author: Xiangrui Meng <m...@databricks.com>
Date:   2014-07-09T01:43:18Z

    make superscript show up correctly in doc

commit 4c618e917607b6d760f6192878173198399302c1
Author: Li Pu <li...@outlook.com>
Date:   2014-07-09T07:10:14Z

    Merge pull request #1 from mengxr/vrilleup-master

    Some updates to SVD impl

commit 7312ec10b1be13a41e46c4b8d164302c8497514a
Author: Li Pu <l...@twitter.com>
Date:   2014-07-09T07:35:20Z

    very minor comment fix

commit 5255f2a23ae979dcf809034bba658491ab8fd72a
Author: Li Pu <l...@twitter.com>
Date:   2014-07-10T18:53:06Z

    Merge remote-tracking branch 'upstream/master'

commit 6fb01a31ad967b849f5b738f22a64f8616d3177b
Author: Li Pu <l...@twitter.com>
Date:   2014-07-11T23:12:43Z

    use specialized axpy in RowMatrix

----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
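
For readers unfamiliar with the two axpy variants compared in the pull request description, here is a minimal sketch of the distinction in plain Scala. It does not reproduce Breeze's code: the object `AxpySketch` and the methods `axpyIterator` and `axpySparse` are hypothetical names introduced only for illustration, and the sparse vector is assumed to be stored as parallel index and value arrays. Both methods compute `y += a * x`; the first goes through an iterator of (index, value) pairs, the second loops over the backing arrays directly.

```scala
// Illustrative sketch only; these are NOT Breeze's implementations.
object AxpySketch {

  /** Generic y += a * x, consuming x through an (index, value) iterator.
    * Every element passes through the iterator and a tuple.
    */
  def axpyIterator(a: Double, x: Iterator[(Int, Double)], y: Array[Double]): Unit = {
    x.foreach { case (i, v) => y(i) += a * v }
  }

  /** Specialized y += a * x for a sparse x held as parallel arrays
    * (assumed layout): indices(k) is the position of values(k).
    */
  def axpySparse(a: Double, indices: Array[Int], values: Array[Double],
                 y: Array[Double]): Unit = {
    require(indices.length == values.length, "indices and values must align")
    var k = 0
    while (k < indices.length) {
      y(indices(k)) += a * values(k)
      k += 1
    }
  }
}
```

The iterator version could be called as `AxpySketch.axpyIterator(0.5, idx.iterator.zip(vals.iterator), y)`, which mirrors the zip mentioned in the description, while the array version takes `idx` and `vals` as they are. Any timing difference between the two would need to be measured on real data; the sketch only shows where the extra iterator and tuple work comes from.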