GitHub user vrilleup opened a pull request:

    https://github.com/apache/spark/pull/1378

    use specialized axpy in RowMatrix for SVD

    After running more tests on a large matrix, I found that the BV axpy 
(breeze/linalg/Vector.scala, axpy) is slower than the BSV axpy 
(breeze/linalg/operators/SparseVectorOps.scala, sv_dv_axpy): 8s vs. 2s for 
each multiplication. The BV axpy works through an iterator, while the BSV axpy 
operates directly on the underlying arrays. I think the overhead comes from 
creating the iterator (with a zip) and advancing the pointers.
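To illustrate the difference described above, here is a minimal sketch contrasting the two styles of sparse-into-dense axpy (y += a * x). It does not use Breeze: `SimpleSparseVector`, `AxpySketch`, `axpyIterator`, and `axpyArray` are hypothetical names that only mimic a sparse vector stored as parallel index/value arrays. It shows where the per-element overhead comes from. It does not reproduce the 8s vs. 2s measurement, and it is not the code from this PR or from Breeze.

```scala
// Hypothetical minimal sparse vector: parallel arrays of active indices and values.
// This is NOT Breeze's SparseVector; it only mimics that storage layout.
final case class SimpleSparseVector(size: Int, index: Array[Int], data: Array[Double])

object AxpySketch {
  // Iterator-based y += a * x: walks (index, value) pairs through a zipped iterator.
  // Each step goes through hasNext/next calls and allocates a new tuple.
  def axpyIterator(a: Double, x: SimpleSparseVector, y: Array[Double]): Unit = {
    require(x.size == y.length, "dimension mismatch")
    val it = x.index.iterator.zip(x.data.iterator)
    while (it.hasNext) {
      val (i, v) = it.next()
      y(i) += a * v
    }
  }

  // Array-based y += a * x: indexes the backing arrays directly in a tight loop,
  // the style the description attributes to the specialized sparse/dense axpy.
  def axpyArray(a: Double, x: SimpleSparseVector, y: Array[Double]): Unit = {
    require(x.size == y.length, "dimension mismatch")
    val idx = x.index
    val vals = x.data
    var k = 0
    while (k < idx.length) {
      y(idx(k)) += a * vals(k)
      k += 1
    }
  }

  def main(args: Array[String]): Unit = {
    val x = SimpleSparseVector(5, Array(0, 2, 4), Array(1.0, 2.0, 3.0))
    val y1 = Array.fill(5)(1.0)
    val y2 = Array.fill(5)(1.0)
    axpyIterator(2.0, x, y1)
    axpyArray(2.0, x, y2)
    println(y1.mkString(",")) // prints 3.0,1.0,5.0,1.0,7.0
    println(y2.mkString(",")) // prints 3.0,1.0,5.0,1.0,7.0
  }
}
```

Both functions compute the same result. The array-based loop avoids the iterator objects and the per-element tuple allocation, which is consistent with the explanation given for the observed slowdown.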

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vrilleup/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1378.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1378
    
----
commit e1db950e91c7d9526519626aa252cd711307d857
Author: Li Pu <l...@twitter.com>
Date:   2014-06-04T01:05:18Z

    SPARK-1782: svd for sparse matrix using ARPACK
    
    copy ARPACK dsaupd/dseupd code from latest breeze
    change RowMatrix to use sparse SVD
    change tests for sparse SVD

commit 96d2ecb837843651db70d7505ddb73cfc0b0bf9a
Author: Li Pu <l...@twitter.com>
Date:   2014-06-04T06:03:35Z

    improve eigenvalue sorting

commit fe983b0e7d62359275a92c2adaae8a635d7dd5d8
Author: Li Pu <l...@twitter.com>
Date:   2014-06-04T07:01:29Z

    improve scala style

commit 9c8051594a88b53ce83b39b127a098b31bd89aad
Author: Li Pu <l...@twitter.com>
Date:   2014-06-04T08:25:58Z

    use non-sparse implementation when k = n

commit 827411b7a7c7a44ec9cf0a3a3439bba0a47575f7
Author: Li Pu <l...@twitter.com>
Date:   2014-06-04T08:29:12Z

    fix EOF new line

commit e7850ed465ceadd6a45132935013292a4845f8df
Author: Li Pu <l...@twitter.com>
Date:   2014-06-04T23:56:26Z

    use aggregate and axpy

commit 4c7aec3d1c5203b4825047c66bed718211f9446c
Author: Li Pu <l...@twitter.com>
Date:   2014-06-07T01:33:47Z

    improve comments

commit eb15100052aae878552aa437c41e548243a6a29e
Author: Li Pu <l...@twitter.com>
Date:   2014-06-13T06:36:18Z

    fix binary compatibility

commit 819824b85acfc8ace9c15e0a9c5ce317604e4f73
Author: Li Pu <l...@twitter.com>
Date:   2014-06-18T02:11:53Z

    add flag for dense svd or sparse svd

commit 5543cce3b7eba1bb3c4b5b8b43ca2c0399295044
Author: Li Pu <l...@twitter.com>
Date:   2014-06-23T23:27:27Z

    improve svd api

commit 71484263409c03669be825b50714731fa9c46f6c
Author: Li Pu <l...@twitter.com>
Date:   2014-06-26T07:09:48Z

    improve RowMatrix multiply

commit c2737714b696d3cfae3b1efd0bde6a8d44a47b95
Author: Li Pu <l...@twitter.com>
Date:   2014-07-07T20:49:29Z

    automatically determine SVD compute mode and parameters

commit 62969fa4e06a715025483ed282b29427075bbbf1
Author: Xiangrui Meng <m...@databricks.com>
Date:   2014-07-09T00:54:54Z

    use BDV directly in symmetricEigs
    change the computation mode to local-svd, local-eigs, and dist-eigs
    update tests and docs

commit 861ec48bc74616b47d45ad3b828097a35045050f
Author: Xiangrui Meng <m...@databricks.com>
Date:   2014-07-09T01:09:23Z

    simplify axpy

commit a461082d98828501eccfbb59c8813c5fbd2ef826
Author: Xiangrui Meng <m...@databricks.com>
Date:   2014-07-09T01:43:18Z

    make superscript show up correctly in doc

commit 4c618e917607b6d760f6192878173198399302c1
Author: Li Pu <li...@outlook.com>
Date:   2014-07-09T07:10:14Z

    Merge pull request #1 from mengxr/vrilleup-master
    
    Some updates to SVD impl

commit 7312ec10b1be13a41e46c4b8d164302c8497514a
Author: Li Pu <l...@twitter.com>
Date:   2014-07-09T07:35:20Z

    very minor comment fix

commit 5255f2a23ae979dcf809034bba658491ab8fd72a
Author: Li Pu <l...@twitter.com>
Date:   2014-07-10T18:53:06Z

    Merge remote-tracking branch 'upstream/master'

commit 6fb01a31ad967b849f5b738f22a64f8616d3177b
Author: Li Pu <l...@twitter.com>
Date:   2014-07-11T23:12:43Z

    use specialized axpy in RowMatrix

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
