Hi Ajay, thanks for this comparison, which prodded me to give CUDA another try on my now somewhat aging MacBook Pro.
> Hi Dennis, sorry for the delayed reply and thanks for the article. I digged
> into it and found that if you have a GPU, the CUBLAS library beats the
> BLAS/ATLAS implementation in the Matrix package for 'large' problems.

I guess you have a very fast CPU (a Core i7 or so?), a very poor BLAS implementation, and a desktop graphics card?

>    user  system elapsed   -- for loop, single thread
>  27.210   6.680  33.342
>    user  system elapsed   -- BLAS mat mult
>   6.260   0.000   5.982
>    user  system elapsed   -- BLAS crossprod
>   4.340   0.000   4.284
>    user  system elapsed   -- CUDA gpuCrossprod
>    1.49    0.00    1.48

Just to put these numbers in perspective, here are my results for a MacBook Pro running Mac OS X 10.6.6 (Core 2 Duo, 2.5 GHz, 6 GB DDR2 RAM, Nvidia GeForce 8600M GT with 512 MB RAM -- I suppose it's the "M" that breaks my performance here):

    user  system elapsed   -- for loop, single thread
 141.034  35.299 153.783
    user  system elapsed   -- BLAS mat mult
   2.791   0.025   1.805
    user  system elapsed   -- BLAS crossprod
   1.419   0.039   0.863
    user  system elapsed   -- CUDA gpuCrossprod
   1.431   0.119   1.718

As you can see, my CPU/RAM combination is about 5x slower than your machine on the plain R loop; CUDA is slightly slower (my card has only 32 cores, and may have lower memory bandwidth and/or clock rate than a desktop card); but the vecLib BLAS beats CUDA by a factor of 2.

Kudos to the gputools developers: despite what the README says, the package compiles out of the box on Mac OS X 10.6 with 64-bit R 2.12.1 and CUDA release 3.2. Thanks for this convenient package!

Best regards,
Stefan Evert

[ stefan.ev...@uos.de | http://purl.org/stefan.evert ]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
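For anyone who wants to reproduce these timings, here is a minimal sketch of the kind of benchmark being compared above. The original code is not shown in the thread, so this is a reconstruction under assumptions: the matrix size (n = 2000) is a guess, and the GPU step assumes the gputools package and a CUDA-capable card.

```r
## Hypothetical reconstruction of the crossprod benchmark discussed in this
## thread; the original script was not posted. Matrix size is a guess.
library(gputools)   # assumes gputools is installed with a working CUDA setup

set.seed(42)
n <- 2000
A <- matrix(rnorm(n * n), n, n)

## BLAS matrix multiplication: forms t(A) explicitly
system.time(B1 <- t(A) %*% A)

## BLAS crossprod: computes t(A) %*% A without materialising the transpose
system.time(B2 <- crossprod(A))

## CUDA crossprod via gputools: gpuCrossprod(a, b) computes t(a) %*% b
system.time(B3 <- gpuCrossprod(A, A))

## sanity check: all three should agree up to floating-point noise
all.equal(B1, B2)
```

Which of the last two wins depends entirely on the BLAS your R is linked against (vecLib on Mac OS X here) and on the GPU's core count and memory bandwidth, which is exactly the difference the two machines above illustrate.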