Hi Ajay, thanks for this comparison, which prodded me to give CUDA another try on my now somewhat aging MacBook Pro.
> Hi Dennis, sorry for the delayed reply and thanks for the article. I digged
> into it and found that if you have a GPU, the CUBLAS library beats the
> BLAS/ATLAS implementation in the Matrix package for 'large' problems.

I guess you have a very fast CPU (a Core i7 or so?), a very poor BLAS implementation, and a desktop graphics card?

>    user  system elapsed   -- for loop, single thread
>  27.210   6.680  33.342
>    user  system elapsed   -- BLAS mat mult
>   6.260   0.000   5.982
>    user  system elapsed   -- BLAS crossprod
>   4.340   0.000   4.284
>    user  system elapsed   -- CUDA gpuCrossprod
>    1.49    0.00    1.48

Just to put these numbers in perspective, here are my results for a MacBook Pro running Mac OS X 10.6.6 (Core 2 Duo, 2.5 GHz, 6 GB DDR2 RAM, Nvidia GeForce 8600M GT with 512 MB RAM -- I suppose it's the "M" that breaks my performance here):

    user  system elapsed   -- for loop, single thread
 141.034  35.299 153.783
    user  system elapsed   -- BLAS mat mult
   2.791   0.025   1.805
    user  system elapsed   -- BLAS crossprod
   1.419   0.039   0.863
    user  system elapsed   -- CUDA gpuCrossprod
   1.431   0.119   1.718

As you can see, my CPU/RAM combination is about 5x slower than your machine on the plain R loop; CUDA is slightly slower (my card has only 32 cores, and may have lower memory bandwidth and/or clock rate than a desktop card); but the vecLib BLAS beats CUDA by a factor of 2.

Kudos to the gputools developers: despite what the README says, the package compiles out of the box on Mac OS X 10.6 with 64-bit R 2.12.1 and CUDA release 3.2. Thanks for this convenient package!

Best regards,
Stefan Evert

[ stefan.ev...@uos.de | http://purl.org/stefan.evert ]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
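For anyone who wants to reproduce these timings, here is a minimal sketch of the kind of benchmark being compared above. The original code is not shown in the thread, so this is a reconstruction under assumptions: the matrix size (n = 2000) is a guess, and the GPU step assumes the gputools package and a CUDA-capable card.

```r
## Hypothetical reconstruction of the crossprod benchmark discussed in this
## thread; the original script was not posted. Matrix size is a guess.
library(gputools)   # assumes gputools is installed with a working CUDA setup

set.seed(42)
n <- 2000
A <- matrix(rnorm(n * n), n, n)

## BLAS matrix multiplication: forms t(A) explicitly
system.time(B1 <- t(A) %*% A)

## BLAS crossprod: computes t(A) %*% A without materialising the transpose
system.time(B2 <- crossprod(A))

## CUDA crossprod via gputools: gpuCrossprod(a, b) computes t(a) %*% b
system.time(B3 <- gpuCrossprod(A, A))

## sanity check: all three should agree up to floating-point noise
all.equal(B1, B2)
```

Which of the last two wins depends entirely on the BLAS your R is linked against (vecLib on Mac OS X here) and on the GPU's core count and memory bandwidth, which is exactly the difference the two machines above illustrate.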