Hi Stefan, that's really interesting - I never thought of trying to benchmark 64-bit Linux against OS X (a friend who works on large databases says OS X performs better than Linux in his work!). Thanks for posting your comparison, and your hints :)
To your three guesses:

i) "I guess you have a very fast CPU (Core i7 or so, I guess?)" - only a quad-core i5, but I'm trying to get access to a quad-core i7; that might make a difference for OpenCL code?

ii) "a very poor BLAS implementation" - I installed the latest ATLAS package for Ubuntu 10.04 LTS, which gives a 6x speed-up. I'm tempted (and interested) to recompile R 2.12.2 linked against the MKL (which I guess is roughly what the vecLib BLAS library corresponds to?), but it seems a tricky thing to do. To be honest, I'm not sure how this new ATLAS library works, i.e. whether it is sequential or multithreaded.

iii) "and a desktop graphics card" - I installed a GTX 570 today, which has 480 CUDA cores; my previous card had 16 cores and half the memory bandwidth.

The results with the new ATLAS library and the GTX 570 are a pleasant improvement :)

   user  system elapsed    -- for loop, single thread
 29.790   7.400  37.243
   user  system elapsed    -- new ATLAS, t(X) %*% X
  1.480   0.000   1.479
   user  system elapsed    -- new ATLAS, crossprod(X)
  0.740   0.000   0.739
   user  system elapsed    -- new GPU, gputools::gpuCrossprod(X)
  0.190   0.040   0.228

I would be really interested to find out what the results would be on an OS X machine with a fancy GPU. I read that a 2x512-core card is going to be released by Nvidia in the next couple of weeks, and CUDA 4.0 is due for public release in a few months. So maybe you want to keep CUDA on your radar?

I managed to write my first R function/package using CUDA code at the weekend. It's a fairly simple but tedious process once you have some CUDA code that compiles and all you want to do is port it to R (in the Unix case, at least). For example, you can write a simple C wrapper along the lines of the rinterface.c code in gputools, then modify the Makefile.in and configure.ac files in that package as required, and you should be set to configure, make and install into R.
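To make the wrapper idea a bit more concrete, here is a rough C sketch of the pattern (my own illustration, not code taken from gputools: the name gpu_crossprod is a placeholder, and the body is a plain CPU loop standing in for the real thing, so it compiles without the CUDA toolkit). In a real port, the body would copy X to the device and call cublasDgemm or launch a kernel, as rinterface.c does. The entry point follows R's .C() calling convention, where every argument arrives as a pointer:

```c
/* Sketch of a .C()-style wrapper computing C = t(X) %*% X for an
 * (n x p) matrix X stored column-major (R's layout), writing the
 * (p x p) result into cp.  A CUDA version would replace the triple
 * loop with a device copy plus a cublasDgemm call. */
void gpu_crossprod(double *x, int *nrow, int *ncol, double *cp)
{
    int n = *nrow, p = *ncol;
    for (int j = 0; j < p; j++) {
        for (int i = 0; i <= j; i++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += x[k + i * n] * x[k + j * n];   /* column-major access */
            cp[i + j * p] = s;                      /* result is symmetric, */
            cp[j + i * p] = s;                      /* so fill both halves  */
        }
    }
}
```

After building with `R CMD SHLIB`, one could call it from R along the lines of `.C("gpu_crossprod", as.double(X), as.integer(nrow(X)), as.integer(ncol(X)), cp = double(ncol(X)^2))$cp` - again just a sketch of the calling convention, not the gputools interface itself.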
I'm working on non-parametric regression and optimization at the moment, and the speed-up using CUDA has been worth the effort :)

All the best,
Ajay

On 15 March 2011 11:22, Stefan Evert-3 [via R] <ml-node+3356302-1299160144-215...@n4.nabble.com> wrote:

> Hi Ajay,
>
> thanks for this comparison, which prodded me to give CUDA another try on my
> now somewhat aging MacBook Pro.
>
>> Hi Dennis, sorry for the delayed reply and thanks for the article. I dug
>> into it and found that if you have a GPU, the CUBLAS library beats the
>> BLAS/ATLAS implementation in the Matrix package for 'large' problems.
>
> I guess you have a very fast CPU (Core i7 or so, I guess?), a very poor
> BLAS implementation and a desktop graphics card?
>
>>    user  system elapsed    -- for loop, single thread
>>  27.210   6.680  33.342
>>    user  system elapsed    -- BLAS mat mult
>>   6.260   0.000   5.982
>>    user  system elapsed    -- BLAS crossprod
>>   4.340   0.000   4.284
>>    user  system elapsed    -- CUDA gpuCrossprod
>>    1.49    0.00    1.48
>
> Just to put these numbers in perspective, here are my results for a MacBook
> Pro running Mac OS X 10.6.6 (Core 2 Duo, 2.5 GHz, 6 GB DDR2 RAM, Nvidia
> GeForce 8600M GT with 512 MB RAM -- I suppose it's the "M" that breaks my
> performance here).
>
>>    user  system elapsed    -- for loop, single thread
>> 141.034  35.299 153.783
>>    user  system elapsed    -- BLAS mat mult
>>   2.791   0.025   1.805
>>    user  system elapsed    -- BLAS crossprod
>>   1.419   0.039   0.863
>>    user  system elapsed    -- CUDA gpuCrossprod
>>   1.431   0.119   1.718
>
> As you can see, my CPU/RAM is about 5x slower than your machine, CUDA is
> slightly slower (my card has 32 cores, but may have lower memory bandwidth
> and/or clock rate if yours is a desktop card), but vecLib BLAS beats CUDA by
> a factor of 2.
>
> Kudos to the gputools developers: despite what the README says, the package
> compiles out of the box on Mac OS X 10.6, 64-bit R 2.12.1, with CUDA release
> 3.2. Thanks for this convenient package!
>
> Best regards,
> Stefan Evert
>
> [ [hidden email] | http://purl.org/stefan.evert ]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.