For the record, the next version of R will allow Goto's BLAS to be used under Windows for real (but not complex) calculations.
For fair timings, you want to run gc() before starting the run, or you end up paying to clear up the current state of the session. That is a large part of the difference in the first run. Another issue is that the first call to LAPACK in a session (not used here) has to load the DLL. On 17 Feb 2004, Douglas Bates wrote: > Spencer Graves <[EMAIL PROTECTED]> writes: > > > One can also use "crossprod" AND use "solve" to actually "solve" > > the system of linear equations rather than just get the inverse, > > which is later multiplied by t(BA)%*%D. However, the difference > > seems very small: > > Thanks for pointing that out Spencer. I was about to do the same. > > > set.seed(1) > > > > > n <- 500 > > > A <- array(rnorm(n^2), dim=c(n,n)) > > > B <- array(rnorm(n^2), dim=c(n,n)) > > > C. <- array(rnorm(n^2), dim=c(n,n)) > > > D <- array(rnorm(n^2), dim=c(n,n)) > > > > > > BA <- B%*%A > > > > > > start.time <- proc.time() > > > A1 <- A%*%solve(t(BA)%*%BA+C.)%*%BA%*%D > > > proc.time()-start.time > > [1] 4.75 0.03 5.13 NA NA > > > > > > start.time <- proc.time() > > > A2 <- A%*%solve(crossprod(BA)+C., crossprod(t(BA), D)) > > > proc.time()-start.time > > [1] 4.19 0.01 4.49 NA NA > > A minor point on the methodology. You can do this in one step as > > system.time(A2 <- A%*%solve(crossprod(BA)+C., crossprod(t(BA), D))) > > Also, in R the second and subsequent timings tend to be a bit faster > than the first. I think this is due to heap storage being allocated > the first time that large chunks of memory are used and not needing to > be allocated for subsequent uses. I don't think so. As I understand it, large objects are allocated separately (not really out the the heap as it used to be), and the storage is not reused by R (but it may well be by the malloc system.). > > system.time(A1 <- A%*%solve(t(BA)%*%BA+C.)%*%BA%*%D) > [1] 0.78 0.09 0.87 0.00 0.00 > > system.time(A1 <- A%*%solve(t(BA)%*%BA+C.)%*%BA%*%D) > [1] 0.71 0.05 0.76 0.00 0.00 > > system.time(A1 <- A%*%solve(t(BA)%*%BA+C.)%*%BA%*%D) > [1] 0.79 0.08 0.87 0.00 0.00 > > system.time(A1 <- A%*%solve(t(BA)%*%BA+C.)%*%BA%*%D) > [1] 0.72 0.04 0.76 0.00 0.00 > > system.time(A2 <- A%*%solve(crossprod(BA)+C., crossprod(t(BA), D))) > [1] 0.52 0.07 0.59 0.00 0.00 > > system.time(A2 <- A%*%solve(crossprod(BA)+C., crossprod(t(BA), D))) > [1] 0.53 0.06 0.59 0.00 0.00 > > system.time(A2 <- A%*%solve(crossprod(BA)+C., crossprod(t(BA), D))) > [1] 0.56 0.03 0.59 0.00 0.00 > > system.time(A2 <- A%*%solve(crossprod(BA)+C., crossprod(t(BA), D))) > [1] 0.54 0.05 0.59 0.00 0.00 > > > > all.equal(A1, A2) > > [1] TRUE > > > > This was in R 1.8.1 under Windows 2000 on an IBM Thinkpad T30 > > with a Mobile Intel Pentium 4-M, 1.8Ghz, 1Gbyte RAM. The same > > script under S-Plus 6.2 produced the following elapsed times: > > [1] 3.325 0.121 3.815 0.000 0.000 > > This is using R-devel (to be 1.9.0) on a 2.0 GHz Pentium-4 desktop > computer running Linux and with Goto's BLAS. I'm not sure exactly > which of the changes from your system are resulting in the much faster > execution time but it is definitely not all due to the processor speed. > My guess is that most of the gain is due to the optimized BLAS. > Goto's BLAS are a big win on a Pentium-4 under Linux. (Thanks to > Brian Ripley for modifying the configure script for R to accept > --with-blas=-lgoto .) > > Corresponding timings on a Athlon XP 2500+ (1.83 GHz) running Linux > with Atlas are > > > system.time(A1 <- A%*%solve(t(BA)%*%BA+C.)%*%BA%*%D) > [1] 1.29 0.04 1.34 0.00 0.00 > > system.time(A1 <- A%*%solve(t(BA)%*%BA+C.)%*%BA%*%D) > [1] 0.88 0.06 0.95 0.00 0.00 > > system.time(A1 <- A%*%solve(t(BA)%*%BA+C.)%*%BA%*%D) > [1] 0.79 0.05 0.85 0.00 0.00 > > system.time(A1 <- A%*%solve(t(BA)%*%BA+C.)%*%BA%*%D) > [1] 0.82 0.04 0.87 0.00 0.00 > > system.time(A2 <- A%*%solve(crossprod(BA)+C., crossprod(t(BA), D))) > [1] 0.61 0.06 0.67 0.00 0.00 > > system.time(A2 <- A%*%solve(crossprod(BA)+C., crossprod(t(BA), D))) > [1] 0.66 0.02 0.69 0.00 0.00 > > system.time(A2 <- A%*%solve(crossprod(BA)+C., crossprod(t(BA), D))) > [1] 0.51 0.10 0.61 0.00 0.00 > > system.time(A2 <- A%*%solve(crossprod(BA)+C., crossprod(t(BA), D))) > [1] 0.59 0.10 0.71 0.00 0.00 > > There you can see the faster execution of the second and subsequent > timings. > > I completely agree with you that using crossprod and the non-inverse > form of solve, where appropriate, helps. However, one of the best > optimizations for numerical linear algebra calculations is the use of > optimized BLAS. (I will avoid going in to the Linux vs Windows > comparisons :-) > > ______________________________________________ > [EMAIL PROTECTED] mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html > > -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 ______________________________________________ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html