Re: [Rd] the incredible lightness of crossprod
On Fri, 28 Jan 2005, Uwe Ligges wrote:

> Patrick Burns wrote:
>
>> On my machine the versions are all precompiled R, and I would be very
>> surprised if the same were not the case on the client's machine. That
>> is, no specially compiled BLAS.
>
> Hmmm. I always install using some advanced BLAS: on Windows, e.g.,
> simply using the Rblas.dll provided by Brian Ripley; on Linux it's
> really no effort to compile it yourself. For huge matrices you can use
> your some-years-old desktop machine (with an advanced BLAS) to
> outperform expensive multi-CPU machines (without one).

As I think has been noted earlier, there is often some loss for moderately sized matrices. When Doug Bates first used dgemm for %*% and crossprod, it made the R tests run noticeably slower on the (70 MHz) Solaris machines I was using at the time. But that price is paid whether you have an optimized BLAS or not.

Still, anyone working with matrices of dimension 100 or more should try an optimized BLAS. Building ATLAS can be a bit painful (especially as you really need a shared library for use with R), but prebuilt versions are often available (e.g. Goto's BLAS, and ATLAS on Debian and Windows).

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax: +44 1865 272595

__
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
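Ripley's point about moderately sized matrices is easy to check on one's own installation. A minimal timing sketch (absolute numbers depend entirely on the machine and the BLAS in use; the sizes and the 20-iteration count are arbitrary choices of mine):

```r
# Time t(a) %*% a against crossprod(a) for a few square-matrix sizes.
# Both do O(n^3) work; crossprod(a) goes through one BLAS call (dsyrk),
# while t(a) %*% a does an explicit transpose followed by dgemm.
elapsed <- list()
for (n in c(10, 100, 300)) {
  a <- matrix(rnorm(n * n), n, n)
  t_mm <- system.time(for (i in 1:20) t(a) %*% a)["elapsed"]
  t_cp <- system.time(for (i in 1:20) crossprod(a))["elapsed"]
  elapsed[[as.character(n)]] <- c(matmult = t_mm, crossprod = t_cp)
}

# Whatever the BLAS, the two approaches agree numerically:
a <- matrix(rnorm(9), 3, 3)
stopifnot(all.equal(crossprod(a), t(a) %*% a))
```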
Re: [Rd] the incredible lightness of crossprod
Patrick Burns wrote:

> On my machine the versions are all precompiled R, and I would be very
> surprised if the same were not the case on the client's machine. That
> is, no specially compiled BLAS.

Hmmm. I always install using some advanced BLAS: on Windows, e.g., simply using the Rblas.dll provided by Brian Ripley; on Linux it's really no effort to compile it yourself. For huge matrices you can use your some-years-old desktop machine (with an advanced BLAS) to outperform expensive multi-CPU machines (without one).

Uwe Ligges
Re: [Rd] the incredible lightness of crossprod
On my machine the versions are all precompiled R, and I would be very surprised if the same were not the case on the client's machine. That is, no specially compiled BLAS.

Douglas Bates wrote:

> For a numeric matrix crossprod ends up calling level 3 BLAS; either
> dsyrk for the single argument case or dgemm for the two argument case.
> Especially in accelerated versions of the BLAS like ATLAS or Goto's
> BLAS, those routines are hideously efficient and that's where you are
> seeing the big gain in speed.
>
> By the way, you didn't mention if you had an accelerated BLAS installed
> with R. Do you?
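For anyone checking Bates's description against behaviour at the R prompt, the identities behind crossprod are easy to verify (a small sketch of my own; the results should hold on any BLAS):

```r
# Two-argument case: crossprod(x, y) is t(x) %*% y (dgemm underneath),
# which for plain vectors is the dot product sum(x * y), returned as a
# 1x1 matrix.
x <- as.numeric(1:1000)
y <- sqrt(x)
stopifnot(all.equal(crossprod(x, y)[1, 1], sum(x * y)))

# One-argument case: crossprod(m) is t(m) %*% m, the symmetric product
# that the level 3 BLAS routine dsyrk computes without forming t(m).
m <- matrix(rnorm(20), 4, 5)
stopifnot(all.equal(crossprod(m), t(m) %*% m))
stopifnot(isSymmetric(crossprod(m)))
```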
Re: [Rd] the incredible lightness of crossprod
Prof Brian Ripley wrote:

> On Thu, 27 Jan 2005, Paul Gilbert wrote:
>
>> A few weeks ago I noticed
>>
>> z <- matrix(rnorm(2),1,2)
>> system.time(for (i in 1:1000) apply(z,2,sum))
>> [1] 13.44 0.48 14.08 0.00 0.00
>> system.time(for (i in 1:1000) rep(1,1) %*% z)
>> [1] 6.46 0.11 6.84 0.00 0.00
>
> So both run in a few milliseconds for rather large problems.
>
>> which seemed completely contrary to all my childhood teachings. Now
>>
>> system.time(for (i in 1:1000) crossprod(rep(1,1), z))
>> [1] 1.90 0.12 2.24 0.00 0.00
>>
>> makes sense because it is supposed to be faster than %*%, but why is
>> apply so slow?
>
> 'so slow' sic: what are you going to do in the 7ms you saved?

Yes, I think I've spent more time checking this than I will ever save.

>> (And should I go back and change apply in my code everywhere or is
>> this likely to reverse again?)
>
> It's not likely. apply is an R-level loop, and %*% is a C-level one.
> However, %*% is not supposed to be much slower than crossprod, and the
> devil is in the details of how the BLAS is implemented: the code is
> very similar. That %*% was faster than apply has been true in all my
> (17 years) of S/R experience. Your childhood may predate S3, of course.

I didn't say the teachings were correct. I clearly should have questioned some things sooner.

> I still think one should use row/colSums for clarity with acceptable
> efficiency. It must be very unusual for these operations to be a
> dominant part of a calculation, so let's not lose proportion here.

Agreed. I think I like the clarity reason best. Thanks for the explanation.

Paul
Re: [Rd] the incredible lightness of crossprod
Paul Gilbert wrote:

> A few weeks ago I noticed
>
> > z <- matrix(rnorm(2),1,2)
> > system.time(for (i in 1:1000) apply(z,2,sum))
> [1] 13.44 0.48 14.08 0.00 0.00
> > system.time(for (i in 1:1000) rep(1,1) %*% z)
> [1] 6.46 0.11 6.84 0.00 0.00
>
> which seemed completely contrary to all my childhood teachings.

Must have had an interesting childhood if you spent it learning about the speeds of various matrix multiplication techniques.

> Now
>
> > system.time(for (i in 1:1000) crossprod(rep(1,1), z))
> [1] 1.90 0.12 2.24 0.00 0.00

... and there is a good chance that a significant portion of that time is taken up by repeating the rep(1, 1) function call 1000 times.

> makes sense because it is supposed to be faster than %*%, but why is
> apply so slow?

I believe that this is Bert Gunter's cue to comment on the internal (or infernal) nature of the apply functions.

> (And should I go back and change apply in my code everywhere or is
> this likely to reverse again?)
>
> Paul Gilbert
Re: [Rd] the incredible lightness of crossprod
On Thu, 27 Jan 2005, Paul Gilbert wrote:

> A few weeks ago I noticed
>
> z <- matrix(rnorm(2),1,2)
> system.time(for (i in 1:1000) apply(z,2,sum))
> [1] 13.44 0.48 14.08 0.00 0.00
> system.time(for (i in 1:1000) rep(1,1) %*% z)
> [1] 6.46 0.11 6.84 0.00 0.00

So both run in a few milliseconds for rather large problems.

> which seemed completely contrary to all my childhood teachings. Now
>
> system.time(for (i in 1:1000) crossprod(rep(1,1), z))
> [1] 1.90 0.12 2.24 0.00 0.00
>
> makes sense because it is supposed to be faster than %*%, but why is
> apply so slow?

'so slow' sic: what are you going to do in the 7ms you saved?

> (And should I go back and change apply in my code everywhere or is
> this likely to reverse again?)

It's not likely. apply is an R-level loop, and %*% is a C-level one. However, %*% is not supposed to be much slower than crossprod, and the devil is in the details of how the BLAS is implemented: the code is very similar. That %*% was faster than apply has been true in all my (17 years) of S/R experience. Your childhood may predate S3, of course.

I still think one should use row/colSums for clarity with acceptable efficiency. It must be very unusual for these operations to be a dominant part of a calculation, so let's not lose proportion here.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax: +44 1865 272595
Re: [Rd] the incredible lightness of crossprod
A few weeks ago I noticed

> z <- matrix(rnorm(2),1,2)
> system.time(for (i in 1:1000) apply(z,2,sum))
[1] 13.44 0.48 14.08 0.00 0.00
> system.time(for (i in 1:1000) rep(1,1) %*% z)
[1] 6.46 0.11 6.84 0.00 0.00

which seemed completely contrary to all my childhood teachings. Now

> system.time(for (i in 1:1000) crossprod(rep(1,1), z))
[1] 1.90 0.12 2.24 0.00 0.00

makes sense because it is supposed to be faster than %*%, but why is apply so slow? (And should I go back and change apply in my code everywhere or is this likely to reverse again?)

Paul Gilbert
RE: [Rd] the incredible lightness of crossprod
> From: Douglas Bates
>
> For a numeric matrix crossprod ends up calling level 3 BLAS; either
> dsyrk for the single argument case or dgemm for the two argument case.
> Especially in accelerated versions of the BLAS like ATLAS or Goto's
> BLAS, those routines are hideously efficient and that's where you are
> seeing the big gain in speed.
>
> By the way, you didn't mention if you had an accelerated BLAS installed
> with R. Do you?

For the case of crossprod(x, w) for computing the weighted mean of x (the posting that Pat referred to), I tried the ATLAS-linked Rblas.dll on CRAN, and it's actually slower than the stock Rblas.dll. I believe Duncan M. had observed similar things for small-ish problems. (I used a Pentium M laptop, and tried both the P3 and P4 versions.)

So, I think my main point is that even with a non-optimized BLAS, crossprod can be way faster.

Andy
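Andy's weighted-mean use of crossprod can be sketched like this (the variable names are mine, not from the R-help posting he mentions; the point is only that the two numerators are the same number computed by different routes):

```r
# Weighted mean of x with weights w: the numerator sum(x * w) can be
# computed either elementwise in R, or as the 1x1 matrix crossprod(x, w),
# which goes through the BLAS (dgemm for the two-argument case).
set.seed(42)
x <- rnorm(1e5)
w <- runif(1e5)

wm_crossprod <- crossprod(x, w)[1, 1] / sum(w)
wm_plain     <- sum(x * w) / sum(w)

stopifnot(all.equal(wm_crossprod, wm_plain))
stopifnot(all.equal(wm_crossprod, weighted.mean(x, w)))
```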
Re: [Rd] the incredible lightness of crossprod
Patrick Burns wrote:

> Questions for those with a more global picture of the code:
>
> * Is the speed advantage of 'crossprod' inherent, or is it because
>   more care has been taken with its implementation than the other
>   functions?
>
> * Is 'crossprod' faster than 'sum(x * y)' because 'crossprod' is
>   going to BLAS while 'sum' can't?

For a numeric matrix crossprod ends up calling level 3 BLAS; either dsyrk for the single argument case or dgemm for the two argument case. Especially in accelerated versions of the BLAS like ATLAS or Goto's BLAS, those routines are hideously efficient and that's where you are seeing the big gain in speed.

By the way, you didn't mention if you had an accelerated BLAS installed with R. Do you?
[Rd] the incredible lightness of crossprod
The following is at least as much out of intellectual curiosity as for practical reasons. On reviewing some code written by novices to R, I came across:

crossprod(x, y)[1,1]

I thought, "That isn't a very S way of saying that; I wonder what the penalty is for using 'crossprod'." To my surprise the penalty was substantially negative. Handily the client had S-PLUS as well -- there the sign of the penalty was as I had expected, but the order of magnitude was off.

Here are the timings of 1 million computations on vectors of length 1000. This is under Windows, R version 1.9.1 and S-PLUS 6.2 (on the same machine).

Command                     R       S-PLUS
sum(x * y)              28.61         97.6
crossprod(x, y)[1,1]     6.77       2256.2

Another example is computing the sums of the columns of a matrix. For example:

set.seed(1)
jjm <- matrix(rnorm(600), 5)

Timings for this under Windows 2000 with R version 2.0.1 (on an old chip running at about 0.7 GHz) for 100,000 computations are:

apply(jjm, 2, sum)        536.59
colSums(jjm)               18.26
rep(1,5) %*% jjm           15.41
crossprod(rep(1,5), jjm)   13.16

(These timings seem to be stable across R versions and on at least one Linux platform.)

Andy Liaw showed another example of 'crossprod' being fast a couple of days ago on R-help.

Questions for those with a more global picture of the code:

* Is the speed advantage of 'crossprod' inherent, or is it because more care has been taken with its implementation than the other functions?

* Is 'crossprod' faster than 'sum(x * y)' because 'crossprod' is going to BLAS while 'sum' can't?

* Would it make sense to (essentially) use 'crossprod' in 'colSums' and its friends, at least for the special case of matrices?

Patrick Burns
Burns Statistics
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")
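The four column-sum commands timed above produce the same numbers; only the dispatch differs (an R-level loop, a dedicated C routine, or a BLAS matrix product). A quick equivalence check of my own, using the same jjm:

```r
set.seed(1)
jjm <- matrix(rnorm(600), 5)

by_apply     <- apply(jjm, 2, sum)                   # R-level loop
by_colsums   <- colSums(jjm)                         # dedicated C code
by_matmult   <- as.vector(rep(1, 5) %*% jjm)         # ones-vector %*%, dgemm
by_crossprod <- as.vector(crossprod(rep(1, 5), jjm)) # t(ones) %*% jjm, dgemm

stopifnot(all.equal(by_apply, by_colsums),
          all.equal(by_colsums, by_matmult),
          all.equal(by_matmult, by_crossprod))
```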