Re: [Rd] the incredible lightness of crossprod

2005-01-28 Thread Prof Brian Ripley
On Fri, 28 Jan 2005, Uwe Ligges wrote:
Patrick Burns wrote:
On my machine the versions are all precompiled R, and I would
be very surprised if the same were not the case on the client's
machine.  That is, no specially compiled BLAS.
Hmmm. I always install with an optimized BLAS: on Windows, e.g., simply
by using the Rblas.dll provided by Brian Ripley; on Linux it's really no
effort to compile one yourself.

For huge matrices you can use a some-years-old desktop machine (with an
optimized BLAS) to outperform expensive multi-CPU machines (without
one).
As I think has been noted earlier, there is often some loss for 
moderately-sized matrices.  When Doug Bates first used dgemm for %*% and 
crossprod, it made the R tests run noticeably slower on the (70MHz)
Solaris machines I was using at the time.  But that price is paid whether 
you have an optimized BLAS or not.

But anyone working with matrices of dimension 100 or more should try an
optimized BLAS.  Building ATLAS can be a bit painful (especially as you
really need a shared library for use with R), but prebuilt versions are
often available (e.g. Goto's BLAS, and ATLAS on Debian and Windows).
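
A rough way to see what the BLAS buys you is to time a large matrix
product (a sketch; the size and repeat count are arbitrary):

a <- matrix(rnorm(250000), 500, 500)
system.time(for (i in 1:10) a %*% a)  # dgemm-bound, so an optimized BLAS
                                      # typically wins by a large factor here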

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [Rd] the incredible lightness of crossprod

2005-01-28 Thread Uwe Ligges
Patrick Burns wrote:
On my machine the versions are all precompiled R, and I would
be very surprised if the same were not the case on the client's
machine.  That is, no specially compiled BLAS.
Hmmm. I always install with an optimized BLAS: on Windows, e.g., simply
by using the Rblas.dll provided by Brian Ripley; on Linux it's really no
effort to compile one yourself.

For huge matrices you can use a some-years-old desktop machine (with an
optimized BLAS) to outperform expensive multi-CPU machines (without
one).

Uwe Ligges


Re: [Rd] the incredible lightness of crossprod

2005-01-27 Thread Patrick Burns
On my machine the versions are all precompiled R, and I would
be very surprised if the same were not the case on the client's
machine.  That is, no specially compiled BLAS.
Douglas Bates wrote:
Patrick Burns wrote:
The following is at least as much out of intellectual curiosity
as for practical reasons.
On reviewing some code written by novices to R, I came
across:
crossprod(x, y)[1,1]
I thought, "That isn't a very S way of saying that; I wonder
what the penalty is for using 'crossprod'."  To my surprise the
penalty was substantially negative.  Handily, the client had S-PLUS
as well -- there the sign of the penalty was as I had expected, but
the order of magnitude was off.
Here are the timings of 1 million computations on vectors of
length 1000.  This is under Windows, R version 1.9.1 and S-PLUS
6.2 (on the same machine).
Command                      R    S-PLUS
sum(x * y)               28.61      97.6
crossprod(x, y)[1,1]      6.77    2256.2

Another example is when computing the sums of the columns of a
matrix.  For example:
set.seed(1)
jjm <- matrix(rnorm(600), 5)
Timings for this under Windows 2000 with R version 2.0.1 (on an
old chip running at about 0.7GHz) for 100,000 computations are:
apply(jjm, 2, sum)        536.59
colSums(jjm)               18.26
rep(1,5) %*% jjm           15.41
crossprod(rep(1,5), jjm)   13.16
(These timings seem to be stable across R versions and on at least
one Linux platform.)
Andy Liaw showed another example of 'crossprod' being fast a couple
days ago on R-help.
Questions for those with a more global picture of the code:
*  Is the speed advantage of 'crossprod' inherent, or is it because
more care has been taken with its implementation than the other
functions?
*  Is 'crossprod' faster than 'sum(x * y)' because 'crossprod' is
going to BLAS while 'sum' can't?

For a numeric matrix crossprod ends up calling level-3 BLAS: dsyrk for
the single-argument case, dgemm for the two-argument case.  Especially
in accelerated versions of the BLAS like ATLAS or Goto's BLAS, those
routines are hideously efficient, and that's where you are seeing the
big gain in speed.

By the way, you didn't mention if you had an accelerated BLAS 
installed with R.  Do you?
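
One-argument crossprod illustrates the dsyrk point (a sketch; the matrix
size is arbitrary):

x <- matrix(rnorm(1e6), 1000, 1000)
system.time(t(x) %*% x)    # forms the transpose explicitly, then calls dgemm
system.time(crossprod(x))  # calls dsyrk: no transpose is formed, and the
                           # symmetry of the result saves about half the flops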




Re: [Rd] the incredible lightness of crossprod

2005-01-27 Thread Paul Gilbert

Prof Brian Ripley wrote:
On Thu, 27 Jan 2005, Paul Gilbert wrote:
A few weeks ago I noticed
 z <- matrix(rnorm(2),1,2)

system.time(for (i in 1:1000) apply(z,2,sum))
[1] 13.44  0.48 14.08  0.00  0.00
system.time(for (i in 1:1000) rep(1,1) %*% z)
[1] 6.46 0.11 6.84 0.00 0.00

So both run in a few milliseconds for rather large problems.
which seemed completely contrary to all my childhood teachings. Now
system.time(for (i in 1:1000) crossprod(rep(1,1), z))
[1] 1.90 0.12 2.24 0.00 0.00
makes sense because it is supposed to be faster than %*%, but why is
apply so slow?

`so slow' sic: what are you going to do in the 7ms you saved? 
Yes, I think I've spent more time checking this than I will ever save.
(And should I go back and change apply in my code everywhere or is 
this likely to reverse again?)

It's not likely.  apply is an R-level loop, and %*% is a C-level one.
However, %*% is not supposed to be much slower than crossprod, and
the devil is in the details of how the BLAS is implemented: the code
is very similar.

That %*% was faster than apply has been true in all my 17 years of
S/R experience.  Your childhood may predate S3, of course.
I didn't say the teachings were correct. I clearly should have 
questioned some things sooner.

I still think one should use row/colSums for clarity with acceptable
efficiency.  It must be very unusual for these operations to be a 
dominant part of a calculation, so let's not lose proportion here. 
Agreed. I think I like the clarity reason best.
Thanks for the explanation.
Paul


Re: [Rd] the incredible lightness of crossprod

2005-01-27 Thread Douglas Bates
Paul Gilbert wrote:
A few weeks ago I noticed
 >  z <- matrix(rnorm(2),1,2)
 > system.time(for (i in 1:1000) apply(z,2,sum))
[1] 13.44  0.48 14.08  0.00  0.00
 > system.time(for (i in 1:1000) rep(1,1) %*% z)
[1] 6.46 0.11 6.84 0.00 0.00
which seemed completely contrary to all my childhood teachings. Now
Must have had an interesting childhood if you spent it learning about
the speeds of various matrix multiplication techniques.
 > system.time(for (i in 1:1000) crossprod(rep(1,1), z))
[1] 1.90 0.12 2.24 0.00 0.00
and there is a good chance that a significant portion of that time is 
taken up by repeating the rep(1, 1) function call 1000 times.
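
A sketch of that point, hoisting the constant vector out of the loop:

ones <- rep(1, 1)                                  # built once, not 1000 times
system.time(for (i in 1:1000) crossprod(ones, z))  # now times crossprod itself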

makes sense because it is supposed to be faster than %*%, but why is
apply so slow?
I believe that this is Bert Gunter's cue to comment on the internal (or 
infernal) nature of the apply functions.

(And should I go back and change apply in my code everywhere or is this 
likely to reverse again?)

Paul Gilbert


Re: [Rd] the incredible lightness of crossprod

2005-01-27 Thread Prof Brian Ripley
On Thu, 27 Jan 2005, Paul Gilbert wrote:
A few weeks ago I noticed
 z <- matrix(rnorm(2),1,2)

system.time(for (i in 1:1000) apply(z,2,sum))
[1] 13.44  0.48 14.08  0.00  0.00
system.time(for (i in 1:1000) rep(1,1) %*% z)
[1] 6.46 0.11 6.84 0.00 0.00
So both run in a few milliseconds for rather large problems.
which seemed completely contrary to all my childhood teachings. Now
system.time(for (i in 1:1000) crossprod(rep(1,1), z))
[1] 1.90 0.12 2.24 0.00 0.00
makes sense because it is supposed to be faster than %*%, but why is apply
so slow?
`so slow' sic: what are you going to do in the 7ms you saved?
(And should I go back and change apply in my code everywhere or is this 
likely to reverse again?)
It's not likely.  apply is an R-level loop, and %*% is a C-level one.
However, %*% is not supposed to be much slower than crossprod, and the
devil is in the details of how the BLAS is implemented: the code is very
similar.

That %*% was faster than apply has been true in all my 17 years of S/R
experience.  Your childhood may predate S3, of course.

I still think one should use row/colSums for clarity with acceptable
efficiency.  It must be very unusual for these operations to be a dominant 
part of a calculation, so let's not lose proportion here.
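
The clear spelling and the fast one do agree, for what it's worth (a
small sketch):

m <- matrix(rnorm(600), 5)                            # 5 x 120
all.equal(colSums(m), drop(crossprod(rep(1, 5), m)))  # TRUE: same numbers;
                                                      # colSums says what it means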


Paul Gilbert
Patrick Burns wrote:
The following is at least as much out of intellectual curiosity
as for practical reasons.
On reviewing some code written by novices to R, I came
across:
crossprod(x, y)[1,1]
I thought, "That isn't a very S way of saying that; I wonder
what the penalty is for using 'crossprod'."  To my surprise the
penalty was substantially negative.  Handily, the client had S-PLUS
as well -- there the sign of the penalty was as I had expected, but
the order of magnitude was off.
Here are the timings of 1 million computations on vectors of
length 1000.  This is under Windows, R version 1.9.1 and S-PLUS
6.2 (on the same machine).
Command                      R    S-PLUS
sum(x * y)               28.61      97.6
crossprod(x, y)[1,1]      6.77    2256.2
Another example is when computing the sums of the columns of a
matrix.  For example:
set.seed(1)
jjm <- matrix(rnorm(600), 5)
Timings for this under Windows 2000 with R version 2.0.1 (on an
old chip running at about 0.7GHz) for 100,000 computations are:
apply(jjm, 2, sum)        536.59
colSums(jjm)               18.26
rep(1,5) %*% jjm           15.41
crossprod(rep(1,5), jjm)   13.16
(These timings seem to be stable across R versions and on at least
one Linux platform.)
Andy Liaw showed another example of 'crossprod' being fast a couple
days ago on R-help.
Questions for those with a more global picture of the code:
*  Is the speed advantage of 'crossprod' inherent, or is it because
more care has been taken with its implementation than the other
functions?
*  Is 'crossprod' faster than 'sum(x * y)' because 'crossprod' is
going to BLAS while 'sum' can't?
*  Would it make sense to (essentially) use 'crossprod' in
'colSums' and its friends at least for the special case of matrices?
Patrick Burns
Burns Statistics
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [Rd] the incredible lightness of crossprod

2005-01-27 Thread Paul Gilbert
A few weeks ago I noticed
>  z <- matrix(rnorm(2),1,2)
> system.time(for (i in 1:1000) apply(z,2,sum))
[1] 13.44  0.48 14.08  0.00  0.00
> system.time(for (i in 1:1000) rep(1,1) %*% z)
[1] 6.46 0.11 6.84 0.00 0.00
which seemed completely contrary to all my childhood teachings. Now
> system.time(for (i in 1:1000) crossprod(rep(1,1), z))
[1] 1.90 0.12 2.24 0.00 0.00
makes sense because it is supposed to be faster than %*%, but why is
apply so slow?
(And should I go back and change apply in my code everywhere or is this 
likely to reverse again?)

Paul Gilbert
Patrick Burns wrote:
The following is at least as much out of intellectual curiosity
as for practical reasons.
On reviewing some code written by novices to R, I came
across:
crossprod(x, y)[1,1]
I thought, "That isn't a very S way of saying that; I wonder
what the penalty is for using 'crossprod'."  To my surprise the
penalty was substantially negative.  Handily, the client had S-PLUS
as well -- there the sign of the penalty was as I had expected, but
the order of magnitude was off.
Here are the timings of 1 million computations on vectors of
length 1000.  This is under Windows, R version 1.9.1 and S-PLUS
6.2 (on the same machine).
Command                      R    S-PLUS
sum(x * y)               28.61      97.6
crossprod(x, y)[1,1]      6.77    2256.2
Another example is when computing the sums of the columns of a
matrix.  For example:
set.seed(1)
jjm <- matrix(rnorm(600), 5)
Timings for this under Windows 2000 with R version 2.0.1 (on an
old chip running at about 0.7GHz) for 100,000 computations are:
apply(jjm, 2, sum)        536.59
colSums(jjm)               18.26
rep(1,5) %*% jjm           15.41
crossprod(rep(1,5), jjm)   13.16
(These timings seem to be stable across R versions and on at least
one Linux platform.)
Andy Liaw showed another example of 'crossprod' being fast a couple
days ago on R-help.
Questions for those with a more global picture of the code:
*  Is the speed advantage of 'crossprod' inherent, or is it because
more care has been taken with its implementation than the other
functions?
*  Is 'crossprod' faster than 'sum(x * y)' because 'crossprod' is
going to BLAS while 'sum' can't?
*  Would it make sense to (essentially) use 'crossprod' in
'colSums' and its friends at least for the special case of matrices?
Patrick Burns
Burns Statistics
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")


RE: [Rd] the incredible lightness of crossprod

2005-01-27 Thread Liaw, Andy
> From: Douglas Bates
> 
> Patrick Burns wrote:
> > The following is at least as much out of intellectual curiosity
> > as for practical reasons.
> > On reviewing some code written by novices to R, I came
> > across:
> > 
> > crossprod(x, y)[1,1]
> > 
> > I thought, "That isn't a very S way of saying that; I wonder
> > what the penalty is for using 'crossprod'."  To my surprise the
> > penalty was substantially negative.  Handily, the client had S-PLUS
> > as well -- there the sign of the penalty was as I had expected, but
> > the order of magnitude was off.
> > 
> > Here are the timings of 1 million computations on vectors of
> > length 1000.  This is under Windows, R version 1.9.1 and S-PLUS
> > 6.2 (on the same machine).
> > 
> > Command                      R    S-PLUS
> > sum(x * y)               28.61      97.6
> > crossprod(x, y)[1,1]      6.77    2256.2
> > 
> > 
> > Another example is when computing the sums of the columns of a
> > matrix.  For example:
> > 
> > set.seed(1)
> > jjm <- matrix(rnorm(600), 5)
> > 
> > Timings for this under Windows 2000 with R version 2.0.1 (on an
> > old chip running at about 0.7GHz) for 100,000 computations are:
> > 
> > apply(jjm, 2, sum)        536.59
> > colSums(jjm)               18.26
> > rep(1,5) %*% jjm           15.41
> > crossprod(rep(1,5), jjm)   13.16
> > 
> > (These timings seem to be stable across R versions and on at least
> > one Linux platform.)
> > 
> > Andy Liaw showed another example of 'crossprod' being fast a couple
> > days ago on R-help.
> > 
> > Questions for those with a more global picture of the code:
> > 
> > *  Is the speed advantage of 'crossprod' inherent, or is it because
> > more care has been taken with its implementation than the other
> > functions?
> > 
> > *  Is 'crossprod' faster than 'sum(x * y)' because 'crossprod' is
> > going to BLAS while 'sum' can't?
> 
> For a numeric matrix crossprod ends up calling level-3 BLAS: dsyrk
> for the single-argument case, dgemm for the two-argument case.
> Especially in accelerated versions of the BLAS like ATLAS or Goto's
> BLAS, those routines are hideously efficient, and that's where you are
> seeing the big gain in speed.
> 
> By the way, you didn't mention if you had an accelerated BLAS 
> installed 
> with R.  Do you?

For the case of crossprod(x, w) for computing the weighted mean of x (the
posting that Pat referred to), I tried the ATLAS-linked Rblas.dll on CRAN,
and it's actually slower than the stock Rblas.dll.  I believe Duncan M. had
observed similar things for small-ish problems.  (I used a Pentium M laptop,
and tried both the P3 and P4 versions.)

So, I think my main point is that even with a non-optimized BLAS crossprod
can be way faster.
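
The idiom in question, for concreteness (a sketch with made-up data):

x <- rnorm(1000); w <- runif(1000)
drop(crossprod(w, x)) / sum(w)  # weighted mean via one BLAS call
sum(w * x) / sum(w)             # the plain-R spelling of the same quantity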

Andy
 



Re: [Rd] the incredible lightness of crossprod

2005-01-27 Thread Douglas Bates
Patrick Burns wrote:
The following is at least as much out of intellectual curiosity
as for practical reasons.
On reviewing some code written by novices to R, I came
across:
crossprod(x, y)[1,1]
I thought, "That isn't a very S way of saying that; I wonder
what the penalty is for using 'crossprod'."  To my surprise the
penalty was substantially negative.  Handily, the client had S-PLUS
as well -- there the sign of the penalty was as I had expected, but
the order of magnitude was off.
Here are the timings of 1 million computations on vectors of
length 1000.  This is under Windows, R version 1.9.1 and S-PLUS
6.2 (on the same machine).
Command                      R    S-PLUS
sum(x * y)               28.61      97.6
crossprod(x, y)[1,1]      6.77    2256.2
Another example is when computing the sums of the columns of a
matrix.  For example:
set.seed(1)
jjm <- matrix(rnorm(600), 5)
Timings for this under Windows 2000 with R version 2.0.1 (on an
old chip running at about 0.7GHz) for 100,000 computations are:
apply(jjm, 2, sum)        536.59
colSums(jjm)               18.26
rep(1,5) %*% jjm           15.41
crossprod(rep(1,5), jjm)   13.16
(These timings seem to be stable across R versions and on at least
one Linux platform.)
Andy Liaw showed another example of 'crossprod' being fast a couple
days ago on R-help.
Questions for those with a more global picture of the code:
*  Is the speed advantage of 'crossprod' inherent, or is it because
more care has been taken with its implementation than the other
functions?
*  Is 'crossprod' faster than 'sum(x * y)' because 'crossprod' is
going to BLAS while 'sum' can't?
For a numeric matrix crossprod ends up calling level-3 BLAS: dsyrk for
the single-argument case, dgemm for the two-argument case.  Especially
in accelerated versions of the BLAS like ATLAS or Goto's BLAS, those
routines are hideously efficient, and that's where you are seeing the
big gain in speed.

By the way, you didn't mention if you had an accelerated BLAS installed 
with R.  Do you?



[Rd] the incredible lightness of crossprod

2005-01-27 Thread Patrick Burns
The following is at least as much out of intellectual curiosity
as for practical reasons. 

On reviewing some code written by novices to R, I came
across:
crossprod(x, y)[1,1]
I thought, "That isn't a very S way of saying that; I wonder
what the penalty is for using 'crossprod'."  To my surprise the
penalty was substantially negative.  Handily, the client had S-PLUS
as well -- there the sign of the penalty was as I had expected, but
the order of magnitude was off.
Here are the timings of 1 million computations on vectors of
length 1000.  This is under Windows, R version 1.9.1 and S-PLUS
6.2 (on the same machine).
Command                      R    S-PLUS
sum(x * y)               28.61      97.6
crossprod(x, y)[1,1]      6.77    2256.2
Another example is when computing the sums of the columns of a
matrix.  For example:
set.seed(1)
jjm <- matrix(rnorm(600), 5)
Timings for this under Windows 2000 with R version 2.0.1 (on an
old chip running at about 0.7GHz) for 100,000 computations are:
apply(jjm, 2, sum)        536.59
colSums(jjm)               18.26
rep(1,5) %*% jjm           15.41
crossprod(rep(1,5), jjm)   13.16
(These timings seem to be stable across R versions and on at least
one Linux platform.)
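
A harness of roughly this shape reproduces the comparison (iteration
counts here are illustrative):

x <- rnorm(1000); y <- rnorm(1000)
system.time(for (i in 1:100000) sum(x * y))
system.time(for (i in 1:100000) crossprod(x, y)[1, 1])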
Andy Liaw showed another example of 'crossprod' being fast a couple
days ago on R-help.
Questions for those with a more global picture of the code:
*  Is the speed advantage of 'crossprod' inherent, or is it because
more care has been taken with its implementation than the other
functions?
*  Is 'crossprod' faster than 'sum(x * y)' because 'crossprod' is
going to BLAS while 'sum' can't?
*  Would it make sense to (essentially) use 'crossprod' in
'colSums' and its friends at least for the special case of matrices?
Patrick Burns
Burns Statistics
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")