When I was introduced to R, I learned to use *apply whenever I could to avoid for-loops. Having gotten into that habit, I think I somehow picked up the misconception that it is a magic sauce, always the optimal way of coding in R.
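For anyone reading this in the archive, here is a small sketch of the three approaches from the thread quoted below, run on a smaller matrix so the apply() version finishes quickly. The object names (pr_cols, pr_apply, pr_reduce) and the reduced matrix size are my own choices for illustration, not something from the earlier posts, and timings will differ by machine.

```r
# Smaller than the 1,000,000 x 5 matrix in the thread (assumption: 10,000 rows
# is enough to see the difference without a long wait).
set.seed(1)
a <- matrix(rnorm(50000), nrow = 10000, ncol = 5)

# 1. Multiply the columns directly: the loop over rows happens in compiled code.
pr_cols <- a[, 1] * a[, 2] * a[, 3] * a[, 4] * a[, 5]

# 2. apply(): calls prod() once per row from R, so it is still a loop.
pr_apply <- apply(a, 1, prod)

# 3. Reduce() over the columns: vectorized, and works for any number of columns.
pr_reduce <- Reduce(`*`, as.data.frame(a))

# All three give the same row products (up to floating-point tolerance).
stopifnot(isTRUE(all.equal(pr_cols, pr_apply)),
          isTRUE(all.equal(pr_cols, pr_reduce)))

# Compare timings; no expected values are given here because they are machine-dependent.
system.time(Reduce(`*`, as.data.frame(a)))
system.time(apply(a, 1, prod))
```

The sum of products I originally asked about is then just sum(pr_reduce).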
Thanks a lot for all of your helpful advice and comments!

Young

On Wed, Jan 5, 2011 at 3:09 PM, David Winsemius <dwinsem...@comcast.net> wrote:

> On Jan 5, 2011, at 2:40 PM, Douglas Bates wrote:
>
>> On Wed, Jan 5, 2011 at 1:22 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>>
>>> On Jan 5, 2011, at 10:03 AM, Young Cho wrote:
>>>
>>>> Hi,
>>>>
>>>> I am doing some simulations and found a bottle neck in my R script. I
>>>> made an example:
>>>>
>>>> > a = matrix(rnorm(5000000),1000000,5)
>>>> > tt = Sys.time(); sum(a[,1]*a[,2]*a[,3]*a[,4]*a[,5]); Sys.time() - tt
>>>> [1] -1291.026
>>>> Time difference of 0.2354031 secs
>>>>
>>>> > tt = Sys.time(); sum(apply(a,1,prod)); Sys.time() - tt
>>>> [1] -1291.026
>>>> Time difference of 20.23150 secs
>>>>
>>>> Is there a faster way of calculating sum of products (of columns, or of
>>>> rows)?
>>>
>>> You should look at crossprod and tcrossprod.
>>
>> Hmm. Not sure that would help, David. You could use a matrix
>> multiplication of a %*% rep(1, ncol(a)) if you wanted the row sums but
>> of course you could also use rowSums to get those.
>
> Thanks for pointing that out. I misread the OP's code.
>
>>>> And is this an expected behavior?
>>>
>>> Yes. For loops and *apply strategies are slower than the proper use of
>>> vectorized functions.
>>
>> To expand a bit on David's point, the apply function isn't magic. It
>> essentially loops over the rows, in this case. By multiplying columns
>> together you are performing the looping over the rows in compiled
>> code, which is much, much faster. If you want to do this kind of
>> operation effectively in R for a general matrix (i.e. not knowing in
>> advance that it has exactly 5 columns) you could use Reduce
>>
>> > a <- matrix(rnorm(5000000),1000000,5)
>> > system.time(pr1 <- a[,1]*a[,2]*a[,3]*a[,4]*a[,5])
>>    user  system elapsed
>>    0.15    0.09    0.37
>> > system.time(pr2 <- apply(a, 1, prod))
>>    user  system elapsed
>>  22.090   0.140  22.902
>> > all.equal(pr1, pr2)
>> [1] TRUE
>> > system.time(pr3 <- Reduce(get("*"), as.data.frame(a), rep(1, nrow(a))))
>
> Slightly faster would be:
>
> system.time(pr3 <- Reduce("*", as.data.frame(a)))
>
> And thanks for the nice example. Using a data.frame to feed Reduce
> materially enhances its value to me.
>
>>    user  system elapsed
>>   0.410   0.010   0.575
>> > all.equal(pr3, pr2)
>> [1] TRUE
>
> --
> David Winsemius, MD
> West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.