When I was introduced to R, I learned to use *apply wherever I could to avoid
for-loops. Having gotten into that habit, I somehow picked up the
misconception that it is a magic sauce, always the optimal way of coding in
R.

Thanks a lot for all of your helpful advice and comments!

Young

On Wed, Jan 5, 2011 at 3:09 PM, David Winsemius <dwinsem...@comcast.net> wrote:

>
> On Jan 5, 2011, at 2:40 PM, Douglas Bates wrote:
>
>> On Wed, Jan 5, 2011 at 1:22 PM, David Winsemius <dwinsem...@comcast.net>
>> wrote:
>>
>>>
>>> On Jan 5, 2011, at 10:03 AM, Young Cho wrote:
>>>
>>>> Hi,
>>>>
>>>> I am doing some simulations and found a bottleneck in my R script. I made
>>>> an example:
>>>>
>>>> > a = matrix(rnorm(5000000),1000000,5)
>>>> > tt  = Sys.time(); sum(a[,1]*a[,2]*a[,3]*a[,4]*a[,5]); Sys.time() - tt
>>>> [1] -1291.026
>>>> Time difference of 0.2354031 secs
>>>>
>>>> > tt  = Sys.time(); sum(apply(a,1,prod)); Sys.time() - tt
>>>> [1] -1291.026
>>>> Time difference of 20.23150 secs
>>>>
>>>> Is there a faster way of calculating sum of products (of columns, or of
>>>> rows)?
>>>>
>>>
>>> You should look at crossprod and tcrossprod.
>>>
>>
>> Hmm.  Not sure that would help, David.  You could use a matrix
>> multiplication of a %*% rep(1, ncol(a)) if you wanted the row sums but
>> of course you could also use rowSums to get those.
>>
>
> Thanks for pointing  that out. I misread the OP's code.
>
>
>>>> And is this an expected behavior?
>>>>
>>>
>>> Yes. For loops and *apply strategies are slower than the proper use of
>>> vectorized functions.
>>>
>>
>> To expand a bit on David's point, the apply function isn't magic.  It
>> essentially loops over the rows, in this case.  By multiplying columns
>> together you are performing the looping over the rows in compiled
>> code, which is much, much faster.  If you want to do this kind of
>> operation effectively in R for a general matrix (i.e. not knowing in
>> advance that it has exactly 5 columns) you could use Reduce:
>>
>> > a <- matrix(rnorm(5000000),1000000,5)
>> > system.time(pr1 <- a[,1]*a[,2]*a[,3]*a[,4]*a[,5])
>>    user  system elapsed
>>    0.15    0.09    0.37
>> > system.time(pr2 <- apply(a, 1, prod))
>>    user  system elapsed
>>  22.090   0.140  22.902
>> > all.equal(pr1, pr2)
>> [1] TRUE
>> > system.time(pr3 <- Reduce(get("*"), as.data.frame(a), rep(1, nrow(a))))
>>
> Slightly faster would be:
>
> system.time(pr3 <- Reduce("*", as.data.frame(a)))
>
> And thanks for the nice example. Using a data.frame to feed Reduce
> materially enhances its value to me.
>
>
>>    user  system elapsed
>>   0.410   0.010   0.575
>> > all.equal(pr3, pr2)
>> [1] TRUE
>>
>
> --
> David Winsemius, MD
> West Hartford, CT
>
>
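
For readers who want to try the comparison from the thread on a matrix with an
arbitrary number of columns, here is a minimal sketch. It is not code from any
of the posters: the helper name row_prod, the seed, and the smaller matrix size
are assumptions chosen only so the example runs quickly. It computes the row
products three ways (apply, the Reduce call suggested above, and an explicit
loop over columns) and checks that they agree.

```r
# Sketch under stated assumptions; timings will vary by machine, so none are shown.
set.seed(1)                              # arbitrary seed, for reproducibility only
a <- matrix(rnorm(50000), 10000, 5)      # smaller than the thread's 1e6 x 5 matrix

# Hypothetical helper (not from the thread): multiply the columns together
# one at a time, so each step is a single vectorized `*` over all rows.
row_prod <- function(m) {
  out <- rep(1, nrow(m))
  for (j in seq_len(ncol(m))) out <- out * m[, j]
  out
}

pr_apply  <- apply(a, 1, prod)               # R-level loop over every row
pr_reduce <- Reduce("*", as.data.frame(a))   # the form suggested in the thread
pr_loop   <- row_prod(a)                     # R-level loop over the columns only

print(all.equal(pr_apply, pr_reduce))
print(all.equal(pr_apply, pr_loop))
print(sum(pr_loop))                          # the sum of products asked about
```

The loop in row_prod is still a for-loop, but it iterates once per column
rather than once per row, which is the same reason the Reduce version is fast:
the per-row work happens inside the vectorized multiplication. Wrapping each
call in system.time() reproduces the kind of comparison shown in the thread.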

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
