Re: [R] matrix row product and cumulative product
Hi Jeff, If I understand correctly, the overhead of a loop is that at each iteration the command must be interpreted, and this time is independent of the number of rows N. So if N is small this overhead may be very significant but when N is large this should be very small compared to the time needed to multiply N pairs of numbers. --- On Mon, 18/8/08, Jeff Laake [EMAIL PROTECTED] wrote: From: Jeff Laake [EMAIL PROTECTED] Subject: [R] matrix row product and cumulative product To: r-help@r-project.org Received: Monday, 18 August, 2008, 12:49 PM I spent a lot of time searching and came up empty handed on the following query. Is there an equivalent to rowSums that does product or cumulative product and avoids use of apply or looping? I found a rowProd in a package but it was a convenience function for apply. As part of a likelihood calculation called from optim, I’m computing products and cumulative products of rows of matrices with far more rows than columns. I started with apply and after some thought realized that a loop of columns might be faster and it was substantially faster (see below). Because the likelihood function is called many times I’d like to speed it up even more if possible. Below is an example showing the cumprod.matrix and prod.matrix looping functions that I wrote and some timing comparisons to the use of apply for different column and row dimensions. At this point I’m better off with looping but I’d like to hear of any further suggestions. Thanks –jeff prod.matrix=function(x) + { + y=x[,1] + for(i in 2:dim(x)[2]) + y=y*x[,i] + return(y) + } cumprod.matrix=function(x) + { + y=matrix(1,nrow=dim(x)[1],ncol=dim(x)[2]) + y[,1]=x[,1] + for (i in 2:dim(x)[2]) + y[,i]=y[,i-1]*x[,i] + return(y) + } N=1000 xmat=matrix(runif(N),ncol=10) system.time(cumprod.matrix(xmat)) user system elapsed 1.07 0.09 1.15 system.time(t(apply(xmat,1,cumprod))) user system elapsed 29.27 0.21 29.50 system.time(prod.matrix(xmat)) user system elapsed 0.29 0.00 0.30 system.time(apply(xmat,1,prod)) user system elapsed 30.69 0.00 30.72 xmat=matrix(runif(N),ncol=100) system.time(cumprod.matrix(xmat)) user system elapsed 1.05 0.13 1.18 system.time(t(apply(xmat,1,cumprod))) user system elapsed 3.55 0.14 3.70 system.time(prod.matrix(xmat)) user system elapsed 0.38 0.01 0.39 system.time(apply(xmat,1,prod)) user system elapsed 2.87 0.00 2.89 xmat=matrix(runif(N),ncol=1000) system.time(cumprod.matrix(xmat)) user system elapsed 1.30 0.18 1.46 system.time(t(apply(xmat,1,cumprod))) user system elapsed 1.77 0.27 2.05 system.time(prod.matrix(xmat)) user system elapsed 0.46 0.00 0.47 system.time(apply(xmat,1,prod)) user system elapsed 0.7 0.0 0.7 xmat=matrix(runif(N),ncol=1) system.time(cumprod.matrix(xmat)) user system elapsed 1.28 0.00 1.29 system.time(t(apply(xmat,1,cumprod))) user system elapsed 1.19 0.08 1.26 system.time(prod.matrix(xmat)) user system elapsed 0.40 0.00 0.41 system.time(apply(xmat,1,prod)) user system elapsed 0.57 0.00 0.56 xmat=matrix(runif(N),ncol=10) system.time(cumprod.matrix(xmat)) user system elapsed 3.18 0.00 3.19 system.time(t(apply(xmat,1,cumprod))) user system elapsed 1.42 0.21 1.64 system.time(prod.matrix(xmat)) user system elapsed 1.05 0.00 1.05 system.time(apply(xmat,1,prod)) user system elapsed 0.82 0.00 0.81 R.Version() $platform [1] i386-pc-mingw32 . . . $version.string [1] R version 2.7.1 (2008-06-23) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] matrix row product and cumulative product
Thanks for the tips on inline, jit and Reduce. The latter was exactly what I wanted although the loop is still the fastest for the simple product (accumulate=TRUE for reduce). With regards to Moshe's comment, I was just surprised by the timing difference. I tend to use apply without giving it much thought. After profiling the code it became apparent that a loop was better in this case. I was just surprised that a loop was still as good when the columns were 10 times the rows. I'm very intrigued by the inline package but couldn't find any documentation on the compiler I need with a Windows machine to make it work. Any hints would be very much appreciated especially in regards to FORTRAN which was my first language some 35 years ago. I have MS FORTRAN 90 although I've not touched it for over 6 years thanks to the developers of R. Thanks much for the help. --jeff Elapsed times from system.time. see code below Columns 10 100 10001 10 Rows100 10 1 1000100 cumprod Loop1.0 1.0 1.3 1.2 3.0 Apply 27.33.4 1.8 1.2 1.4 Reduce 0.5 0.7 0.7 0.9 3.2 prodLoop0.3 0.3 0.4 0.4 0.9 Apply 30.02.7 0.7 0.6 0.8 Reduce 0.6 0.6 0.8 1.0 4.7 N=1000 xmat=matrix(runif(N),ncol=10) system.time(cumprod.matrix(xmat)) system.time(t(apply(xmat,1,cumprod))) system.time(Reduce(*,as.data.frame(xmat),accumulate=FALSE)) system.time(prod.matrix(xmat)) system.time(apply(xmat,1,prod)) system.time(Reduce(*,as.data.frame(xmat),accumulate=TRUE)) xmat=matrix(runif(N),ncol=100) system.time(cumprod.matrix(xmat)) system.time(t(apply(xmat,1,cumprod))) system.time(Reduce(*,as.data.frame(xmat),accumulate=FALSE)) system.time(prod.matrix(xmat)) system.time(apply(xmat,1,prod)) system.time(Reduce(*,as.data.frame(xmat),accumulate=TRUE)) xmat=matrix(runif(N),ncol=1000) system.time(cumprod.matrix(xmat)) system.time(t(apply(xmat,1,cumprod))) system.time(Reduce(*,as.data.frame(xmat),accumulate=FALSE)) system.time(prod.matrix(xmat)) system.time(apply(xmat,1,prod)) system.time(Reduce(*,as.data.frame(xmat),accumulate=TRUE)) xmat=matrix(runif(N),ncol=1) system.time(cumprod.matrix(xmat)) system.time(t(apply(xmat,1,cumprod))) system.time(Reduce(*,as.data.frame(xmat),accumulate=FALSE)) system.time(prod.matrix(xmat)) system.time(apply(xmat,1,prod)) system.time(Reduce(*,as.data.frame(xmat),accumulate=TRUE)) xmat=matrix(runif(N),ncol=10) system.time(cumprod.matrix(xmat)) system.time(t(apply(xmat,1,cumprod))) system.time(Reduce(*,as.data.frame(xmat),accumulate=FALSE)) system.time(prod.matrix(xmat)) system.time(apply(xmat,1,prod)) system.time(Reduce(*,as.data.frame(xmat),accumulate=TRUE)) Charles C. Berry wrote: On Sun, 17 Aug 2008, Jeff Laake wrote: I spent a lot of time searching and came up empty handed on the following query. Is there an equivalent to rowSums that does product or cumulative product and avoids use of apply or looping? I found a rowProd in a package but it was a convenience function for apply. As part of a likelihood calculation called from optim, I’m computing products and cumulative products of rows of matrices with far more rows than columns. I started with apply and after some thought realized that a loop of columns might be faster and it was substantially faster (see below). Because the likelihood function is called many times I’d like to speed it up even more if possible. You might check out the 'inline' or 'jit' packages. Otherwise, if you can as easily treat xmat as a list (or data.frame), Reduce( *, xmat.data.frame, accumulate=want.cumprod ) (where want.cumprod is FALSE for product, TRUE for cumulative product) will be a bit faster in many circumstances. However, this advantage is lost if you must retain xmat as a matrix since converting it to a data.frame seems to require more time than you save. HTH, Chuck Below is an example showing the cumprod.matrix and prod.matrix looping functions that I wrote and some timing comparisons to the use of apply for different column and row dimensions. At this point I’m better off with looping but I’d like to hear of any further suggestions. Thanks –jeff prod.matrix=function(x) + { + y=x[,1] + for(i in 2:dim(x)[2]) + y=y*x[,i] + return(y) + } cumprod.matrix=function(x) + { + y=matrix(1,nrow=dim(x)[1],ncol=dim(x)[2]) + y[,1]=x[,1] + for (i in 2:dim(x)[2]) + y[,i]=y[,i-1]*x[,i] + return(y) + } N=1000 xmat=matrix(runif(N),ncol=10) system.time(cumprod.matrix(xmat)) user system elapsed 1.07 0.09 1.15 system.time(t(apply(xmat,1,cumprod))) user system elapsed 29.27 0.21 29.50 system.time(prod.matrix(xmat)) user system elapsed 0.29 0.00 0.30 system.time(apply(xmat,1,prod)) user system elapsed 30.69 0.00 30.72 xmat=matrix(runif(N),ncol=100) system.time(cumprod.matrix(xmat)) user system
Re: [R] matrix row product and cumulative product
Sorry a correction to my last posting. I had accumulate switched between prod and cumprod and I had also forgotten to included time for conversion from list back to matrix for cumprod. Now as Chuck stated the results for Reduce are about the same or worse than a loop. regards--jeff Columns 10 100 1000 1 10 Rows 100 10 1 1000 100 cumprod Loop 1.0 1.1 1.3 1.2 3.0 Apply 26.4 3.4 1.8 1.3 1.4 Reduce 1.1 0.9 1.2 1.2 5.3 prod Loop 0.3 0.3 0.4 0.4 0.9 Apply 30.0 2.7 0.7 0.5 0.8 Reduce 0.70.6 0.7 0.8 3.1 N=1000 xmat=matrix(runif(N),ncol=10) system.time(cumprod.matrix(xmat)) system.time(t(apply(xmat,1,cumprod))) system.time(do.call(cbind,Reduce(*,as.data.frame(xmat),accumulate=TRUE))) system.time(prod.matrix(xmat)) system.time(apply(xmat,1,prod)) system.time(Reduce(*,as.data.frame(xmat),accumulate=FALSE)) xmat=matrix(runif(N),ncol=100) system.time(cumprod.matrix(xmat)) system.time(t(apply(xmat,1,cumprod))) system.time(do.call(cbind,Reduce(*,as.data.frame(xmat),accumulate=TRUE))) system.time(prod.matrix(xmat)) system.time(apply(xmat,1,prod)) system.time(Reduce(*,as.data.frame(xmat),accumulate=FALSE)) xmat=matrix(runif(N),ncol=1000) system.time(cumprod.matrix(xmat)) system.time(t(apply(xmat,1,cumprod))) system.time(do.call(cbind,Reduce(*,as.data.frame(xmat),accumulate=TRUE))) system.time(prod.matrix(xmat)) system.time(apply(xmat,1,prod)) system.time(Reduce(*,as.data.frame(xmat),accumulate=FALSE)) xmat=matrix(runif(N),ncol=1) system.time(cumprod.matrix(xmat)) system.time(t(apply(xmat,1,cumprod))) system.time(do.call(cbind,Reduce(*,as.data.frame(xmat),accumulate=TRUE))) system.time(prod.matrix(xmat)) system.time(apply(xmat,1,prod)) system.time(Reduce(*,as.data.frame(xmat),accumulate=FALSE)) xmat=matrix(runif(N),ncol=10) system.time(cumprod.matrix(xmat)) system.time(t(apply(xmat,1,cumprod))) system.time(do.call(cbind,Reduce(*,as.data.frame(xmat),accumulate=TRUE))) system.time(prod.matrix(xmat)) system.time(apply(xmat,1,prod)) system.time(Reduce(*,as.data.frame(xmat),accumulate=FALSE)) Jeff Laake wrote: Thanks for the tips on inline, jit and Reduce. The latter was exactly what I wanted although the loop is still the fastest for the simple product (accumulate=TRUE for reduce). With regards to Moshe's comment, I was just surprised by the timing difference. I tend to use apply without giving it much thought. After profiling the code it became apparent that a loop was better in this case. I was just surprised that a loop was still as good when the columns were 10 times the rows. I'm very intrigued by the inline package but couldn't find any documentation on the compiler I need with a Windows machine to make it work. Any hints would be very much appreciated especially in regards to FORTRAN which was my first language some 35 years ago. I have MS FORTRAN 90 although I've not touched it for over 6 years thanks to the developers of R. Thanks much for the help. --jeff Elapsed times from system.time. see code below Columns 10 100 1000 1 10 Rows 100 10 1 1000 100 cumprod Loop 1.0 1.0 1.3 1.2 3.0 Apply 27.3 3.4 1.8 1.2 1.4 Reduce 0.5 0.7 0.7 0.9 3.2 prod Loop 0.3 0.3 0.4 0.4 0.9 Apply 30.0 2.7 0.7 0.6 0.8 Reduce 0.6 0.6 0.8 1.0 4.7 N=1000 xmat=matrix(runif(N),ncol=10) system.time(cumprod.matrix(xmat)) system.time(t(apply(xmat,1,cumprod))) system.time(Reduce(*,as.data.frame(xmat),accumulate=FALSE)) system.time(prod.matrix(xmat)) system.time(apply(xmat,1,prod)) system.time(Reduce(*,as.data.frame(xmat),accumulate=TRUE)) xmat=matrix(runif(N),ncol=100) system.time(cumprod.matrix(xmat)) system.time(t(apply(xmat,1,cumprod))) system.time(Reduce(*,as.data.frame(xmat),accumulate=FALSE)) system.time(prod.matrix(xmat)) system.time(apply(xmat,1,prod)) system.time(Reduce(*,as.data.frame(xmat),accumulate=TRUE)) xmat=matrix(runif(N),ncol=1000) system.time(cumprod.matrix(xmat)) system.time(t(apply(xmat,1,cumprod))) system.time(Reduce(*,as.data.frame(xmat),accumulate=FALSE)) system.time(prod.matrix(xmat)) system.time(apply(xmat,1,prod)) system.time(Reduce(*,as.data.frame(xmat),accumulate=TRUE)) xmat=matrix(runif(N),ncol=1) system.time(cumprod.matrix(xmat)) system.time(t(apply(xmat,1,cumprod))) system.time(Reduce(*,as.data.frame(xmat),accumulate=FALSE)) system.time(prod.matrix(xmat)) system.time(apply(xmat,1,prod))
Re: [R] matrix row product and cumulative product
On Sun, 17 Aug 2008, Jeff Laake wrote: I spent a lot of time searching and came up empty handed on the following query. Is there an equivalent to rowSums that does product or cumulative product and avoids use of apply or looping? I found a rowProd in a package but it was a convenience function for apply. As part of a likelihood calculation called from optim, I’m computing products and cumulative products of rows of matrices with far more rows than columns. I started with apply and after some thought realized that a loop of columns might be faster and it was substantially faster (see below). Because the likelihood function is called many times I’d like to speed it up even more if possible. You might check out the 'inline' or 'jit' packages. Otherwise, if you can as easily treat xmat as a list (or data.frame), Reduce( *, xmat.data.frame, accumulate=want.cumprod ) (where want.cumprod is FALSE for product, TRUE for cumulative product) will be a bit faster in many circumstances. However, this advantage is lost if you must retain xmat as a matrix since converting it to a data.frame seems to require more time than you save. HTH, Chuck Below is an example showing the cumprod.matrix and prod.matrix looping functions that I wrote and some timing comparisons to the use of apply for different column and row dimensions. At this point I’m better off with looping but I’d like to hear of any further suggestions. Thanks –jeff prod.matrix=function(x) + { + y=x[,1] + for(i in 2:dim(x)[2]) + y=y*x[,i] + return(y) + } cumprod.matrix=function(x) + { + y=matrix(1,nrow=dim(x)[1],ncol=dim(x)[2]) + y[,1]=x[,1] + for (i in 2:dim(x)[2]) + y[,i]=y[,i-1]*x[,i] + return(y) + } N=1000 xmat=matrix(runif(N),ncol=10) system.time(cumprod.matrix(xmat)) user system elapsed 1.07 0.09 1.15 system.time(t(apply(xmat,1,cumprod))) user system elapsed 29.27 0.21 29.50 system.time(prod.matrix(xmat)) user system elapsed 0.29 0.00 0.30 system.time(apply(xmat,1,prod)) user system elapsed 30.69 0.00 30.72 xmat=matrix(runif(N),ncol=100) system.time(cumprod.matrix(xmat)) user system elapsed 1.05 0.13 1.18 system.time(t(apply(xmat,1,cumprod))) user system elapsed 3.55 0.14 3.70 system.time(prod.matrix(xmat)) user system elapsed 0.38 0.01 0.39 system.time(apply(xmat,1,prod)) user system elapsed 2.87 0.00 2.89 xmat=matrix(runif(N),ncol=1000) system.time(cumprod.matrix(xmat)) user system elapsed 1.30 0.18 1.46 system.time(t(apply(xmat,1,cumprod))) user system elapsed 1.77 0.27 2.05 system.time(prod.matrix(xmat)) user system elapsed 0.46 0.00 0.47 system.time(apply(xmat,1,prod)) user system elapsed 0.7 0.0 0.7 xmat=matrix(runif(N),ncol=1) system.time(cumprod.matrix(xmat)) user system elapsed 1.28 0.00 1.29 system.time(t(apply(xmat,1,cumprod))) user system elapsed 1.19 0.08 1.26 system.time(prod.matrix(xmat)) user system elapsed 0.40 0.00 0.41 system.time(apply(xmat,1,prod)) user system elapsed 0.57 0.00 0.56 xmat=matrix(runif(N),ncol=10) system.time(cumprod.matrix(xmat)) user system elapsed 3.18 0.00 3.19 system.time(t(apply(xmat,1,cumprod))) user system elapsed 1.42 0.21 1.64 system.time(prod.matrix(xmat)) user system elapsed 1.05 0.00 1.05 system.time(apply(xmat,1,prod)) user system elapsed 0.82 0.00 0.81 R.Version() $platform [1] i386-pc-mingw32 . . . $version.string [1] R version 2.7.1 (2008-06-23) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.