I feel foolish now. I forgot to note that I build R using a Win64 version of OpenBLAS (see <http://www.avrahamadler.com/2014/04/20/r-3-1-0-openblas-speed-comparisons/> for more). I'm certain that the speed in tcrossprod is coming from an optimized compile of R and BLAS. My apologies for any confusion.
Avi On Sun, Apr 27, 2014 at 12:44 PM, Søren Højsgaard <[email protected]> wrote: > I also tried the tcrossprod but with results very different from what you > report. I wonder why??? > > I also used RcppArmadillo (mvprod3 below) which performs similar to mvprod2. > > > > Cheers > > Søren > > > >> all.equal(mvprod1(A,x), mvprod2(A,x), mvprod3(A,x), > > + A%*%x, tcrossprod(x, A)) > > [1] TRUE > > > >> library(microbenchmark) > > > >> microbenchmark(mvprod1(A,x), mvprod2(A,x), mvprod3(A,x), A%*%x, > > + tcrossprod(x,A)) > > Unit: milliseconds > > expr min lq median uq max neval > > mvprod1(A, x) 35.854102 36.151495 36.362352 36.658112 38.680147 100 > > mvprod2(A, x) 4.409340 4.429632 4.451791 4.522232 4.975201 100 > > mvprod3(A, x) 2.965528 2.977657 2.988386 3.043900 3.270617 100 > > A %*% x 8.506591 8.566303 8.637910 8.752902 9.233395 100 > > tcrossprod(x, A) 36.787099 37.232138 37.695370 41.378139 43.208443 100 > > > > > > // [[Rcpp::export]] > > arma::vec mvprod3(arma::mat& A, arma::vec& x ){ > > return A*x; > > } > > > >> sessionInfo() > > R version 3.1.0 Patched (2014-04-15 r65398) > > Platform: x86_64-w64-mingw32/x64 (64-bit) > > > > locale: > > [1] LC_COLLATE=Danish_Denmark.1252 LC_CTYPE=Danish_Denmark.1252 > LC_MONETARY=Danish_Denmark.1252 > > [4] LC_NUMERIC=C LC_TIME=Danish_Denmark.1252 > > > > attached base packages: > > [1] stats graphics grDevices utils datasets methods base > > > > other attached packages: > > [1] microbenchmark_1.3-0 RcppArmadillo_0.4.200.0 Rcpp_0.11.1 > devtools_1.5 > > [5] shTools_1.0 markdown_0.6.5 knitr_1.5 > > > > loaded via a namespace (and not attached): > > [1] compiler_3.1.0 digest_0.6.4 evaluate_0.5.3 formatR_0.10 httr_0.3 > memoise_0.1 parallel_3.1.0 > > [8] RCurl_1.95-4.1 stringr_0.6.2 tools_3.1.0 whisker_0.3-2 > > > > > > > > > > > > From: Avraham Adler [mailto:[email protected]] > Sent: 25. april 2014 19:43 > To: Søren Højsgaard > Cc: [email protected] > Subject: Re: [Rcpp-devel] Significant difference in performance when > computing Ax > > > > There is actually one other command to test, `crossprod`, or in this case > `tcrossprod(x, A)`. > > Following your code, first run and ensure TRUE is returned. > > d <- 2000 > A <- matrix(1.0*(1:d^2),nrow=d) > x <- 1.0*(1:d) > all.equal(A%*%x, tcrossprod(x, A), mvprod1(A,x), mvprod2(A,x)) > > > > Now, using an i7-3740QM 2.7Ghz CPU with 8GB of RAM as a testbed on Windows 7 > (with R & Rcpp compiled from source throwing "-march=core-avx-i -O3 > -std=gnu++0x --param l1-cache-line-size=64 --param l1-cache-size=32 --param > l2-cache-size=256" FWIW) I get the following timings: > > microbenchmark(A%*%x, tcrossprod(x, A), mvprod1(A, x), mvprod2(A, x), times > = 1000L, control = list(order = 'block')) > > > > > > > Unit: milliseconds > > expr min lq median uq max neval > > A %*% x 13.183992 13.242965 13.269027 13.317917 16.680117 1000 > > tcrossprod(x, A) 6.967139 6.991869 7.004424 7.020214 9.795921 1000 > > mvprod1(A, x) 43.089952 43.134848 43.163953 43.489824 45.129644 1000 > > mvprod2(A, x) 4.564480 4.883693 4.911848 4.928208 7.435495 1000 > > > > So the crossprod function is decently optimized as it is, but the prod2 is > still faster. > > Thanks for idea and post! > > Avi > > > > > > On Fri, Apr 25, 2014 at 12:47 PM, Søren Højsgaard <[email protected]> > wrote: > > Dear all, > > When forming the matrix-vector-product one can form the inner products > between As rows and x (mvprod1 below) or form a linear combination of As > columns (mvprod2 below). The difference in computing time is striking (at > least for me) for large matrices and it really illustrates the “caching > issue” (I assume): > > > >> microbenchmark(mvprod1(A,x), mvprod2(A,x), A%*%x) > > Unit: milliseconds > > expr min lq median > uq max neval > > mvprod1(A, x) 35.783812 36.310956 36.600186 36.991113 38.922889 100 > > mvprod2(A, x) 4.408892 4.436882 4.475368 4.568668 5.477176 100 > > A %*% x 8.514091 8.592230 8.734978 8.811951 9.738653 > 100 > > > > Just wanted to share this! > > Cheers > > Søren > > > > > > > > #include <Rcpp.h> > > using namespace Rcpp; > > > > //[[Rcpp::export]] > > NumericVector mvprod1 (NumericMatrix A, NumericVector x){ > > int nrA=A.nrow(), ncA=A.ncol(), i, j; > > NumericVector y(nrA); > > for (i=0; i<nrA; ++i){ > > for (j=0; j<ncA; ++j){ > > y[i] += A(i,j)*x[j]; > > } > > } > > return y; > > } > > > > //[[Rcpp::export]] > > NumericVector mvprod2 (NumericMatrix A, NumericVector x){ > > int nrA=A.nrow(), ncA=A.ncol(), i, j; > > NumericVector y(nrA); > > for (j=0; j<ncA; ++j){ > > for (i=0; i<nrA; ++i){ > > y[i] += A(i,j)*x[j]; > > } > > } > > return y; > > } > > > > > > > > /*** R > > d <- 5 > > A <- matrix(1.0*(1:d^2),nrow=d) > > x <- 1.0*(1:d) > > > > A%*%x > > mvprod1(A,x) > > mvprod2(A,x) > > > > d <- 2000 > > A <- matrix(1.0*(1:d^2),nrow=d) > > x <- 1.0*(1:d) > > > > library(microbenchmark) > > microbenchmark(mvprod1(A,x), mvprod2(A,x), A%*%x) > > */ > > > _______________________________________________ > Rcpp-devel mailing list > [email protected] > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel > > _______________________________________________ Rcpp-devel mailing list [email protected] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
