Hello,

I've added some of my spices in https://gist.github.com/romainfrancois/6974012

rowApply3 creates a NumericVector of the appropriate dimensions and assign data to it in the loop. Since the dimensions match, the assignment operator will not change the underlying SEXP so the call made by `Language call(FUN, row);` does not need to be updated by the mean of the proxy as in rowApply2. In theory, this should lead to less allocations so better performance. In rowApply2 we allocated a vector at each iteration of the loop. Now we don't see much of a performance boost. This might be related to the implementation of MatrixRow.

Now, the second potential source of performance improvement is with using mean_ instead of mean :

mean_ <- function(.) .Internal(mean(.))

mean gives you a lot of nice things like dispatching and additional arguments to handle trimming, etc ... but you have to pay for them even when you don't use them.


rowApply4 uses less syntactic sugar, so it is less nice to look at. The idea was to see what was the cost of MatrixRow.

Here are the results of the benchmarks on my machine on the 30x50 example:

Unit: microseconds
                expr   min    lq median    uq    max neval
         rowMeans(M)  10.7  14.4   15.1  15.7   21.1   100
  apply(M, 1L, mean) 350.6 362.0  371.2 382.7 2225.8   100
  rowApply0(M, mean) 635.8 655.2  672.3 716.8 2595.1   100
  rowApply1(M, mean) 225.1 233.8  239.7 249.8 2212.1   100
  rowApply2(M, mean) 221.3 229.4  236.6 260.2  329.0   100
  rowApply3(M, mean) 217.0 223.9  229.1 247.7 2077.2   100
 rowApply3(M, mean_)  34.3  38.2   39.3  41.1   47.0   100
 rowApply4(M, mean_)  34.1  36.7   38.0  39.3   47.1   100

rowMeans is better because:
- it does not involve calling the interpreter in the loop. even with fast_eval, this has cost.
- all is done internally with a single loop looking like this pseudo code:

NumericVector means( ncol ) ;
for( int j=0, k=0; j=ncol; j++ )
   for(int i=0; i<nrow; i++)
       means[i] += x[k] ;
means /= ncol ;

However rowMeans can only do means. Our rowApply can apply any R function.

One strategy we are using dplyrRcpp (soon to be merged into dplyr) is recognize known patterns and then decide whether we want to use the R interpreter or some optimized code. Here we would recognize "mean" and not use the R interpreter at all.

This is a nice game as it involves non standard evaluation, etc ... so that sort of goes beyond the scope of this thread, but using the internal dplyr tools, we can definitely make a fast flexible rowApply that would call optimized code when recognizing known patterns and fall back to the R interpreter otherwise.

Romain

Le 11/10/13 16:27, Hadley Wickham a écrit :
Interestingly, that shows that rowApply2 is actually slightly _slower_
than rowApply1.

Hadley

On Fri, Oct 11, 2013 at 9:24 AM, Hadley Wickham <h.wick...@gmail.com> wrote:
FYI, I recommend using microbenchmark which uses a much higher
precision timer so you can see the variability as well as the mean
times. Also note that you can use /*** R */ to include R code that's
automatically run when benchmarking:
https://gist.github.com/hadley/6935459

That gives:

microbenchmark(rowMeans(M),
+   apply(M, 1L, mean),
+   rowApply0(M, mean),
+   rowApply1(M, mean),
+   rowApply2(M, mean))
Unit: microseconds
                expr   min    lq median    uq   max neval
         rowMeans(M)  5.31  7.11   7.54  7.84  13.9   100
  apply(M, 1L, mean) 46.23 49.46  50.78 53.06 136.9   100
  rowApply0(M, mean) 50.87 53.81  55.36 57.54  85.3   100
  rowApply1(M, mean) 18.92 20.14  21.20 22.44  36.9   100
  rowApply2(M, mean) 18.73 20.18  21.27 22.40  34.0   100

M <- matrix(rnorm(1500L), nrow=30L);

microbenchmark(rowMeans(M),
+   apply(M, 1L, mean),
+   rowApply0(M, mean),
+   rowApply1(M, mean),
+   rowApply2(M, mean))
Unit: microseconds
                expr    min  lq median    uq    max neval
         rowMeans(M)   9.86  12     13  14.1   21.9   100
  apply(M, 1L, mean) 265.28 288    308 349.8  475.2   100
  rowApply0(M, mean) 506.54 541    575 604.9 5469.3   100
  rowApply1(M, mean) 174.37 185    206 232.4  334.0   100
  rowApply2(M, mean) 171.55 183    223 245.8 4244.7   100

On Fri, Oct 4, 2013 at 11:36 AM, Thomas Tse <tommy_228_...@yahoo.com.hk> wrote:
The next test compares the speeds:

// [[Rcpp::export]]
NumericVector rowApply0(NumericMatrix& x, const Function& FUN)
{
   int n = x.nrow();
   NumericVector result = no_init(n);

   for (int r = 0; r < n; r++) {
     result[r] = as<double>(FUN(x(r, _) ) );
   }
   return result;
}

// [[Rcpp::export]]
NumericVector rowApply1(NumericMatrix& x, const Function& FUN)
{
   int n = x.nrow();
   NumericVector result = no_init(n);

   for (int r = 0; r < n; r++) {
     Language call(FUN, x(r, _)) ;
     result[r] = as<double>(call.fast_eval() );
   }
   return result;
}

// [[Rcpp::export]]
NumericVector rowApply2(NumericMatrix& x, const Function& FUN)
{
   int n = x.nrow();
   NumericVector result = no_init(n);

   Language call(FUN, R_NilValue);
   Language::Proxy proxy(call, 1);
   for (int r = 0; r < n; r++) {
     proxy = x(r, _) ;
     result[r] = as<double>(call.fast_eval() );
   }
   return result;
}

M <- matrix(rnorm(15L), nrow=3L);
identical(rowMeans(M), apply(M, 1L, mean));
[1] TRUE
identical(rowMeans(M), rowApply0(M, mean));
[1] TRUE
identical(rowMeans(M), rowApply1(M, mean));
[1] TRUE
identical(rowMeans(M), rowApply2(M, mean));
[1] TRUE
benchmark(rowMeans(M),
+           apply(M, 1L, mean),
+           rowApply0(M, mean),
+           rowApply1(M, mean),
+           rowApply2(M, mean),
+           replications=300000L);
                 test replications elapsed relative user.self sys.self
user.child sys.child
3 rowApply0(M, mean)       300000   23.26    8.551     23.14        0
NA        NA
2  apply(M, 1, mean)       300000   21.30    7.831     21.13        0
NA        NA
4 rowApply1(M, mean)       300000   10.23    3.761     10.22        0
NA        NA
5 rowApply2(M, mean)       300000    9.87    3.629      9.82        0
NA        NA
1        rowMeans(M)       300000    2.72    1.000      2.72        0
NA        NA

M <- matrix(rnorm(1500L), nrow=30L);
benchmark(rowMeans(M),
+           apply(M, 1L, mean),
+           rowApply0(M, mean),
+           rowApply1(M, mean),
+           rowApply2(M, mean),
+           replications=30000L);
                 test replications elapsed relative user.self sys.self
user.child sys.child
3 rowApply0(M, mean)        30000   21.32   48.455     21.30        0
NA        NA
2  apply(M, 1, mean)        30000   11.17   25.386     11.17        0
NA        NA
4 rowApply1(M, mean)        30000    8.24   18.727      8.19        0
NA        NA
5 rowApply2(M, mean)        30000    8.10   18.409      8.08        0
NA        NA
1        rowMeans(M)        30000    0.44    1.000      0.43        0
NA        NA

M <- matrix(rnorm(150000L), nrow=300L);
benchmark(rowMeans(M),
+           apply(M, 1L, mean),
+           rowApply0(M, mean),
+           rowApply1(M, mean),
+           rowApply2(M, mean),
+           replications=3000L);
                 test replications elapsed relative user.self sys.self
user.child sys.child
3 rowApply0(M, mean)         3000   26.65   18.379     26.41     0.01
NA        NA
2  apply(M, 1, mean)         3000   19.76   13.628     19.71     0.00
NA        NA
4 rowApply1(M, mean)         3000   12.79    8.821     12.78     0.00
NA        NA
5 rowApply2(M, mean)         3000   12.69    8.752     12.68     0.00
NA        NA
1        rowMeans(M)         3000    1.45    1.000      1.45     0.00
NA        NA

again, as Romain suggested, implementation rowApply2 is faster than
rowApply1, and rowApply0 (which triggers tryCatch) is even slower than R's
apply.

Thanks Romain !


_______________________________________________
Rcpp-devel mailing list
Rcpp-devel@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel



--
Chief Scientist, RStudio
http://had.co.nz/





--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30

_______________________________________________
Rcpp-devel mailing list
Rcpp-devel@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

Reply via email to