Hi,

Lately I run into a problem that my code R code is spending hours performing
simple moving window statistical operations.  As a result I did searched
archives for alternative (faster) ways of performing: mean, max, median and
mad operation over moving window (size 81)  on a vector with about 30K
points. And performed some timing for several ways that were suggested, and
few ways I come up with. The purpose of this email is to share some of my
findings and ask for more suggestions (especially about moving mad
function).

Sum over moving window can be done using many different ways. Here are some
sorted from the fastest to the slowest:
        1.      runmean = function(x, k) { 
                  n    = length(x) 
                  y    = x[ k:n ] - x[ c(1,1:(n-k)) ] # this is a difference
from the previous cell
                  y[1] = sum(x[1:k]);                 # find the first sum
                  y    = cumsum(y)                    # apply precomputed
differences
                  return(y/k)                         # return mean not sum
                }
        2.      filter(x, rep(1/k,k), sides=2, circular=T) - (stats package)
        3.      kernapply(x, kernel("daniell", m), circular=T) 
        4.      apply(embed(x,k), 1, mean)      
        5.      mywinfun <- function(x, k, FUN=mean, ...) 
{ # suggested in news group
                   n <- length(x) 
                   A <- rep(x, length=k*(n+1)) 
                   dim(A) <- c(n+1, k) 
                   sapply(split(A, row(A)), FUN, ...)[1:(n-k+1)] 
                }
        6.      rollFun(x, k, FUN=mean) - (fSeries package)
        7.      rollMean(x, k)  - (fSeries package)
        8.      SimpleMeanLoop = function(x, k) {
                   n = length(x)         # simple-minded loop used as a
baseline
                   y = rep(0, n) 
                   k = k%/%2;
                   for (i in (1+k):(n-k)) y[i] = mean(x[(i-k):(i+k)]) 
           }
        9.      running(x, fun=mean, width=k) - (gtools package)

Some of above functions return results that are the same length as x and
some return arrays with length n-k+1. The relative speeds (on Windows
machine) were as follow: 0.01, 0.09, 1.2, 8.1, 11.2, 13.4, 27.3, 63, 345. As
one can see there are about 5 orders of magnitude between the fastest and
the slowest. 


Maximum over moving window can be done as follow, in order of speed
        1.      runmax = function(x, k) { 
                  n = length(x) 
                  y = rep(0, n) 
                  m = k%/%2;
                  a = 0;
                  for (i in (1+m):(n-m)) {
                    if (a==y[i-1]) y[i] = max(x[(i-m):(i+m)])  # calculate
max of the window
                    else           y[i] = max(y[i-1], x[i+m]); # max of the
window is =y[i-1] 
                    a = x[i-m]  # point that will be removed from the window
                  }
                  return(y)
                } 
        2.      apply(embed(x,k), 1, max)      
        3.      SimpleMaxLoop(x, k) - similar to SimpleMeanLoop above
        4.      mywinfun(x, k, FUN=max) - see above
        5.      rollFun(x, k, FUN=max) - fSeries package
        6.      rollMax(x, k)  - fSeries package
        7.      running(x, fun=max, width=k) - gtools package
The relative speeds were: <0.01, 3, 3.4, 5.3, 7.5, 7.7, 15.3

Median over moving window can be done as follows:
        1.      runmed(x, k) - from stats package
        2.      SimpleMedLoop(x, k) - similar to SimpleMeanLoop above
        3.      apply(embed(x,k), 1, median)      
        4.      mywinfun(x, k, FUN=median) - see above
        5.      rollFun (x, k, FUN=median) - fSeries package
        6.      running(x, fun=max, width=k) - gtools package
Speeds: <0.01, 3.4, 9, 15, 29, 165

Mad over moving window can be done as follows:
        1.      runmad = function(x, k) 
                { 
                   n = length(x) 
                   A = embed(x,k)
                   A = abs(A - rep(apply(A, 1, median), k))
                   dim(A) = c(n-k+1, k) 
                   apply(A, 1, median)
} 
        2.      apply(embed(x,k), 1, mad)      
        3.      mywinfun(x, k, FUN=mad) - see above
        4.      SimpleMadLoop(x, k) - similar to SimpleMeanLoop above
        5.      rollFun(x, k, FUN=mad) - fSeries package
        6.      running(x, fun=mad, width=k) - gtools package
Speeds: 11, 18, 25, 50, 50, 400

Some thoughts about those results:
        *       All functions from Stats package (runmed, filter, kernapply)
are fast and hard to improve on.
        *       In case of Mean and Max a simple un-optimized R codes are
much faster than specialized functions build for the same purpose.
        *       apply(embed(x,k), 1, fun) - seem to be always faster than
any functions from fSeries package or "mywinfun" 
        *       "running" function from gtools package is horribly slow
compared to anything else
        *       "mywinfun" proposed as a faster version of
"apply(embed(x,k), 1, fun)" seems to be always slower

        Finally a question: I still need to get moving windows mad function
faster my "runmad" function is not that much faster than apply/embed combo,
and that I used before, and this is where my code spends most of its time. I
need something like "runmed" but for a mad function. Any suggestions?

Jarek
=====================================\====                 
 Jarek Tuszynski, PhD.                               o / \ 
 Science Applications International Corporation  <\__,|  
 (703) 676-4192                        ">  \
 [EMAIL PROTECTED]                   `    \



        [[alternative HTML version deleted]]

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Reply via email to