> -----Original Message----- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of Mike Lawrence > Sent: Friday, February 11, 2011 9:28 AM > To: r-h...@lists.r-project.org > Subject: [R] Help optimizing EMD::extrema() > > Hi folks, > > I'm attempting to use the EMD package to analyze some neuroimaging > data (timeseries with 64 channels sampled across 1 million time points > within each of 20 people). I found that processing a single channel of > data using EMD::emd() took about 8 hours. Exploration using Rprof() > suggested that most of the compute time was spent in EMD::extrema(). > Looking at the code for EMD:extrema(), I managed to find one obvious > speedup (switching from employing rbind() to c()) and I suspect that > there may be a way to further speed things up by pre-allocating all > the objects that are currently being created with c(), but I'm having > trouble understanding the code sufficiently to know when/where to try > this and what sizes to set as the default pre-allocation length. Below > I include code that demonstrates the speedup I achieved by eliminating > calls to rbind(), and also demonstrates that only a few calls to c() > seem to be responsible for most of the compute time. The files > "extrema_c.R" and "extrema_c2.R" are available at: > https://gist.github.com/822691
Try the following new.extrema(). It is based on looking at runs in the data in a vectorized way. On my old laptop the running times for length(x)=2^(2:118) with EMD::extrema and new.extrema are old.time new.time 4 0.00 0.00 8 0.00 0.00 16 0.00 0.00 32 0.00 0.00 64 0.00 0.00 128 0.00 0.00 256 0.00 0.00 512 0.02 0.00 1024 0.03 0.00 2048 0.06 0.01 4096 0.14 0.00 8192 0.37 0.02 16384 1.08 0.03 32768 3.64 0.06 65536 13.35 0.12 131072 48.42 0.25 262144 206.17 0.59 isEndOfStrictlyIncreasingRun <- function(x) { c(FALSE, diff(diff(x) > 0) < 0, FALSE) } isStartOfStrictlyIncreasingRun <- function(x) { c(FALSE, diff(diff(x) <= 0) < 0, FALSE) } isEndOfStrictlyDecreasingRun <- function(x) { isEndOfStrictlyIncreasingRun(-x) } isStartOfStrictlyDecreasingRun <- function(x) { isStartOfStrictlyIncreasingRun(-x) } isStartOfZeroRun <- function(x) { x==0 & c(TRUE, x[-length(x)]!=0) } nToEndOfCurrentFlatRun <- function(x) { # for each element of x, how far to end of current # run of equal values. rev( sequence(rle(rev(x))$lengths) - 1L) } indexOfEndOfCurrentFlatRun <- function(x) { # should be a more direct way of doing this, but this is pretty quick seq_len(length(x)) + nToEndOfCurrentFlatRun(x) } new.extrema <- function(x) { f <- indexOfEndOfCurrentFlatRun(x) isMaxStart <- isEndOfStrictlyIncreasingRun(x) & isStartOfStrictlyDecreasingRun(x)[f] maxindex <- cbind(which(isMaxStart), f[isMaxStart], deparse.level=0) isMinStart <- isEndOfStrictlyDecreasingRun(x) & isStartOfStrictlyIncreasingRun(x)[f] minindex <- cbind(which(isMinStart), f[isMinStart], deparse.level=0) # zero-crossings are bit weird: Report either the before-zero/after-zero # pair or the start and stop of a a run of zero's (even if the run is # not part of an actual zero-crossing). Do them separately then sort. # Also, if the entire sequence never actually crosses 0, do # not report the zero-touchings. # Also, if length(x)==2, the original doesn't report a zero-crossing or # a zero run. We do not copy that. if (max(x) > 0 && min(x) < 0) { indexOfZeroCrossingStart <- which(c(abs(diff(sign(x)))==2, FALSE)) indexOfZeroCrossingEnd <- indexOfZeroCrossingStart + 1L indexOfZeroRunStart <- which(isStartOfZeroRun(x)) indexOfZeroRunEnd <- f[indexOfZeroRunStart] crossStart <- c(indexOfZeroCrossingStart, indexOfZeroRunStart) cross <- cbind(crossStart, c(indexOfZeroCrossingEnd, indexOfZeroRunEnd), deparse.level=0)[order(crossStart),, drop=FALSE] } else { cross <- cbind(integer(), integer()) } # extrema likes to return NULL instead of a zero-row matrix, # so we follow it. Zero-row matrices are easier to deal with. list( minindex=if (nrow(minindex)) minindex else NULL, maxindex=if (nrow(maxindex)) maxindex else NULL, nextreme=nrow(minindex) + nrow(maxindex), cross=if(nrow(cross)) cross else NULL, ncross=nrow(cross) # extrema() returns numeric, not integer, here ) } Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > > Any suggestions/help would be greatly appreciated. > > > #load the EMD library for the default version of extrema > library(EMD) > > #some data to process > values = rnorm(1e4) > > #profile the default version of extrema > Rprof(tmp <- tempfile()) > temp = extrema(values) > Rprof() > summaryRprof(tmp) > #1.2s total with most time spend doing rbind > unlink(tmp) > > #load a rbind-free version of extrema > source('extrema_c.R') > Rprof(tmp <- tempfile()) > temp = extrema_c(values) > Rprof() > summaryRprof(tmp) #much faster! .5s total > unlink(tmp) > > #still, it encounters slowdowns with lots of data > values = rnorm(1e5) > Rprof(tmp <- tempfile()) > temp = extrema_c(values) > Rprof() > summaryRprof(tmp) > #44s total, hard to see what's taking up so much time > unlink(tmp) > > #load an rbind-free version of extrema that labels each call to c() > source('extrema_c2.R') > Rprof(tmp <- tempfile()) > temp = extrema_c2(values) > Rprof() > summaryRprof(tmp) > #same time as above, but now we see that it spends more time in > certain calls to c() than others > unlink(tmp) > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.