Wow, great work! This makes a huge difference (i.e. 5 mins to process an electrode instead of 5 hours).
I believe that the ndata and ndatam1 are there to let EMD::emd() and EMD::extractimf() (the functions that call EMD::extrema()) implement various boundary corrections. I personally don't intend on examining the start or end in great detail, so I think the uncorrected version is fine for my application. Thanks again! Mike On Sun, Feb 13, 2011 at 6:57 PM, William Dunlap <wdun...@tibco.com> wrote: > I did not try to emulate the ndata nad ndatam1 arguments > to extrema(), as I didn't see what they were for. The > help file simply says they are the length of the first > argument and that minus 1 and that is what their default > values are. If they do not have their default values > then extrema() frequently dies. You could add them them > to the argument list and not use them, or check that > they are what they default to, as in > function(x, ndata, ndatam1) { > stopifnot(length(x)==ndata, ndata-1==ndatam1) > ... rest of code ... > If the check fails then someone needs to say or figure out > what they are for. > > Bill Dunlap > Spotfire, TIBCO Software > wdunlap tibco.com > >> -----Original Message----- >> From: r-help-boun...@r-project.org >> [mailto:r-help-boun...@r-project.org] On Behalf Of William Dunlap >> Sent: Sunday, February 13, 2011 2:08 PM >> To: Mike Lawrence; r-h...@lists.r-project.org >> Subject: Re: [R] Help optimizing EMD::extrema() >> >> > -----Original Message----- >> > From: r-help-boun...@r-project.org >> > [mailto:r-help-boun...@r-project.org] On Behalf Of Mike Lawrence >> > Sent: Friday, February 11, 2011 9:28 AM >> > To: r-h...@lists.r-project.org >> > Subject: [R] Help optimizing EMD::extrema() >> > >> > Hi folks, >> > >> > I'm attempting to use the EMD package to analyze some neuroimaging >> > data (timeseries with 64 channels sampled across 1 million >> time points >> > within each of 20 people). I found that processing a single >> channel of >> > data using EMD::emd() took about 8 hours. Exploration using Rprof() >> > suggested that most of the compute time was spent in EMD::extrema(). >> > Looking at the code for EMD:extrema(), I managed to find one obvious >> > speedup (switching from employing rbind() to c()) and I suspect that >> > there may be a way to further speed things up by pre-allocating all >> > the objects that are currently being created with c(), but >> I'm having >> > trouble understanding the code sufficiently to know >> when/where to try >> > this and what sizes to set as the default pre-allocation >> length. Below >> > I include code that demonstrates the speedup I achieved by >> eliminating >> > calls to rbind(), and also demonstrates that only a few calls to c() >> > seem to be responsible for most of the compute time. The files >> > "extrema_c.R" and "extrema_c2.R" are available at: >> > https://gist.github.com/822691 >> >> Try the following new.extrema(). It is based on >> looking at runs in the data in a vectorized way. >> On my old laptop the running times for length(x)=2^(2:118) >> with EMD::extrema and new.extrema are >> old.time new.time >> 4 0.00 0.00 >> 8 0.00 0.00 >> 16 0.00 0.00 >> 32 0.00 0.00 >> 64 0.00 0.00 >> 128 0.00 0.00 >> 256 0.00 0.00 >> 512 0.02 0.00 >> 1024 0.03 0.00 >> 2048 0.06 0.01 >> 4096 0.14 0.00 >> 8192 0.37 0.02 >> 16384 1.08 0.03 >> 32768 3.64 0.06 >> 65536 13.35 0.12 >> 131072 48.42 0.25 >> 262144 206.17 0.59 >> >> isEndOfStrictlyIncreasingRun <- function(x) { >> c(FALSE, diff(diff(x) > 0) < 0, FALSE) >> } >> >> isStartOfStrictlyIncreasingRun <- function(x) { >> c(FALSE, diff(diff(x) <= 0) < 0, FALSE) >> } >> >> isEndOfStrictlyDecreasingRun <- function(x) { >> isEndOfStrictlyIncreasingRun(-x) >> } >> >> isStartOfStrictlyDecreasingRun <- function(x) { >> isStartOfStrictlyIncreasingRun(-x) >> } >> >> isStartOfZeroRun <- function(x) { >> x==0 & c(TRUE, x[-length(x)]!=0) >> } >> >> nToEndOfCurrentFlatRun <- function(x) { >> # for each element of x, how far to end of current >> # run of equal values. >> rev( sequence(rle(rev(x))$lengths) - 1L) >> } >> >> indexOfEndOfCurrentFlatRun <- function(x) { >> # should be a more direct way of doing this, but this is pretty >> quick >> seq_len(length(x)) + nToEndOfCurrentFlatRun(x) >> } >> >> new.extrema <- function(x) { >> f <- indexOfEndOfCurrentFlatRun(x) >> isMaxStart <- isEndOfStrictlyIncreasingRun(x) & >> isStartOfStrictlyDecreasingRun(x)[f] >> maxindex <- cbind(which(isMaxStart), f[isMaxStart], >> deparse.level=0) >> >> isMinStart <- isEndOfStrictlyDecreasingRun(x) & >> isStartOfStrictlyIncreasingRun(x)[f] >> minindex <- cbind(which(isMinStart), f[isMinStart], >> deparse.level=0) >> >> >> # zero-crossings are bit weird: Report either the >> before-zero/after-zero >> # pair or the start and stop of a a run of zero's (even if the run >> is >> # not part of an actual zero-crossing). Do them separately then >> sort. >> # Also, if the entire sequence never actually crosses 0, do >> # not report the zero-touchings. >> # Also, if length(x)==2, the original doesn't report a >> zero-crossing >> or >> # a zero run. We do not copy that. >> if (max(x) > 0 && min(x) < 0) { >> indexOfZeroCrossingStart <- which(c(abs(diff(sign(x)))==2, >> FALSE)) >> indexOfZeroCrossingEnd <- indexOfZeroCrossingStart + 1L >> indexOfZeroRunStart <- which(isStartOfZeroRun(x)) >> indexOfZeroRunEnd <- f[indexOfZeroRunStart] >> crossStart <- c(indexOfZeroCrossingStart, indexOfZeroRunStart) >> cross <- cbind(crossStart, c(indexOfZeroCrossingEnd, >> indexOfZeroRunEnd), deparse.level=0)[order(crossStart),, drop=FALSE] >> } else { >> cross <- cbind(integer(), integer()) >> } >> # extrema likes to return NULL instead of a zero-row matrix, >> # so we follow it. Zero-row matrices are easier to deal with. >> list( >> minindex=if (nrow(minindex)) minindex else NULL, >> maxindex=if (nrow(maxindex)) maxindex else NULL, >> nextreme=nrow(minindex) + nrow(maxindex), >> cross=if(nrow(cross)) cross else NULL, >> ncross=nrow(cross) # extrema() returns numeric, not integer, >> here >> ) >> } >> >> Bill Dunlap >> Spotfire, TIBCO Software >> wdunlap tibco.com >> >> > >> > Any suggestions/help would be greatly appreciated. >> > >> > >> > #load the EMD library for the default version of extrema >> > library(EMD) >> > >> > #some data to process >> > values = rnorm(1e4) >> > >> > #profile the default version of extrema >> > Rprof(tmp <- tempfile()) >> > temp = extrema(values) >> > Rprof() >> > summaryRprof(tmp) >> > #1.2s total with most time spend doing rbind >> > unlink(tmp) >> > >> > #load a rbind-free version of extrema >> > source('extrema_c.R') >> > Rprof(tmp <- tempfile()) >> > temp = extrema_c(values) >> > Rprof() >> > summaryRprof(tmp) #much faster! .5s total >> > unlink(tmp) >> > >> > #still, it encounters slowdowns with lots of data >> > values = rnorm(1e5) >> > Rprof(tmp <- tempfile()) >> > temp = extrema_c(values) >> > Rprof() >> > summaryRprof(tmp) >> > #44s total, hard to see what's taking up so much time >> > unlink(tmp) >> > >> > #load an rbind-free version of extrema that labels each call to c() >> > source('extrema_c2.R') >> > Rprof(tmp <- tempfile()) >> > temp = extrema_c2(values) >> > Rprof() >> > summaryRprof(tmp) >> > #same time as above, but now we see that it spends more time in >> > certain calls to c() than others >> > unlink(tmp) >> > >> > ______________________________________________ >> > R-help@r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. >> > >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.