On Thu, Feb 2, 2012 at 1:58 AM, <josef.p...@gmail.com> wrote: > On Wed, Feb 1, 2012 at 6:48 PM, Benjamin Root <ben.r...@ou.edu> wrote: > > > > > > On Wednesday, February 1, 2012, Pierre Haessig <pierre.haes...@crans.org > > > > wrote: > >> Hi, > >> > >> [I'm not sure whether this discussion belongs to numpy-discussion or > >> scipy-dev] > >> > >> In day to day time series analysis I regularly need to look at the data > >> autocorrelation ("acorr" or "acf" depending on the software package). > >> The straighforward available function I have is matplotlib.pyplot.acorr. > >> However, for a moderately long time series (say of length 10**5) it > taking a > >> huge time just to just dislays the autocorrelation values within a small > >> range of time lags. > >> The main reason being it is relying on np.correlate(x,x, mode=2) while > >> only a few lags are needed. > >> (I guess mode=2 is an (old fashioned?) way to set mode='full') > >> > >> I know that np.correlate performance issue has been discussed already, > and > >> there is a *somehow* related ticket > >> (http://projects.scipy.org/numpy/ticket/1260). I noticed in the > ticket's > >> change number 2 the following comment by Josef : "Maybe a truncated > >> convolution/correlation would be good". I'll come back to this soon. > >> > >> I made an example script "acf_timing.py" to start my point with some > >> timing data : > >> > >> In Ipython: > >>>>> run acf_timing.py # it imports statsmodel's acf + define 2 other acf > >>>>> implementations + an example data 10**5 samples long > >> > >> %time l,c = mpl_acf(a, 10) > >> CPU times: user 8.69 s, sys: 0.00 s, total: 8.69 s > >> Wall time: 11.18 s # pretty long... > >> > >> %time c = sm_acf(a, 10) > >> CPU times: user 8.76 s, sys: 0.01 s, total: 8.78 s > >> Wall time: 10.79 s # long as well. statsmodel has a similar underlying > >> implementation > >> # > >> > http://statsmodels.sourceforge.net/generated/scikits.statsmodels.tsa.stattools.acf.html#scikits.statsmodels.tsa.stattools.acf > >> > >> #Now, better option : use the fft convolution > >> %time c=sm_acf(a, 10,fft=True) > >> CPU times: user 0.03 s, sys: 0.01 s, total: 0.04 s > >> Wall time: 0.07 s > >> # Fast, but I'm not sure about the memory implication of using fft > though. > >> > >> #The naive option : just compute the acf lags that are needed > >> %time l,c = naive_acf(a, 10) > >> CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s > >> Wall time: 0.01 s > >> # Iterative computation. Pretty silly but very fast > >> # (Now of course, this naive implementation won't scale nicely for a lot > >> of lags) > > I don't think it's silly to have a short python loop, statsmodels > actually uses the loop in the models, for example in yule_walker (and > GLSAR), because in most statistical application I wouldn't expect a > large number of lags. The time series models don't use the acov > directly, but I think most of the time we just loop over the lags. > > >> > >> Now comes (at last) the question : what should be done about this > >> performance issue ? > >> - should there be a truncated np.convolve/np.correlate function, as > Josef > >> suggested ? > >> - or should people in need of autocorrelation find some workarounds > >> because this usecase is not big enough to call for a change in > np.convolve ? > >> > >> I really feel this question is about *where* a change should be > >> implemented (numpy, scipy.signal, maplotlib ?) so that it makes sense > while > >> not breaking 10^10 lines of numpy related code... > >> > >> Best, > >> Pierre > >> > >> > > > > Speaking for matplotlib, the acorr() (and xcorr()) functions in mpl are > > merely a convenience. The proper place for any change would not be mpl > > (although, we would certainly take advantage of any improved acorr() and > > xcorr() that are made available in numpy. > > I also think that numpy or scipy would be the natural candidates for a > correlate that works fast for an intermediate number of desired lags > (but still short compared to length of data). > > scipy.signal is the right place I think. numpy shouldn't grow too many functions like this.
Ralf
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion