Dear Martin, Thank you for the propmt and highly informative reply!
In my experiments, I compared (a) smooth.spline (with both GCV and CV for automatic bw selection), (b) spm() in "SemiPar", which implements penalized splines smoothing with REML estimation of penalty, (c) glkerns() in "lokern", which implements the Gasser-Mueller kernel methodology, and (d) locpoly() in "KernSmooth" with direct-plugin method of Ruppert and Wand to estimate bandwidth. All of these smothers also have the derivative estimation. The key point that you observed as the reason for why glkerns() cannot be made more efficient, without making it inferior, is that the bandwidth that is "optimal" for function estimation is not necessarily optimal for derivative estimation. This is explicitly considered in derivative estimation in glkerns(). Other smoothers, however, do not recognize this, i.e. they use the same bandwith (and the same function estimate, of course) that is optimal only for function estimation to estimate derivatives. In fact, this is one of the main points of my work. Not surprisingly, glkerns() does a nice job of estimating f, f', and f'', whereas the other methods, with the exception of spm(), get progressively worse with increasing order of derivatives (I use RMISE - root-mean-integrated squared error to judge quality of smoothers). A slight plug - I will be presenting these results at UseR! 2009 next week. I knew that glkerns(x, y, deriv=1) is a completely independent estimation from glkerns(x, y, deriv=0). What I was really wondering was whether it might not be possible to have a call such as: glkerns(x, y, deriv=c(0, 1, 2), x.out=x.out). This would minimize some overhead associated with setting up repeated calls each time in R and Fortran. While the savings may not be huge for a single function smoothing, it might be substantial for large numbers of time-series. Best regards, Ravi. ---------------------------------------------------------------------------- ------- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: rvarad...@jhmi.edu Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.h tml ---------------------------------------------------------------------------- -------- -----Original Message----- From: Martin Maechler [mailto:maech...@stat.math.ethz.ch] Sent: Thursday, July 02, 2009 12:35 PM To: Ravi Varadhan Cc: r-help@r-project.org; Martin Maechler Subject: Re: [R] lokern package >>>>> "RV" == Ravi Varadhan <rvarad...@jhmi.edu> >>>>> on Thu, 2 Jul 2009 10:51:03 -0400 writes: RV> Dear Martin, I have been playing a lot with the RV> glkerns() function in the "lokern" package for RV> "automatic" smoothing of time-series data. This kernel RV> smoothing approach of Gasser and Mueller seems to RV> perform quite well for estimating the function and its RV> derivatives (first and second derivatives). In fact, RV> this is one of the best methods based on my simulation RV> studies for comparing a number of "automatic" bandwidth RV> selection methods. That's quite interesting! What methods did you use in your comparison? [I assume you did use smooth.spline() as well] RV> I am interested in applying this to RV> automatic smoothing and feature extraction for a "large" RV> number (thousands) of time series, with hundreds of RV> points per time series. This is where I am interested RV> in seeing if the efficiency of "glkerns" can be RV> improved. RV> Here follows my specific question: RV> You have to call glkerns() separately each time to RV> compute the function and its derivatives, ie. if I want RV> the function and its first 2 derivatives, I have to make RV> 3 calls to glkerns(). Yes. Note however that each of these calls chooses a different kernel order ('korder') by default { korder <- deriv + 2 }, and uses *different* automatic bandwidths depending on both deriv and korder. Consequently, the result of glkerns(x,y, deriv=1) is *not* the derivative of the estimate glkerns(x,y, deriv=0) but rather an independent estimate of the derivative of the unknown g(). The same applies for deriv=2 etc. But the problem is even "worse": Let's assume you get the "optimal" global bandwidth estimate 'bandwidth' = h from the deriv=0 case. Then you could set this bandwidth explicitly for the deriv=1 and deriv=2 calls {and you'd gain quite a bit of execution time !}. *HOWEVER*, as the internal codes absolutely require k_{ord} - nu == korder - nu to be an *even* number, you can *not* keep 'korder' fixed together with 'bandwidth'. But if you change 'korder', the kernels change and the meaning of the bandwidth has changed too. I have just uploaded version 1.0-8 of package 'lokern' to CRAN, and this now contains a demo("gkl-derivs") which defines a utility function to play a bit with this (keeping bandwidth fixed). RV> This seems to me to be inefficient, especially for large RV> time series. In smooth.spline(), for example, you can RV> call it once to get the fit, and then use the fitted RV> object to compute the derivatives using predict(). Note that smooth.spline() is very different: If you use deriv=1 or deriv=2 in the predict() call, you get exactly the 1st or 2nd derivative of the same smoothing spline function. RV> Is it possible to have a similar feature in glkerns? As I have explained, this is not easily possible currently more for conceptual reasons than others. In principle it should be possible however, but that would both need some "theoretical" work {maybe just collecting the papers and formulae used} and then some Fortran code digging and shuffling. Would be a nice "semester work" for a student... Martin -- Martin <maech...@stat.math.ethz.ch> http://stat.ethz.ch/people/maechler Seminar für Statistik, ETH Zürich HG G 16 Rämistrasse 101 CH-8092 Zurich, SWITZERLAND phone: +41-44-632-3408 fax: ...-1228 <>< RV> Thanks for any suggestions. RV> Ravi. RV> ---------------------------------------------------------------------------- RV> ------- RV> Ravi Varadhan, Ph.D. RV> Assistant Professor, The Center on Aging and Health RV> Division of Geriatric Medicine and Gerontology RV> Johns Hopkins University RV> Ph: (410) 502-2619 RV> Fax: (410) 614-9625 RV> Email: rvarad...@jhmi.edu RV> Webpage: RV> http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.h RV> tml RV> ---------------------------------------------------------------------------- RV> -------- RV> [[alternative HTML version deleted]] RV> ______________________________________________ RV> R-help@r-project.org mailing list RV> https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do RV> read the posting guide RV> http://www.R-project.org/posting-guide.html and provide RV> commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.