<comment inline> Thomas Lumley wrote: > On Fri, 23 Jun 2006, Rogério Rosa da Silva wrote: > >> Dear All, >> >> My doubt about how to integrate a simple kernel density estimation >> goes on. >> >> I have seen the recent posts on integrate density estimation, which seem >> similar to my question. However, I haven't found a solution. >> >> I have made two simple kernel density estimation by: >> >> kde.1 <-density(x, bw=sd(x), kernel="gaussian")$y # x<- >> c(2,3,5,12) >> kde.2 <-density(y, bw=sd(y), kernel="gaussian")$y # y<- >> c(4,2,4,11) >> >> Now I would like to integrate the difference in the estimated density >> values, i.e.: >> >> diff.kde <- abs (kde.1- kde.2) >> >> How can I integrate diff.kde over -Inf to Inf ? > > Well, the answer is zero. > > Computationally this is a bit tricky. You can turn the density > estimates into functions with approxfun() > x<-rexp(100) > kde<-density(x) > f<-approxfun(kde$x,kde$y,rule=2) > integrate(f,-1,10) > 1.000936 with absolute error < 3.3e-05 > > But if you want to integrate over -Inf to Inf you need the function to > specify the values outside the range of the data. The only value that > will work over the range -Inf to Inf is zero
This may be true for standard splines but not if the segment at the end is log-linear. For example, consider data with range(x) = c(x1, x.n) with (n-2) knots at x[i], i = 2:(n-1) with x[i]<x[i+1]. Consider a function f(x) modeled with exp(a1+b1*x) over [x[1], x[2]]. Then if b1>0, integral{exp(a1+b1*x) from -Inf to x[2]} = exp(a1+b1*x[2])/b1. Do you know if this kind of thing has been studied? I am NOT familiar with the literature on splines, but it would seem to me that this kind of thing could be quite valuable for building fast approximations to various mixture distributions that are otherwise quite difficult and expensive to compute. It seems to me that this kind of thing could be used in multilevel modeling, with multidimensional smoothing splines fit to data obtained from potentially expensive evaluations of the unconditional likelihood, and the marginal likelihood could then be computed and optimized from the spline. Then an estimate of the uncertainty could then be converted into another set of points at which to evaluate the likelihood and the process repeated until convergence. I have problems like this that I can't solve right now, and it seems to me that this might lead to fast, accurate solutions. Comments? Spencer Graves >> f<-approxfun(kde$x,kde$y,yleft=0,yright=0) >> integrate(f,-1,10) > 1.00072 with absolute error < 1.5e-05 >> integrate(f,-Inf,Inf) > 1.000811 with absolute error < 2.3e-05 > > > -thomas > > > ------------------------------------------------------------------------ > > ______________________________________________ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html