[R] Problem with R density function
Hello,

My friend has the following issue with R. I will be glad to receive any response.

Thanks,
Dhiman Bhadra

Hello everyone,

I am trying to use the 'density' function available with the base package of R to estimate the density of a data set for subsequent use. I just noticed that with even 1000 data points, the numerical integral of the estimated density using the Epanechnikov kernel is far from 1. I wonder if I am doing something wrong, or whether there is a bug:

x=rnorm(1)
dd=density(x,kernel="epanechnikov",n=101,bw=0.001)
sum(dd$y)*(dd$x[2]-dd$x[1])
[1] 5.7245
dd=density(x,kernel="epanechnikov",n=1001,bw=0.001)
sum(dd$y)*(dd$x[2]-dd$x[1])
[1] 2.870922
dd=density(x,kernel="epanechnikov",n=10001,bw=0.001)
sum(dd$y)*(dd$x[2]-dd$x[1])
[1] 0.9989762

So unless I use around 1 or more data points, the integral is wrong: there seems to be a scaling factor creeping in. Am I missing something?

Best regards,
Apratim Guha

Dr. Apratim Guha
Associate Professor, Production Quantitative Methods Area, IIM Ahmedabad,
Vastrapur, Ahmedabad 380015, INDIA. Phone: (91) 79 6632 4803
Secretary: Ms. Sujatha Jayprakash: (91) 79 6632 4911

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with R density function
(In reply to the message from DHIMAN BHADRA to r-help@r-project.org, sent 14 May 2014 10:36.)

Hi,

Have you tried changing the bandwidth rather than the number of points? The default bandwidth gives:

x <- rnorm(1)
dd <- density(x,kernel="epanechnikov",n=101)
sum(dd$y)*(dd$x[2]-dd$x[1])
[1] 1.001014

Martyn
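A minimal sketch of why the default bandwidth behaves so much better on a coarse grid. It compares the spacing of the evaluation grid with the bandwidth for both settings. The seed, the sample size of 1000, and the variable names are assumptions made for illustration; they are not taken from the original post.

```r
set.seed(1)        # arbitrary seed, used only to make the sketch reproducible
x <- rnorm(1000)   # assumed sample size for this illustration

dd_default <- density(x, kernel = "epanechnikov", n = 101)
dd_narrow  <- density(x, kernel = "epanechnikov", n = 101, bw = 0.001)

# Spacing between consecutive evaluation points on each grid
step_default <- diff(dd_default$x)[1]
step_narrow  <- diff(dd_narrow$x)[1]

# Grid steps per bandwidth: a ratio well below 1 means each kernel bump
# is covered by several grid points; a ratio far above 1 means most bumps
# fall between grid points and the sum over the grid is unreliable.
c(default = step_default / dd_default$bw,
  narrow  = step_narrow  / dd_narrow$bw)
```

With the default bandwidth the grid step is smaller than the bandwidth, so 101 points are enough. With bw=0.001 the grid step is many times the bandwidth, which is consistent with the erratic sums reported above.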
Re: [R] Problem with R density function
(In reply to the message from Martyn Byng, sent Wednesday, May 14, 2014 5:58 AM.)

Try adding plots, e.g.

set.seed(20140514)
x <- rnorm(100)
hist(x, prob=TRUE, ylim=c(0,10))
dd <- density(x, n=10001, bw=0.001)
lines(dd, col=2, type="s")
dd <- density(x, n=101, bw=0.001)
lines(dd, col=3, type="s")

The density function you produce with bw=0.001 is very irregular (many sharp, narrow peaks). You should expect to need many intervals (i.e., large n) in your Riemann sum to get an accurate estimate of the area under it.

Chris
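The same point can be checked numerically rather than by eye. The sketch below reuses the seed and sample from the plotting example and tabulates the Riemann sum for three grid sizes, once with bw=0.001 and once with the default bandwidth. The helper name riemann_area and the choice of grid sizes are assumptions made for illustration.

```r
set.seed(20140514)
x <- rnorm(100)

# Hypothetical helper: left-to-right Riemann sum of a density estimate
# evaluated on n equally spaced points; extra arguments go to density().
riemann_area <- function(x, n, ...) {
  dd <- density(x, n = n, ...)
  sum(dd$y) * (dd$x[2] - dd$x[1])
}

grid_sizes <- c(101, 1001, 10001)

areas_narrow  <- sapply(grid_sizes, function(n) riemann_area(x, n, bw = 0.001))
areas_default <- sapply(grid_sizes, function(n) riemann_area(x, n))

# One row per grid size, one column per bandwidth setting
data.frame(n = grid_sizes, bw_0.001 = areas_narrow, bw_default = areas_default)
```

The default-bandwidth column should stay close to 1 for every grid size, while the bw=0.001 column should only settle near 1 once the grid is fine enough to resolve the narrow peaks.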