[R] density function - kernel density estimation
Dear all,

Concerning the function density() from the stats package, I don't know how the number of equally spaced points at which the density is to be estimated (the 'n' argument) depends on the data from which the kernel density estimate is computed (the 'x' argument). Basically, I would like to know what value of 'n' I should use for a data vector 'x' of a given length.

Kind regards,
João Fadista

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
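A short sketch on this question, not a definitive answer: as far as the documented behaviour of density() goes, 'n' only sets how many grid points the estimate is evaluated at (the default is 512) and does not have to match length(x); the default bandwidth is chosen from the data alone. The data below are simulated purely for illustration.

```r
# Simulated data, for illustration only (not from the original post)
set.seed(42)
x <- rnorm(50)

d_coarse <- density(x, n = 64)    # estimate evaluated at 64 grid points
d_fine   <- density(x, n = 1024)  # same estimate on a finer grid

# The grid size follows 'n', not length(x)
length(d_coarse$x)
length(d_fine$x)

# The default bandwidth depends only on the data, so both fits share it
all.equal(d_coarse$bw, d_fine$bw)
```

In practice a larger 'n' mainly gives a smoother-looking curve and a finer grid for any later numerical integration; the default is usually adequate.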
Re: [R] density
There's a nice package ('ks') which even allows you to specify a full matrix of bandwidths (not only one bandwidth for each coordinate direction).

Hope this helps,
Emili

Quoting Bruce Willy [EMAIL PROTECTED]:
> Hello, I have an n*2 matrix, called plan, which contains n observations from 2 variates. I want a kernel density estimate of the joint distribution of these 2 variates. I tried density(plan). Unfortunately, R treats this as 2n observations (if n=10, 20 observations), when there are only n. How can I make a multivariate kernel density estimate? Thank you very much.
[R] density
Hello,

I have an n*2 matrix, called plan, which contains n observations from 2 variates. I want a kernel density estimate of the joint distribution of these 2 variates. I tried density(plan). Unfortunately, R treats this as 2n observations (if n=10, 20 observations), when there are only n. How can I make a multivariate kernel density estimate?

Thank you very much.
Re: [R] density
Try bkde2D {KernSmooth} or kde2d {MASS}.

Bruce Willy wrote:
> Hello, I have an n*2 matrix, called plan, which contains n observations from 2 variates. I want a kernel density estimate of the joint distribution of these 2 variates. I tried density(plan). Unfortunately, R treats this as 2n observations (if n=10, 20 observations), when there are only n. How can I make a multivariate kernel density estimate? Thank you very much.
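A minimal sketch of the kde2d suggestion, assuming the MASS package is installed (it normally ships with R). The matrix 'plan' here is a simulated stand-in for the poster's data, and the grid size of 50 is an arbitrary choice.

```r
library(MASS)  # provides kde2d()

# Hypothetical stand-in for the poster's n*2 matrix 'plan'
set.seed(1)
plan <- cbind(rnorm(200), rnorm(200, mean = 2))

# Pass the two columns separately; n is the grid size in each direction
fit <- kde2d(plan[, 1], plan[, 2], n = 50)

dim(fit$z)  # density values on the 50 x 50 grid

# contour(fit) or image(fit) would draw the estimate
```

Passing the columns separately is what keeps the n rows from being read as 2n univariate observations, which is what density(plan) does.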
Re: [R] Density estimation graphs
Mark Wardle wrote:
> Dear all, I'm struggling with a plot and would value any help!
> ...
> Is there a better way? As always, I'm sure there's a one-liner rather than my crude technique!

As always, I've spent ages trying to sort this, and then the minute after sending an email, I find the polygon() function. Ignore previous message!

Best wishes,
Mark
[R] Density estimation graphs
Dear all,

I'm struggling with a plot and would value any help!

I'm attempting to highlight a histogram and density plot to show a proportion of cases above a threshold value. I wanted to cross-hatch the area below the density curve. The breaks and bandwidth are deliberate integer values because of the type of data I'm looking at.

I've managed to do this, but I don't think it is very good! It would be difficult, for example, to do a cross-hatch using this technique.

allele.plot <- function(x, threshold=NULL, hatch.col='black',
                        hatch.border=hatch.col, lwd=par('lwd'), ...) {
    h <- hist(x, breaks=max(x), plot=FALSE)
    d <- density(x, bw=1)
    plot(d, lwd=lwd, ...)
    if (!is.null(threshold)) {
        d.t <- d$x > threshold
        d.x <- d$x[d.t]
        d.y <- d$y[d.t]
        d.l <- length(d.x)
        # draw all but first line of hatch
        for (i in 2:d.l) {
            lines(c(d.x[i],d.x[i]), c(0,d.y[i]), col=hatch.col, lwd=1)
        }
        # draw first line in hatch border colour
        lines(c(d.x[1],d.x[1]), c(0,d.y[1]), col=hatch.border, lwd=lwd)
        # and now re-draw density plot lines
        lines(d, lwd=lwd)
    }
}

# some pretend data
s8 = rnorm(100, 15, 5)
threshold = 19    # an arbitrary cut-off
allele.plot(s8, threshold, hatch.col='grey', hatch.border='black')

Is there a better way? As always, I'm sure there's a one-liner rather than my crude technique!

Best wishes,
Mark

--
Dr. Mark Wardle
Clinical research fellow and specialist registrar, Neurology
University Hospital Wales and Cardiff University, UK
Re: [R] Density estimation graphs
On Mar 15, 2007, at 12:37 PM, Mark Wardle wrote:
> Dear all, I'm struggling with a plot and would value any help! I'm attempting to highlight a histogram and density plot to show a proportion of cases above a threshold value. I wanted to cross-hatch the area below the density curve. The breaks and bandwidth are deliberate integer values because of the type of data I'm looking at. I've managed to do this, but I don't think it is very good! It would be difficult, for example, to do a cross-hatch using this technique.

Don't know about a cross-hatch, but in general I use polygon for highlighting areas like that:

allele.plot <- function(x, threshold=NULL, hatch.col='black',
                        hatch.border=hatch.col, lwd=par('lwd'), ...) {
    h <- hist(x, breaks=max(x), plot=FALSE)
    d <- density(x, bw=1)
    plot(d, lwd=lwd, ...)
    if (!is.null(threshold)) {
        d.t <- d$x > threshold
        d.x <- d$x[d.t]
        d.y <- d$y[d.t]
        polygon(c(d.x[1],d.x,d.x[1]), c(0,d.y,0), col=hatch.col, lwd=1)
    }
}

# some pretend data
s8 = rnorm(100, 15, 5)
threshold = 19    # an arbitrary cut-off
allele.plot(s8, threshold, hatch.col='grey', hatch.border='black')

Perhaps this can help a bit. Btw, what was d.l for?

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College
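On the cross-hatch itself, one possible sketch: polygon() accepts 'density' (shading lines per inch) and 'angle' arguments, so drawing the same polygon twice at two angles gives a cross-hatch. The helper name shade_above, the line spacing and the angles are illustrative choices, not part of the code posted in this thread.

```r
# Sketch: cross-hatch the area under a density curve above a threshold.
# shade_above() is a hypothetical helper, not from the original posts.
shade_above <- function(d, threshold, col = "grey") {
  keep <- d$x > threshold
  px <- c(d$x[keep][1], d$x[keep], tail(d$x[keep], 1))  # close polygon on the axis
  py <- c(0, d$y[keep], 0)
  polygon(px, py, density = 15, angle = 45,  col = col, border = NA)
  polygon(px, py, density = 15, angle = 135, col = col, border = NA)
  invisible(cbind(x = px, y = py))
}

set.seed(1)
s8 <- rnorm(100, 15, 5)   # some pretend data, as in the thread
d  <- density(s8, bw = 1)

pdf(NULL)                 # null device so the sketch runs non-interactively
plot(d)
region <- shade_above(d, threshold = 19)
lines(d)                  # re-draw the curve on top of the hatching
dev.off()
```

When working interactively, drop the pdf(NULL) and dev.off() calls so the plot appears on screen.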
[R] density plot text
Is there any way of adding text to a density plot? I have had a go using the text() function, but I think the error arises because this function doesn't work with densityplot(). Alternatively, I understand I can achieve pretty much the same result if I plot a kernel density estimate using plot() (which allows text()), but I do prefer densityplot().

Also, is it possible to specify the dimensions of a graphics device? I don't mean the x and y limits of a plot, but rather can I change the dimensions of the default (square) graphics device?

Many thanks
Murray

try <- rnorm(100, mean = 5, sd = 3)
library(lattice)
trellis.device(col = FALSE, theme = lattice.getOption("col.whitebg"))
densityplot(~try)
normtest <- shapiro.test(try)
normtest
pvalue <- round(normtest$p.value, 5)
normtext <- paste(normtest$method, "p-value =", pvalue)
normtext
xcoord <- max(try)*0.6
text(xcoord, 0.1, normtext)

# alternative
plot(density(try))
text(0, 0.1, normtext)

--
Murray Pung
Statistician, Datapharm Australia Pty Ltd
Re: [R] density plot text
On 10/25/06, Murray Pung [EMAIL PROTECTED] wrote:
> Is there any way of adding text to a density plot? I have had a go using the text() function, but I think the error arises because this function doesn't work with densityplot(). Alternatively, I understand I can achieve pretty much the same result if I plot a kernel density estimate using plot() (which allows text()), but I do prefer densityplot().
>
> try <- rnorm(100, mean = 5, sd = 3)
> library(lattice)
> trellis.device(col = FALSE, theme = lattice.getOption("col.whitebg"))
> densityplot(~try)
> normtest <- shapiro.test(try)
> normtest
> pvalue <- round(normtest$p.value, 5)
> normtext <- paste(normtest$method, "p-value =", pvalue)
> normtext
> xcoord <- max(try)*0.6
> text(xcoord, 0.1, normtext)

You have two options: either

trellis.focus("panel", 1, 1)
panel.text(xcoord, 0.1, normtext)
trellis.unfocus()

which is analogous to the plot(density()) paradigm, or, what in this situation is more appropriate IMO (as it will work for multipanel plots as well):

densityplot(~try,
            panel = function(x, ...) {
                panel.densityplot(x, ...)
                normtest <- shapiro.test(x)
                pvalue <- round(normtest$p.value, 5)
                normtext <- paste(normtest$method, "p-value =", pvalue)
                xcoord <- max(try) * 0.6
                panel.text(xcoord, 0.1, normtext)
            })

There are ways to make a safer choice of the y coordinate than 0.1; see

?current.panel.limits

and

library(grid)
?grid.text

-Deepayan
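As one way of following the ?current.panel.limits pointer, here is a sketch that places the label relative to the panel's own limits instead of at a hard-coded y of 0.1. The variable name obs, the corner chosen and the 'adj' offsets are illustrative assumptions.

```r
library(lattice)

set.seed(1)
obs <- rnorm(100, mean = 5, sd = 3)   # simulated, as in the original post

p <- densityplot(~ obs, panel = function(x, ...) {
  panel.densityplot(x, ...)
  normtest <- shapiro.test(x)
  label <- paste(normtest$method, "p-value =", round(normtest$p.value, 5))
  lims <- current.panel.limits()      # xlim and ylim of the current panel
  # anchor the label just inside the top-right corner of the panel
  panel.text(lims$xlim[2], lims$ylim[2], label, adj = c(1.05, 1.5))
})

pdf(NULL)   # null device; use an on-screen device when working interactively
print(p)
dev.off()
```

Because the position is computed inside the panel function, the same code should carry over to multipanel plots, where each panel has its own limits.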
[R] density plots????
Dear all,

I have managed to produce density plots using the function kde2d, and from these a contour plot. My problem is that I do not really understand what the labels for the different levels mean. What I would like to obtain is a surface encompassing the 95th percentile of my values. In other words, I would like the levels to represent, for example, the 90th, 95th and 99th percentiles of my values.

I hope I have been clear. Do you think you can help me? I would be VERY grateful.

Thanks in advance
Luis Barreiro
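A sketch of one common way to get such contours, under stated assumptions: the default contour labels are heights of the estimated density, not percentiles, so the heights that enclose a given share of the estimated mass have to be computed from the grid. The code below assumes an equally spaced grid (as kde2d produces), renormalises the mass within the grid, and uses simulated data in place of the poster's.

```r
library(MASS)

set.seed(1)
x <- rnorm(500)   # simulated stand-ins for the two variables
y <- rnorm(500)
k <- kde2d(x, y, n = 100)

# Sort grid densities from highest to lowest and accumulate their
# share of the total mass on the grid
z    <- sort(as.vector(k$z), decreasing = TRUE)
mass <- cumsum(z) / sum(z)

probs <- c(0.90, 0.95, 0.99)
# Density height at which the highest-density region first holds share p
lev <- sapply(probs, function(p) z[which(mass >= p)[1]])
lev

# contour(k, levels = lev, labels = paste0(100 * probs, "%")) would draw them
```

The resulting regions are highest-density regions of the estimate, so the "95%" contour encloses roughly 95% of the estimated probability mass, which is close to, but not the same thing as, enclosing exactly 95% of the observed points.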
[R] density() with from, to or cut and comparison of density()
Hi,

the function density() does normally integrate to one. I've checked it and it works, and I also read the previous threads. But I realised that it does not integrate to one if I use from, to or cut.

My scenario: I simulated densities of plants originating from a seed source at distance zero. Therefore the density of the plants will be highest close to zero. Is there anything I can do to have this pattern? If I use 'from' or 'cut', the resulting densities do not integrate to one, which I need as I want to compare different density curves.

My second question concerns the bandwidth. Am I correct in saying that if I want to compare different density estimates, the bandwidth should be the same for all of them?

Thanks in advance for your help,
Rainer

--
Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation Biology (UCT)
Department of Conservation Ecology and Entomology
University of Stellenbosch
Matieland 7602
South Africa
Re: [R] density() with from, to or cut and comparison of density()
Rainer M Krug wrote:
> Hi, the function density() does normally integrate to one. I've checked it and it works, and I also read the previous threads. But I realised that it does not integrate to one if I use from, to or cut.
>
> My scenario: I simulated densities of plants originating from a seed source at distance zero. Therefore the density of the plants will be highest close to zero. Is there anything I can do to have this pattern? If I use 'from' or 'cut', the resulting densities do not integrate to one, which I need as I want to compare different density curves.

The kernel chosen might not be the ideal one for such a restriction. If the density outside the cut range is extremely small, you might want to do a dirty transformation so that the values sum up to 1 again.

> My second question concerns the bandwidth. Am I correct in saying that if I want to compare different density estimates, the bandwidth should be the same for all of them?

Yes.

Uwe Ligges

> Thanks in advance for your help,
> Rainer
Re: [R] density() with from, to or cut and comparison of density()
You may want to look at the logspline package. It uses a different technique than density() does, but it estimates densities and allows you to tell the routine that there is a minimum value and that the density does not extend beyond it.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rainer M Krug
Sent: Wednesday, August 30, 2006 4:27 AM
To: R help list
Subject: [R] density() with from, to or cut and comparison of density()

> Hi, the function density() does normally integrate to one. [...] But I realised that it does not integrate to one if I use from, to or cut. My scenario: I simulated densities of plants originating from a seed source at distance zero. Therefore the density of the plants will be highest close to zero. Is there anything I can do to have this pattern? [...]
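A sketch of two options for a density bounded at zero. Simple rescaling corresponds to the "dirty transformation" mentioned in this thread; boundary reflection is a different, commonly used technique that is not from the thread and is swapped in here as an assumption. The distances are simulated, and the bandwidth is whatever density() picks for the reflected sample.

```r
set.seed(1)
dist0 <- rexp(200, rate = 1)   # hypothetical distances from the seed source

# Trapezoidal rule on a gridded density object
trapz <- function(d) sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

# Option 1 (rescaling): cut at zero, then divide by the remaining area
d_cut <- density(dist0, from = 0)
d_cut$y <- d_cut$y / trapz(d_cut)

# Option 2 (reflection): mirror the sample about zero, estimate on
# [0, Inf) and double the heights
d_ref <- density(c(dist0, -dist0), from = 0)
d_ref$y <- 2 * d_ref$y

c(rescaled = trapz(d_cut), reflected = trapz(d_ref))
```

Reflection tends to keep the estimate high near zero, whereas the rescaled cut estimate still dips towards the boundary; for comparing curves, both can be run with a common 'bw' value, in line with the advice above.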
Re: [R] Density Estimation
On Thu, Jun 08, 2006 at 08:31:26PM +0200, Pedro Ramirez wrote:
>> In mathematical terms the optimal bandwidth for density estimation decreases at rate n^{-1/5}, while the one for the distribution function decreases at rate n^{-1/3}, if n is the sample size. In practical terms, one must choose an appreciably smaller bandwidth in the second case than in the first one.
>
> Thanks a lot for your remark! I was not aware of the fact that the optimal bandwidths for density and distribution do not decrease at the same rate.
>
>> Besides the computational aspect, there is a statistical one: the optimal choice of bandwidth for estimating the density function is not optimal (and possibly not even just sensible) for estimating the distribution function, and the stated problem is equivalent to estimation of the distribution function.
>
> The given interval 0<x<3 was only an example; in fact I would like to estimate the probability for intervals such as 0<=x<1, 1<=x<2, 2<=x<3, 3<=x<4, and compare it with the estimates of a corresponding histogram. In this case the stated problem is no longer equivalent to the estimation of the distribution function.

why not? the probabilities you are interested in are of the form F(1)-F(0), F(2)-F(1), and so on, where F(.) is the cumulative distribution function (and it must be continuous, since its derivative exists).

> What do you think, can I go ahead in this case with the optimal bandwidth for the density? Thanks a lot for your help!

no

best wishes,
Adelchi

--
Adelchi Azzalini [EMAIL PROTECTED]
Dipart.Scienze Statistiche, Università di Padova, Italia
tel. +39 049 8274147, http://azzalini.stat.unipd.it/
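To make the F(1)-F(0), F(2)-F(1), ... point concrete, here is a sketch that writes the distribution function of a Gaussian kernel estimate as an average of normal CDFs and compares the bin probabilities with histogram-style relative frequencies. The name Fhat is illustrative, and the default density bandwidth is used only as a placeholder; per the advice above, a smaller value would be more appropriate for this purpose.

```r
x  <- c(2, 1, 3, 2, 3, 0, 4, 5, 10, 11, 12, 11, 10)
bw <- density(x)$bw   # placeholder: default bandwidth tuned for the density

# CDF of a Gaussian kernel estimate: the average of the kernel CDFs
Fhat <- function(q) sapply(q, function(qi) mean(pnorm(qi, mean = x, sd = bw)))

breaks <- 0:4
p_kde  <- diff(Fhat(breaks))   # F(1)-F(0), F(2)-F(1), F(3)-F(2), F(4)-F(3)
# Relative frequencies in [0,1), [1,2), [2,3), [3,4)
p_hist <- as.vector(table(cut(x, breaks, right = FALSE))) / length(x)

rbind(kde = p_kde, histogram = p_hist)
```

Changing 'bw' and re-running shows directly how strongly the smoothed bin probabilities depend on the bandwidth.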
Re: [R] Density Estimation
On Wed, 07 Jun 2006 19:54:32 +0200, Pedro Ramirez wrote:
>> Not a direct answer to your question, but if you use a logspline density estimate rather than a kernel density estimate then the logspline package will help you, and it has built-in functions dlogspline, qlogspline, and plogspline that do the integrals for you.
>>
>> If you want to stick with the KDE, then you could find the area under each of the kernels for the range you are interested in (need to work out the standard deviation used from the bandwidth, then use pnorm for the default gaussian kernel), then just sum the individual areas.
>>
>> Hope this helps,
>
> Thanks a lot for your quick help! I think I will follow your first suggestion (logspline density estimation) instead of summing over the kernel areas because at the boundaries of the range truncated kernel areas can occur, so I think it is easier to do it with logsplines. Thanks again for your help!!
>
> Pedro

Besides the computational aspect, there is a statistical one: the optimal choice of bandwidth for estimating the density function is not optimal (and possibly not even just sensible) for estimating the distribution function, and the stated problem is equivalent to estimation of the distribution function.

In mathematical terms the optimal bandwidth for density estimation decreases at rate n^{-1/5}, while the one for the distribution function decreases at rate n^{-1/3}, if n is the sample size. In practical terms, one must choose an appreciably smaller bandwidth in the second case than in the first one.

best wishes,
Adelchi
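A small arithmetic sketch of the two rates quoted above. It compares the rates only; the multiplicative constants, which depend on the kernel and the unknown density, are ignored, so the ratio is indicative rather than a bandwidth rule. The sample sizes are arbitrary examples.

```r
n <- c(13, 100, 1000, 10000)   # example sample sizes

rate_density <- n^(-1/5)       # rate for estimating the density
rate_cdf     <- n^(-1/3)       # rate for estimating the distribution function

# Ratio of the two rates; algebraically this is n^(-2/15)
ratio <- rate_cdf / rate_density

data.frame(n = n, rate_density, rate_cdf, ratio)
```

The ratio shrinks as n grows, which matches the remark that the bandwidth for the distribution function should be appreciably smaller, and increasingly so for larger samples.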
Re: [R] Density Estimation
> In mathematical terms the optimal bandwidth for density estimation decreases at rate n^{-1/5}, while the one for the distribution function decreases at rate n^{-1/3}, if n is the sample size. In practical terms, one must choose an appreciably smaller bandwidth in the second case than in the first one.

Thanks a lot for your remark! I was not aware of the fact that the optimal bandwidths for density and distribution do not decrease at the same rate.

> Besides the computational aspect, there is a statistical one: the optimal choice of bandwidth for estimating the density function is not optimal (and possibly not even just sensible) for estimating the distribution function, and the stated problem is equivalent to estimation of the distribution function.

The given interval 0<x<3 was only an example; in fact I would like to estimate the probability for intervals such as

0<=x<1, 1<=x<2, 2<=x<3, 3<=x<4

and compare it with the estimates of a corresponding histogram. In this case the stated problem is no longer equivalent to the estimation of the distribution function. What do you think, can I go ahead in this case with the optimal bandwidth for the density?

Thanks a lot for your help!

Best wishes
Pedro

> best wishes,
> Adelchi
[R] Density Estimation
Dear R-list,

I have made a simple kernel density estimation by

x <- c(2,1,3,2,3,0,4,5,10,11,12,11,10)
kde <- density(x, n=100)

Now I would like to know the estimated probability that a new observation falls into the interval 0<x<3.

How can I integrate over the corresponding interval? In several R packages for kernel density estimation I did not find a corresponding function. I could apply Simpson's Rule for integrating, but perhaps somebody knows a better solution.

Thanks a lot for help!

Pedro
Re: [R] Density Estimation
Not a direct answer to your question, but if you use a logspline density estimate rather than a kernel density estimate then the logspline package will help you, and it has built-in functions dlogspline, qlogspline, and plogspline that do the integrals for you.

If you want to stick with the KDE, then you could find the area under each of the kernels for the range you are interested in (need to work out the standard deviation used from the bandwidth, then use pnorm for the default gaussian kernel), then just sum the individual areas.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Pedro Ramirez
Sent: Wednesday, June 07, 2006 11:00 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Density Estimation

> I have made a simple kernel density estimation by
>
> x <- c(2,1,3,2,3,0,4,5,10,11,12,11,10)
> kde <- density(x, n=100)
>
> Now I would like to know the estimated probability that a new observation falls into the interval 0<x<3. How can I integrate over the corresponding interval? [...]
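The "sum the kernel areas" idea can be sketched as follows. It relies on the documented convention that density() scales its kernels so that 'bw' is the standard deviation of the kernel, so for the default Gaussian kernel no conversion is needed. A numerical integral of the interpolated grid estimate is included only as a cross-check; the names p_exact and p_num are illustrative.

```r
x   <- c(2, 1, 3, 2, 3, 0, 4, 5, 10, 11, 12, 11, 10)
kde <- density(x, n = 100)

# Each observation contributes a normal kernel with sd = kde$bw;
# average the kernel areas that fall inside (0, 3)
p_exact <- mean(pnorm(3, mean = x, sd = kde$bw) -
                pnorm(0, mean = x, sd = kde$bw))

# Cross-check: interpolate the gridded estimate and integrate numerically
p_num <- integrate(splinefun(kde$x, kde$y), 0, 3)$value

c(exact = p_exact, numerical = p_num)
```

The two values should agree closely; the kernel-area version has the advantage of not depending on the grid size 'n' at all.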
Re: [R] Density Estimation
Pedro wrote:
> I have made a simple kernel density estimation by
>
> x <- c(2,1,3,2,3,0,4,5,10,11,12,11,10)
> kde <- density(x, n=100)
>
> Now I would like to know the estimated probability that a new observation falls into the interval 0<x<3.
>
> How can I integrate over the corresponding interval? In several R packages for kernel density estimation I did not find a corresponding function. I could apply Simpson's Rule for integrating, but perhaps somebody knows a better solution.

One possibility is to use splinefun():

spiffy <- splinefun(kde$x, kde$y)
integrate(spiffy, 0, 3)
0.2353400 with absolute error < 2e-09

cheers,
Rolf Turner
[EMAIL PROTECTED]
Re: [R] Density Estimation
> Not a direct answer to your question, but if you use a logspline density estimate rather than a kernel density estimate then the logspline package will help you, and it has built-in functions dlogspline, qlogspline, and plogspline that do the integrals for you.
>
> If you want to stick with the KDE, then you could find the area under each of the kernels for the range you are interested in (need to work out the standard deviation used from the bandwidth, then use pnorm for the default gaussian kernel), then just sum the individual areas.
>
> Hope this helps,

Thanks a lot for your quick help! I think I will follow your first suggestion (logspline density estimation) instead of summing over the kernel areas because at the boundaries of the range truncated kernel areas can occur, so I think it is easier to do it with logsplines. Thanks again for your help!!

Pedro

> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
[R] Density Estimation
Hello,

I am trying to use the package locfit to follow the example given in an introductory note by C. Loader concerning density estimation. It involves the geyser dataset (107 observations on durations, included in the package). I have tried the following (using the latest version of R):

fit.of <- locfit(~geyser, flim=c(1,6), alpha=c(0.15,0.9))
plot(fit.of, get.data=TRUE, mpv=200)

This produces a plot (after several warnings). My question is: how can I get the plot to cover the range 1 to 6 for durations? The plot covers the observed data range only. It appears there is a problem with flim=c(1,6): flim is not actually the correct argument, and consequently c(1,6) is not used correctly. I have also tried to use xlim=c(1,6), but without success.

I need some help on this please.

Thanks
Jacob

Jacob L van Wyk
Department of Statistics
University of Johannesburg APK
P O Box 524
Auckland Park 2006
South Africa
Re: [R] Density estimation with monotonic constaints
There are multiple functions for density estimation in R, but I don't know of any for estimating a monotonically decreasing density. If you haven't already, I encourage you to use, e.g., the help.search and RSiteSearch functions to find and explore their capabilities. Why do you ask? Are you interested in analyzing particular data set(s), or are you doing research on density estimation?

If it were my problem, I might just try something like the function density and then evaluate the results to find out whether they satisfied my constraints. If they did and if I were only interested in that data set, I'd be done. If not, I'd increase the smoothing until I got something that was monotonic. If I wanted a more general method, I might wrap a call to a function like density inside another function, and automatically adjust the smoothing until it satisfied some optimality criterion I might devise.

If I didn't get what I wanted doing that, I might list, e.g., the density function and walk through it line by line until I figured out what I needed to change to get what I wanted. I just listed density and found that it consists solely of a call to UseMethod. To get beyond that, I tried methods(density), which told me there was only one method, called density.default. Then requesting density.default gave me the code for that. Another tip: I find debug extremely helpful for walking through code like this.

I suspect this will not solve your problem, but I hope at least it helps. If you'd like further assistance from this listserve, please submit another post. However, I encourage you first to PLEASE do read the posting guide! www.R-project.org/posting-guide.html. Doing so might increase your chances for getting useful information more quickly. spencer graves

Debayan Datta wrote:
> Hi All, I have a sample x={x1,x2,..,xn} from a distribution with density f. I wish to estimate the density. I know a priori that the density is monotonically decreasing. Is there a way to do this in R?
> Thanks, Debayan
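The idea of wrapping density() in another function and increasing the smoothing until the result is monotone could be sketched roughly as below. This is only an illustration under stated assumptions, not an established estimator: the name monotone_kde, the step factor, the cap on the multiplier and the tolerance are all hypothetical, and the data are assumed to be non-negative with a density that decreases from 0. A plain kernel estimate always rises near the lower boundary, so the sketch also reflects the sample about 0 before smoothing; that reflection is an extra assumption that does not come from the thread.

```r
# Sketch: raise the bandwidth multiplier until the estimate on [0, max(x)]
# is non-increasing. Reflecting the data about 0 is an assumption made here
# so that the estimate does not dip at the boundary.
monotone_kde <- function(x, step = 1.1, max_adjust = 100, tol = 1e-8) {
  stopifnot(all(x >= 0))
  adjust <- 1
  while (adjust <= max_adjust) {
    d <- density(c(x, -x), adjust = adjust, from = 0, to = max(x))
    y <- 2 * d$y                      # fold the reflected mass back onto x >= 0
    if (all(diff(y) <= tol)) {
      return(list(x = d$x, y = y, bw = d$bw, adjust = adjust))
    }
    adjust <- adjust * step           # not monotone yet: smooth a bit more
  }
  stop("no monotone estimate found; try a larger max_adjust")
}

set.seed(1)
x <- rexp(200)                        # simulated data with a decreasing density
fit <- monotone_kde(x)
fit$adjust                            # smoothing multiplier that was needed
```

The returned x and y can be passed to plot(fit$x, fit$y, type = "l"). Oversmoothing in this way flattens the estimate near 0, so a purpose-built monotone estimator may well do better.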
[R] Density estimation with monotonic constraints
Hi All, I have a sample x={x1,x2,..,xn} from a distribution with density f. I wish to estimate the density. I know a priori that the density is monotonically decreasing. Is there a way to do this in R? Thanks, Debayan
Re: [R] density function
Thank you very much, Professor Ripley. If possible, could you point me to other packages that you think I should look at for estimating a derivative? Best regards, Hui

Prof Brian Ripley wrote:
> On Tue, 10 May 2005, Hui Han wrote:
>> I wonder if the function density outputs the gaussian mixture formula that is estimated from the input data, assuming a gaussian model is used at each data point ? I want to take the derivative of the finally estimated gaussian mixture formula for further analysis.
> It is a kernel density estimate: a rather trivial mixture, not necessarily Gaussian. Also, it is not set up to optimally estimate a derivative, and you should look at more sophisticated methods in other packages if you want to do that. As to what density outputs: see its help page.
Re: [R] density function
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/20509.html -s.

Hui Han wrote:
> Thank you very much, Professor Ripley. If possible, could you point me to other packages that you think I should look at for estimating a derivative? Best regards, Hui
> Prof Brian Ripley wrote:
>> On Tue, 10 May 2005, Hui Han wrote:
>>> I wonder if the function density outputs the gaussian mixture formula that is estimated from the input data, assuming a gaussian model is used at each data point ? I want to take the derivative of the finally estimated gaussian mixture formula for further analysis.
>> It is a kernel density estimate: a rather trivial mixture, not necessarily Gaussian. Also, it is not set up to optimally estimate a derivative, and you should look at more sophisticated methods in other packages if you want to do that. As to what density outputs: see its help page.
Re: [R] density function
Thank you so much, Suresh. I searched a lot on "density" among the R email archives. I should have searched using "derivative". Hui

Suresh Krishna wrote:
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/20509.html -s.
> Hui Han wrote:
>> Thank you very much, Professor Ripley. If possible, could you point me to other packages that you think I should look at for estimating a derivative? Best regards, Hui
>> Prof Brian Ripley wrote:
>>> On Tue, 10 May 2005, Hui Han wrote:
>>>> I wonder if the function density outputs the gaussian mixture formula that is estimated from the input data, assuming a gaussian model is used at each data point ? I want to take the derivative of the finally estimated gaussian mixture formula for further analysis.
>>> It is a kernel density estimate: a rather trivial mixture, not necessarily Gaussian. Also, it is not set up to optimally estimate a derivative, and you should look at more sophisticated methods in other packages if you want to do that. As to what density outputs: see its help page.
[R] density function
Hi, I wonder if the function density outputs the gaussian mixture formula that is estimated from the input data, assuming a gaussian model is used at each data point ? I want to take the derivative of the finally estimated gaussian mixture formula for further analysis. Thanks in advance for any help that you can offer me! Hui
[R] density estimation
Hi, I have been looking for a method of estimating a parametric model from the output (x, y) of the R function density. Below is my thought, and I wonder if it looks OK. Suppose that we build a single gaussian model for each input data point x (x is the mean); the overall model may then be a sum of these gaussian models built on each x, i.e. P(y) = \sum_x P(y|x, \sigma), where y is any new data point. Is this right? Is any normalization applied? Thanks in advance for any suggestion that you may offer me! Best regards, Hui
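For the default Gaussian kernel, the estimate is indeed a mixture of normal densities centred at the data points, with the bandwidth as the common standard deviation and a normalization of 1/n so that the whole thing integrates to one. A minimal sketch that writes this out and compares it with density(); the helper name kde_gauss and the simulated data are hypothetical:

```r
# Gaussian kernel density estimate written as an equally weighted mixture:
# fhat(t) = (1/n) * sum_i dnorm(t, mean = x_i, sd = h)
kde_gauss <- function(t, x, h) {
  vapply(t, function(ti) mean(dnorm(ti, mean = x, sd = h)), numeric(1))
}

set.seed(1)
x <- rnorm(200)
d <- density(x)                 # default Gaussian kernel; d$bw is the bandwidth
t <- seq(-2, 2, by = 0.5)
manual <- kde_gauss(t, x, d$bw)
builtin <- approx(d$x, d$y, xout = t)$y
max(abs(manual - builtin))      # small: density() uses a binned approximation
```

The two agree only approximately because density() evaluates the estimate on a grid with a fast binned algorithm, whereas the sum above is evaluated directly.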
Re: [R] density function
On Tue, 10 May 2005, Hui Han wrote:
> I wonder if the function density outputs the gaussian mixture formula that is estimated from the input data, assuming a gaussian model is used at each data point ? I want to take the derivative of the finally estimated gaussian mixture formula for further analysis.
It is a kernel density estimate: a rather trivial mixture, not necessarily Gaussian. Also, it is not set up to optimally estimate a derivative, and you should look at more sophisticated methods in other packages if you want to do that. As to what density outputs: see its help page.
--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
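If a Gaussian kernel is used, the derivative of the estimate can be written down term by term, since each mixture component is a normal density. The following is a sketch only, with hypothetical helper names; as noted above, a bandwidth chosen for the density itself is not tuned for its derivative, so the result should be treated with caution.

```r
# Gaussian kernel estimate and its derivative, differentiated term by term:
# fhat'(t) = (1/n) * sum_i -(t - x_i) / h^2 * dnorm(t, mean = x_i, sd = h)
kde_gauss <- function(t, x, h) {
  vapply(t, function(ti) mean(dnorm(ti, mean = x, sd = h)), numeric(1))
}
kde_gauss_deriv <- function(t, x, h) {
  vapply(t, function(ti) mean(-(ti - x) / h^2 * dnorm(ti, mean = x, sd = h)),
         numeric(1))
}

set.seed(1)
x <- rnorm(200)
h <- bw.nrd0(x)                 # the default bandwidth rule of density()
t <- c(-1, 0, 1)
analytic <- kde_gauss_deriv(t, x, h)

# Central finite difference of the estimate, as a check on the formula
eps <- 1e-5
numeric_fd <- (kde_gauss(t + eps, x, h) - kde_gauss(t - eps, x, h)) / (2 * eps)
cbind(t, analytic, numeric_fd)
```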
Re: [R] Density of the sum of two random variables
Paul Smith wrote:
> Dear All I would like to know whether it is possible with R to get the mathematical expression of the density of a sum of two independent continuous random variables.
No, that corresponds to a convolution of the two densities, and R can't do any symbolic integration. You could get numerical approximations to the density at any point using integrate() (or sum(), if a discrete distribution is involved). Duncan Murdoch
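The numerical route via integrate() might look like the sketch below. The function name dsum and the choice of two normal distributions are assumptions made for illustration; normals are used only because the exact answer is known and can serve as a check.

```r
# Density of Z = X + Y for independent X and Y, by numerical convolution:
# f_Z(z) = integral over x of f_X(x) * f_Y(z - x)
dsum <- function(z, fx, fy, lower = -Inf, upper = Inf) {
  vapply(z, function(zi) {
    integrate(function(x) fx(x) * fy(zi - x), lower, upper)$value
  }, numeric(1))
}

fx <- function(x) dnorm(x, mean = 0, sd = 1)
fy <- function(y) dnorm(y, mean = 1, sd = 2)
z <- c(-2, 0, 1, 3)
approx_density <- dsum(z, fx, fy)
exact_density <- dnorm(z, mean = 1, sd = sqrt(5))   # known result for normals
cbind(z, approx_density, exact_density)
```

For densities with bounded support it is usually safer to pass finite lower and upper limits, since integrate() can miss narrow peaks on an infinite range.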
Re: [R] Density of the sum of two random variables
Have you considered package distr? It will do something similar to what you request, I think; it may or may not be adequate for your purposes. spencer graves

Duncan Murdoch wrote:
> Paul Smith wrote:
>> Dear All I would like to know whether it is possible with R to get the mathematical expression of the density of a sum of two independent continuous random variables.
> No, that corresponds to a convolution of the two densities, and R can't do any symbolic integration. You could get numerical approximations to the density at any point using integrate() (or sum(), if a discrete distribution is involved). Duncan Murdoch
Re: [R] Density of the sum of two random variables
On 5/5/05, Spencer Graves [EMAIL PROTECTED] wrote:
> Have you considered package distr? It will do something similar to what you request, I think; it may or may not be adequate for your purposes.
Thanks, Spencer and Duncan. Maybe the best choice is to use Maple or MuPAD for that. Paul
[R] Density of the sum of two random variables
Dear All I would like to know whether it is possible with R to get the mathematical expression of the density of a sum of two independent continuous random variables. Thanks in advance, Paul
[R] Density curve over a histogram
Dear All I would like to draw a picture with the density curve of a normal distribution over a histogram of a set of random numbers extracted from the same normal distribution. Is that possible? Thanks in advance, Paul
Re: [R] Density curve over a histogram
On Wed, 27 Apr 2005 19:06:07 +0100 Paul Smith wrote:
> Dear All I would like to draw a picture with the density curve of a normal distribution over a histogram of a set of random numbers extracted from the same normal distribution. Is that possible?
To quote Simon `Yoda' Blomberg: This is R. There is no if. Only how. (see fortune("Yoda")) Try:
R> x <- rnorm(100)
R> hist(x, freq = FALSE)
R> curve(dnorm, col = 2, add = TRUE)
Z
> Thanks in advance, Paul
Re: [R] Density curve over a histogram
Paul Smith wrote:
> Dear All I would like to draw a picture with the density curve of a normal distribution over a histogram of a set of random numbers extracted from the same normal distribution. Is that possible?
Yes. If you would like to know how, see e.g. ?hist and ?curve. Uwe Ligges
> Thanks in advance, Paul
Re: [R] Density curve over a histogram
On 27 April 2005 at 14:06, Paul Smith wrote:
> I would like to draw a picture with the density curve of a normal distribution over a histogram of a set of random numbers extracted from the same normal distribution. Is that possible?
Sure. See curve() with add=TRUE. Don't forget to use prob=TRUE when plotting your histogram, though. Vincent
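When the normal distribution is not the standard one, the mean and standard deviation have to be passed on to dnorm() inside curve(). A small sketch, in which the values of m and s are arbitrary illustrative choices:

```r
# Histogram on the density scale with the generating normal curve on top
set.seed(1)
m <- 10                               # hypothetical mean
s <- 2                                # hypothetical standard deviation
x <- rnorm(100, mean = m, sd = s)
h <- hist(x, freq = FALSE)            # freq = FALSE has the same effect as prob = TRUE
curve(dnorm(x, mean = m, sd = s), col = 2, add = TRUE)
```

With freq = FALSE the bar areas sum to one, which puts the histogram on the same vertical scale as the density curve.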
Re: [R] Density curve over a histogram
On 4/27/05, Achim Zeileis [EMAIL PROTECTED] wrote:
>> I would like to draw a picture with the density curve of a normal distribution over a histogram of a set of random numbers extracted from the same normal distribution. Is that possible?
> To quote Simon `Yoda' Blomberg: This is R. There is no if. Only how. (see fortune("Yoda")) Try:
> R> x <- rnorm(100)
> R> hist(x, freq = FALSE)
> R> curve(dnorm, col = 2, add = TRUE)
Fantastic! Thanks a lot, Achim. Paul
Re: [R] Density curve over a histogram
Paul Smith [EMAIL PROTECTED] writes:
> Dear All I would like to draw a picture with the density curve of a normal distribution over a histogram of a set of random numbers extracted from the same normal distribution. Is that possible?
Yes. If you look at the scripts that go with the ISwR package, you'll find a detailed example in ch01.R (end of 1.3/beginning of 1.4). Or you could read the book, of course...
--
Peter Dalgaard, Blegdamsvej 3
Dept. of Biostatistics, 2200 Cph. N
University of Copenhagen, Denmark, Ph: (+45) 35327918
([EMAIL PROTECTED]) FAX: (+45) 35327907
Re: [R] density
Hui Han wrote:
> Hi, I used the density function in the R package, and got the following results. I just wonder how to explain them. What is Min, 1st Qu, Median, and so on? I could not find an explanation from help(density). The plot doesn't seem to match the x and y value either. Thanks in advance for any help that you can give me! Hui
>
> Call: density(x = x2, kernel = "gaussian")
> Data: x2 (6437 obs.); Bandwidth 'bw' = 0.1209
>
>        x                 y
>  Min.   :-1.8856   Min.   :5.851e-06
>  1st Qu.:-0.1629   1st Qu.:2.262e-03
>  Median : 1.5599   Median :3.945e-02
>  Mean   : 1.5599   Mean   :1.450e-01
>  3rd Qu.: 3.2826   3rd Qu.:2.738e-01
>  Max.   : 5.0054   Max.   :5.761e-01
density() estimates the density (y) at several values (x). The values above are the summaries (see ?summary) for those x and y values calculated by print.density() ... Uwe Ligges
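To see what the printed summary refers to, it may help to look at the components of the returned object directly. A short sketch with simulated data standing in for x2, which is not available here:

```r
set.seed(1)
x2 <- rnorm(1000, mean = 1.5)    # simulated stand-in for the poster's data
d <- density(x2)
length(d$x)                      # number of grid points (n = 512 by default)
head(cbind(x = d$x, y = d$y))    # the (x, y) pairs that plot(d) draws
summary(d$x)                     # the figures in the "x" column of print(d)
summary(d$y)                     # the figures in the "y" column of print(d)
d$x[which.max(d$y)]              # location of the highest point of the curve
```

The summary is therefore of the evaluation grid and the estimated density heights, not of the original data, which is why the numbers do not look like a description of x2.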
[R] density estimation
Hello, sorry for my English. I would like to estimate the density of a multivariate variable (f(x,y) or f(x,y,z), for example) in order to calculate mutual information. How is this possible with R? Thanks, Bernard

Bernard Palagos, Unité Mixte de Recherche Cemagref - Agro.M - CIRAD, Information et Technologie pour les Agro-Procédés, Cemagref - BP 5095, 34033 MONTPELLIER Cedex 1, France. http://www.montpellier.cemagref.fr/teap/default.htm Tel: 04 67 04 63 13 Fax: 04 67 04 37 82
[R] density
Hi, I used the density function in the R package, and got the following results. I just wonder how to explain them. What is Min, 1st Qu, Median, and so on? I could not find an explanation from help(density). The plot doesn't seem to match the x and y value either. Thanks in advance for any help that you can give me! Hui

Call: density(x = x2, kernel = "gaussian")
Data: x2 (6437 obs.); Bandwidth 'bw' = 0.1209

       x                 y
 Min.   :-1.8856   Min.   :5.851e-06
 1st Qu.:-0.1629   1st Qu.:2.262e-03
 Median : 1.5599   Median :3.945e-02
 Mean   : 1.5599   Mean   :1.450e-01
 3rd Qu.: 3.2826   3rd Qu.:2.738e-01
 Max.   : 5.0054   Max.   :5.761e-01
[R] density estimation with weighted sample
Dear all, I would like to perform density estimation with a weighted sample (the output of an Importance Sampling procedure) in R. Could anybody advise me on what function to use (and in which package)? Thanks a lot, Lorenzo
Re: [R] density estimation with weighted sample
On Thu, 7 Apr 2005, Tomassini, Lorenzo wrote:
> I would like to perform density estimation with a weighted sample (output of an Importance Sampling procedure) in R. Could anybody give me an advice on what function to use (in which package)?
This could mean
1) You have a sample with weights w, so `w=4' means `I have 4 of those'.
2) You have a sample from a density proportional to w(x)f(x) and want to estimate f.
Your title suggests the first, your comment the second. If it is the second, use any package (even density() in R) to estimate the density g of the sampled distribution, form ghat/w and rescale to unit area. If you know a lot about w (e.g. in stereology) there are specialized methods which are better.
--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
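The second reading (estimate g, form ghat/w, rescale to unit area) could be sketched as follows. The weight function and the simulated sample are hypothetical and chosen so that the target f is a standard normal, which makes the result easy to check by eye; for the first reading, the weights argument of density() may be all that is needed.

```r
# Case 2: the sample comes from g(x) proportional to w(x) * f(x); recover f.
set.seed(1)
w <- function(x) exp(x / 2)           # hypothetical, known weight function
x <- rnorm(5000, mean = 0.5, sd = 1)  # draws from g when f is N(0, 1)

g_hat <- density(x)                   # estimate of the sampled density g
dx <- g_hat$x[2] - g_hat$x[1]         # spacing of the evaluation grid
f_hat <- g_hat$y / w(g_hat$x)         # divide out the weight ...
f_hat <- f_hat / sum(f_hat * dx)      # ... and rescale to unit area

sum(f_hat * dx)                       # 1 by construction
sum(g_hat$x * f_hat * dx)             # mean of the estimate, near 0 here
```

Dividing by w inflates the noise wherever w is small, so the tails of f_hat on that side should not be trusted much.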
[R] Density of the Multivariate T Distribution
Hi, I am looking for an efficient way to compute the values of the density function of a multivariate T distribution - something like dmvnorm, but for T distr. Does this exist somewhere? Many thanks, Jan Bulla, Goettingen University
Re: [R] Density of the Multivariate T Distribution
Jan Bulla wrote:
> I am looking for an efficient way to compute the values of the density function of a multivariate T distribution - something like dmvnorm, but for T distr. Does this exist somewhere?
Searching CRAN I found the ``sn'' package which includes the function dmst() which calculates the density for ***skewed*** multivariate t distributions. I conjecture that setting the skewness parameters ``alpha'' equal to 0 would give you the ``ordinary'' multivariate t distribution. I haven't tried this out. It puzzles me why the mvtnorm package includes functions pmvt(), qmvt(), and rmvt() but ***not*** dmvt(). Why on earth not? cheers, Rolf Turner [EMAIL PROTECTED]
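The density itself is short to write down in base R from the standard formula for a p-dimensional t distribution with location mu, scale matrix Sigma and df degrees of freedom. The sketch below uses a hypothetical function name, dmvt_manual, and is checked only against the one-dimensional dt(); newer releases of mvtnorm provide a dmvt() function that would normally be preferable.

```r
# Multivariate t density, evaluated at the rows of x.
dmvt_manual <- function(x, mu, Sigma, df, log = FALSE) {
  x <- matrix(x, ncol = length(mu))
  p <- length(mu)
  delta <- mahalanobis(x, center = mu, cov = Sigma)   # squared distances
  log_det <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  log_dens <- lgamma((df + p) / 2) - lgamma(df / 2) -
    (p / 2) * log(df * pi) - 0.5 * log_det -
    ((df + p) / 2) * log1p(delta / df)
  if (log) log_dens else exp(log_dens)
}

# One-dimensional check against the univariate t density in base R
uni_manual <- dmvt_manual(c(-1, 0, 2), mu = 0, Sigma = matrix(1), df = 5)
uni_exact <- dt(c(-1, 0, 2), df = 5)
cbind(uni_manual, uni_exact)

# Two-dimensional example with a hypothetical scale matrix
Sigma <- matrix(c(1, 0.5, 0.5, 2), 2, 2)
dmvt_manual(rbind(c(0, 0), c(1, -1)), mu = c(0, 0), Sigma = Sigma, df = 4)
```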
RE: [R] density estimation: compute sum(value * probability) for given distribution
On 13-Nov-04 bogdan romocea wrote:
> Dear R users, However, how do I compute sum(values*probabilities)? The probabilities produced by the density function sum to only 26%:
>   sum(den$y)
>   [1] 0.2611142
> Would it perhaps be ok to simply do
>   sum(den$x*den$y) * (1/sum(den$y))
>   [1] 1073.22
> ?
What you're missing is the dx! A density estimation estimates the probability density function g(x) such that int[g(x)*dx] = 1, and R's 'density' function returns estimated values of g at a discrete set of points. An integral can be approximated by a discrete summation of the form sum(g(x.i)*delta.x). You can recover the set of x-values at which the density is estimated, and hence the implicit value of delta.x, from the returned density. Example:
  X <- rnorm(1000)
  f <- density(X)
  x <- f$x
  delta.x <- x[2] - x[1]
  g <- f$y
  sum(g*delta.x)
  [1] 1.000976
Hoping this helps, Ted.
E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 [NB: New number!] Date: 14-Nov-04 Time: 08:50:53 -- XFMail --
RE: [R] density estimation: compute sum(value * probability) for given distribution
First thing you probably should realize is that density is _not_ probability. A probability density function _integrates_ to one, not _sums_ to one. If X is an absolutely continuous RV with density f, then Pr(X=x)=0 for all x, and Pr(a < X < b) = \int_a^b f(x) dx. sum x*Pr(X=x) (over all possible values of x) for a discrete distribution is just the expectation, or mean, of the distribution. The expectation for a continuous distribution is \int x f(x) dx, where the integral is over the support of f. This is all elementary math stat that you can find in any textbook. Could you tell us exactly what you are trying to compute, or why you're computing it? HTH, Andy

From: bogdan romocea
> Dear R users, This is a KDE beginner's question. I have this distribution:
>   length(cap)
>   [1] 200
>   summary(cap)
>      Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>     459.9   802.3   991.6  1066.0  1242.0  2382.0
> I need to compute the sum of the values times their probability of occurrence. The graph is fine,
>   den <- density(cap, from=min(cap), to=max(cap), give.Rkern=F)
>   plot(den)
> However, how do I compute sum(values*probabilities)? The probabilities produced by the density function sum to only 26%:
>   sum(den$y)
>   [1] 0.2611142
> Would it perhaps be ok to simply do
>   sum(den$x*den$y) * (1/sum(den$y))
>   [1] 1073.22
> ?
> Thank you, b.
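The integral for the expectation can be approximated on the grid returned by density() by a Riemann sum, multiplying each density height by the grid spacing. A sketch with simulated data standing in for cap (the gamma parameters are arbitrary and only meant to give values on a similar scale):

```r
set.seed(1)
cap <- rgamma(200, shape = 6, scale = 180)   # simulated stand-in for 'cap'
den <- density(cap)
dx <- den$x[2] - den$x[1]                    # spacing of the evaluation grid

sum(den$y)                  # not 1: these are density heights, not probabilities
area <- sum(den$y * dx)     # approximately 1: Riemann sum for the integral of f
kde_mean <- sum(den$x * den$y * dx)   # approximates the integral of x * f(x)
c(area = area, kde_mean = kde_mean, sample_mean = mean(cap))
```

The result is essentially the sample mean, so for this purpose mean(cap) is all that is needed. Note that the sketch uses the default grid; restricting it with from=min(cap) and to=max(cap) cuts off part of the tails, so the area would then fall somewhat short of one.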
RE: [R] density estimation: compute sum(value * probability) for given distribution
Andy, Thanks a lot for the clarifications. I was running a simulation a number of times and trying to come up with a number to summarize the results. And, I failed to realize from the beginning that what I was trying to compute was just the mean. Regards, b.

--- Liaw, Andy [EMAIL PROTECTED] wrote:
> First thing you probably should realize is that density is _not_ probability. A probability density function _integrates_ to one, not _sums_ to one. If X is an absolutely continuous RV with density f, then Pr(X=x)=0 for all x, and Pr(a < X < b) = \int_a^b f(x) dx. sum x*Pr(X=x) (over all possible values of x) for a discrete distribution is just the expectation, or mean, of the distribution. The expectation for a continuous distribution is \int x f(x) dx, where the integral is over the support of f. This is all elementary math stat that you can find in any textbook. Could you tell us exactly what you are trying to compute, or why you're computing it? HTH, Andy
> From: bogdan romocea
>> Dear R users, This is a KDE beginner's question. I have this distribution:
>>   length(cap)
>>   [1] 200
>>   summary(cap)
>>      Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>     459.9   802.3   991.6  1066.0  1242.0  2382.0
>> I need to compute the sum of the values times their probability of occurrence. The graph is fine,
>>   den <- density(cap, from=min(cap), to=max(cap), give.Rkern=F)
>>   plot(den)
>> However, how do I compute sum(values*probabilities)? The probabilities produced by the density function sum to only 26%:
>>   sum(den$y)
>>   [1] 0.2611142
>> Would it perhaps be ok to simply do
>>   sum(den$x*den$y) * (1/sum(den$y))
>>   [1] 1073.22
>> ?
>> Thank you, b.
Re: [R] density estimation: compute sum(value * probability) for given distribution
bogdan romocea wrote:
> Dear R users, This is a KDE beginner's question. I have this distribution:
>   length(cap)
>   [1] 200
>   summary(cap)
>      Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>     459.9   802.3   991.6  1066.0  1242.0  2382.0
> I need to compute the sum of the values times their probability of occurrence. The graph is fine,
>   den <- density(cap, from=min(cap), to=max(cap), give.Rkern=F)
>   plot(den)
> However, how do I compute sum(values*probabilities)?
I don't get the point. You are estimating using a gaussian kernel. Hint: What's the probability to get x=0 for a N(0,1) distribution? So sum(values*probabilities) is zero!
> The probabilities produced by the density function sum to only 26%:
and could also sum to, e.g., 783453.9, depending on the number of observations and the estimated parameters of the density ...
>   sum(den$y)
>   [1] 0.2611142
> Would it perhaps be ok to simply do
>   sum(den$x*den$y) * (1/sum(den$y))
>   [1] 1073.22
> ?
No. den$x is a point where the density function is equal to den$y, but den$y is not the probability to get den$x (you know, the stuff with intervals)! I fear you are mixing theory from discrete with continuous distributions. Uwe Ligges
> Thank you, b.
[R] density estimation: compute sum(value * probability) for given distribution
Dear R users, This is a KDE beginner's question. I have this distribution:
  length(cap)
  [1] 200
  summary(cap)
     Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    459.9   802.3   991.6  1066.0  1242.0  2382.0
I need to compute the sum of the values times their probability of occurrence. The graph is fine,
  den <- density(cap, from=min(cap), to=max(cap), give.Rkern=F)
  plot(den)
However, how do I compute sum(values*probabilities)? The probabilities produced by the density function sum to only 26%:
  sum(den$y)
  [1] 0.2611142
Would it perhaps be ok to simply do
  sum(den$x*den$y) * (1/sum(den$y))
  [1] 1073.22
?
Thank you, b.
[R] Density Estimation
Hi there, Sorry if this is a rather long post. I have a simple list of single feature data points from which I would like to generate a probability that an unseen point comes from the same distribution. To do this I am trying to estimate the probability density of the list of points and use this to generate a probability for the new unseen points. I have managed to use the R density function to generate the density estimate but have not been able to do anything with this - i.e. generate a probability that a new point comes from the same distribution. Is there a function to do this, or am I way off the mark using the density function at all? Thanks in advance, Brian.
[R] Density Estimation
Dear Brian, I suggest using the density() function to get an estimate of the pdf you are looking for (I believe it is unknown). You can then plot the points returned by density() using plot(), which gives you a graphical representation of your unknown pdf. From its shape you could try to work out what kind of pdf it might be (normal, gamma, Weibull, etc.). You can then estimate the parameters of the pdf from your data with LS or ML methods, calculate the goodness of fit for each candidate pdf, and use the best one. I hope this is of some help. Cordially, Vito Ricci

[EMAIL PROTECTED] wrote:
> Hi there, Sorry if this is a rather long post. I have a simple list of single feature data points from which I would like to generate a probability that an unseen point comes from the same distribution. To do this I am trying to estimate the probability density of the list of points and use this to generate a probability for the new unseen points. I have managed to use the R density function to generate the density estimate but have not been able to do anything with this - i.e. generate a probability that a new point comes from the same distribution. Is there a function to do this, or am I way off the mark using the density function at all? Thanks in advance, Brian.
Re: [R] Density Estimation
Try fitting it with a Johnson function -- see SuppDists. If you can fit it you will then be able to use the functions in SuppDists just as you can for any other distribution supported by R.

Brian Mac Namee wrote:
> Hi there, Sorry if this is a rather long post. I have a simple list of single feature data points from which I would like to generate a probability that an unseen point comes from the same distribution. To do this I am trying to estimate the probability density of the list of points and use this to generate a probability for the new unseen points. I have managed to use the R density function to generate the density estimate but have not been able to do anything with this - i.e. generate a probability that a new point comes from the same distribution. Is there a function to do this, or am I way off the mark using the density function at all? Thanks in advance, Brian.
--
Bob Wheeler --- http://www.bobwheeler.com/ ECHIP, Inc. --- Randomness comes in bunches.
Re: [R] Density Estimation
Hi! The function density returns an object of class density. This object has an x and a y component, which you can access with $x and $y. Use approx and runif, e.g.:
  dd <- density(rnorm(100, 3, 5))
  plot(dd)
Using the function approx (see ?approx) you can compute the density value for any x.
  # Rejection sampler; the argument x is a dummy here.
  mydist <- function(x, dd) {
    while (1) {
      tmp <- runif(1, min = min(dd$x), max = max(dd$x))   # candidate point
      lev <- approx(dd$x, dd$y, tmp)$y                    # density at the candidate
      # accept with probability lev / max(dd$y)
      if (runif(1, 0, max(dd$y)) <= lev) {
        return(tmp)
      }
    }
  }
  x <- 0
  mydist(x, dd)
  res <- rep(0, 500)
  res <- sapply(res, mydist, dd)
  lines(density(res), col = 2)
/E.

*** REPLY SEPARATOR ***
On 9/15/2004 at 12:36 PM Brian Mac Namee wrote:
> Hi there, Sorry if this is a rather long post. I have a simple list of single feature data points from which I would like to generate a probability that an unseen point comes from the same distribution. To do this I am trying to estimate the probability density of the list of points and use this to generate a probability for the new unseen points. I have managed to use the R density function to generate the density estimate but have not been able to do anything with this - i.e. generate a probability that a new point comes from the same distribution. Is there a function to do this, or am I way off the mark using the density function at all? Thanks in advance, Brian.

Dipl. bio-chem. Witold Eryk Wolski @ MPI-Moleculare Genetic, Ihnestrasse 63-73, 14195 Berlin. tel: 0049-30-83875219 mail: [EMAIL PROTECTED] http://www.molgen.mpg.de/~wolski [EMAIL PROTECTED]
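A variation on the same idea, offered as a sketch rather than a recommendation: approxfun() turns the estimate into a function that can be evaluated at new points and integrated over an interval, and draws from a Gaussian kernel estimate can also be generated without a rejection loop by resampling the data and adding noise with standard deviation equal to the bandwidth. The names f_hat, p_hat and draws, and the interval [0, 6], are arbitrary illustrative choices.

```r
set.seed(1)
x <- rnorm(100, mean = 3, sd = 5)
dd <- density(x)

# Interpolating function for the estimated density; 0 outside the grid
f_hat <- approxfun(dd$x, dd$y, yleft = 0, yright = 0)
f_hat(c(-40, 0, 3, 10))          # density heights, not probabilities

# Estimated probability of an interval, by numerical integration
p_hat <- integrate(f_hat, lower = 0, upper = 6)$value
p_hat

# Draws from the estimate: resample the data, then add kernel noise
draws <- sample(x, 500, replace = TRUE) + rnorm(500, sd = dd$bw)
```

The height f_hat(z) at a single new point z is not a probability; a probability only arises for an interval, as with p_hat.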
RE: [R] Density Estimation
On 15-Sep-04 Brian Mac Namee wrote:
> Sorry if this is a rather long post. I have a simple list of single-feature data points from which I would like to generate a probability that an unseen point comes from the same distribution. To do this I am trying to estimate the probability density of the list of points and use this to generate a probability for the new unseen points. I have managed to use the R density function to generate the density estimate but have not been able to do anything with this - i.e. generate a probability that a new point comes from the same distribution. Is there a function to do this, or am I way off the mark using the density function at all?

It's not clear what you're really after, but it looks as though you may be wanting to sample from the distribution estimated by 'density'. A possible approach, which you could refine, is exemplified by

    x <- rnorm(1000)
    d <- density(x, n=4096)
    y <- sample(d$x, size=1000, replace=TRUE, prob=d$y)

Check performance with

    hist(y)

Looks OK to me! See ?density and ?sample.

On an alternative interpretation, perhaps you want to first estimate the density based on data you already have, and then when you have got further data (but these would then be seen and not unseen) come to a judgement about whether these new points are compatible with coming from the distribution you have estimated. A possible approach to this question (again susceptible to refinement) would be as follows.

1. Use a fine-grained grid for 'density', i.e. a large value for n.
2. Replace each of the points in the new data by the nearest point in this grid. Call these values z1, z2, ... , zk corresponding to index values i1, i2, ... , ik in d$x.
3. Evaluate the probability P(z1,...,zk) from the density as the product of d$y[i] where i <- c(i1,...,ik). Better still, evaluate the logarithm of this. Call the result L.
4. Now simulate a large number of draws of k values from d on the lines of sample(d$x, size=k, replace=TRUE, prob=d$y) as above, and evaluate L for each of these.

Where is the value of L from (3) situated in the distribution of these values of L from (4)? If (say) only 1 per cent of the simulated values of L from d are less than the value of L from (3), then you have a basis for a test that your new data did not come from the distribution you have estimated from your old data, in that the new data are from the low-density part of the estimated distribution.

There are of course alternative ways to view this question. The value of k is relevant. In particular, if k is small (say 3 or 4) then the suggestion in (4) is probably the best way to approach it. However, if k is large then you can use a test on the lines of Kolmogorov-Smirnov with the reference distribution estimated as the cumulative distribution of d$y and the distribution being tested as the empirical cumulative distribution of your new data.

Even sharper focus is available if you are in a position to make a parametric model for your data, but your description does not suggest that this is the case.

Best wishes, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 167 1972 Date: 15-Sep-04 Time: 15:07:33 -- XFMail --
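Steps 1 to 4 in the message above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the poster's own code: the data (old, new), the helper nearest, the number of simulations, and the use of replace = TRUE are all choices made here for the example.

```r
set.seed(1)
old <- rnorm(1000)           # data used to estimate the density (made up)
new <- c(2.5, 3.0, 2.8)      # hypothetical new points, so k = 3

# Step 1: density on a fine grid
d <- density(old, n = 4096)

# Step 2: index of the nearest grid point for each new value
# (nearest is a hypothetical helper name)
nearest <- function(z, grid) {
  vapply(z, function(v) which.min(abs(grid - v)), integer(1))
}
idx <- nearest(new, d$x)

# Step 3: log of the product of the density values at those grid points
L_obs <- sum(log(d$y[idx]))

# Step 4: simulate many draws of k grid points from d and compute L for each
k <- length(new)
L_sim <- replicate(2000, {
  i <- sample(seq_along(d$x), size = k, replace = TRUE, prob = d$y)
  sum(log(d$y[i]))
})

# Share of simulated L values at or below the observed one;
# a small share suggests the new points sit in a low-density region
p_low <- mean(L_sim <= L_obs)
p_low
```

Working with indices rather than grid values in step 4 simply makes it easy to look up d$y for each simulated draw.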
[R] density(x)
Dear experts, when trying to estimate a kernel density function with density(x) I get the following error message with data imported from either Excel or text files: Error in density(spr) : argument must be numeric. Other procedures such as truehist work. If I generate data within R, density works fine. Does anybody have an idea?

Yours

-- Christoph Hanck Wissenschaftliche Hilfskraft Lehrstuhl für Empirische Wirtschaftsforschung, Prof. Dr. Wilfling http://www.wiwi.uni-muenster.de/~05/ WWU Muenster Tel.: +49-251-83 25043 eMail: [EMAIL PROTECTED]
Re: [R] density(x)
Hello!

On Mon, 2004-07-05 at 15:34, Christoph Hanck wrote:
> Dear experts, when trying to estimate a kernel density function with density(x) I get the following error message with data imported from either Excel or text files: Error in density(spr) : argument must be numeric.

Well, as R tells you: you should check whether your data is of type numeric. Depending on the way you import the data spr, this may not be the case, and you have to do

    density(as.numeric(spr))

which should work...

Besides: please read the posting guide (see http://www.R-project.org/posting-guide.html). Giving some details on the procedure you use to read in the data might have helped us give you a precise answer!

Regards, Winfried

-- Dr. Dipl.-Math. Winfried Theis, SFB 475, Projekt C5, Universität Dortmund, 44221 Dortmund e-mail: [EMAIL PROTECTED] Tel.: +49/231/755-5903 FAX: +49/231/755-4387
Re: [R] density(x)
On Mon, 2004-07-05 at 08:34, Christoph Hanck wrote:
> Dear experts, when trying to estimate a kernel density function with density(x) I get the following error message with data imported from either Excel or text files: Error in density(spr) : argument must be numeric. Other procedures such as truehist work. If I generate data within R, density works fine. Does anybody have an idea?

More than likely, your vector 'spr' was imported as a factor. This would possibly suggest that at least one value in 'spr' is not numeric. If the entire vector was numeric, this would not be a problem. It is also possible that you may have not specified the proper delimiting character during the import, which would compromise the parsed structure of the incoming data.

Use:

    str(spr)

and you will probably get Factor ...

First, check to be sure that you have used the proper delimiting character during your import. See ?read.table for the family of related functions and the default argument values for 'sep', which is the delimiting character. You should also check your source data file, since it may be problematic.

HTH, Marc Schwartz
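To illustrate the factor scenario described in the message above, here is a small sketch with made-up values (the stray "n/a" entry is an assumption standing in for whatever non-numeric value might be in an imported file). It shows why a factor has to go through as.character() before as.numeric().

```r
# A column that was read in as a factor because of one non-numeric entry
f <- factor(c("11", "10", "n/a", "25"))

str(f)           # reports a Factor with 4 levels
is.numeric(f)    # FALSE, so density(f) would fail

# as.numeric() on a factor returns the internal level codes, not the values
as.numeric(f)    # 2 1 4 3

# Convert via character to recover the values; "n/a" becomes NA
vals <- suppressWarnings(as.numeric(as.character(f)))
vals             # 11 10 NA 25

d <- density(vals, na.rm = TRUE)
```

In practice it is usually better to fix the import itself (for example the separator or the missing-value string) than to convert afterwards.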
Re: [R] density(x)
Hello and thanks for your reply.

Hopefully, my answer arrives at the correct place like that (if not, I am sorry for bothering you, but please let me know...)

To sum up my procedure (sp is exactly the same thing as spr, I had just tinkered with the names while trying something to solve this problem):

    sp <- read.table("c:/ratsdata/sp3.txt", col.names="sp")
    xd <- density(sp)
    Error in density(sp) : argument must be numeric

The suggested remedies yield the following:

    str(sp)
    `data.frame': 195 obs. of 1 variable:
     $ sp: int 11 10 10 12 25 22 12 23 13 15 ...

    xd <- density(as.numeric(sp))
    Error in as.double.default(sp) : (list) object cannot be coerced to double

Hence, it does not seem to be a factor. Declaring it as numeric gives another error message, on which I haven't yet found any help in Google/the archive.

Yours sincerely

-- Christoph Hanck Wissenschaftliche Hilfskraft Lehrstuhl für Empirische Wirtschaftsforschung, Prof. Dr. Wilfling http://www.wiwi.uni-muenster.de/~05/ WWU Muenster Tel.: +49-251-83 25043 eMail: [EMAIL PROTECTED]
Re: [R] density(x)
OK, so sp is a data frame. Probably you want density(sp$sp) there, since the single column is already numeric. It just so happens that truehist does an implicit drop() on a 1-column data frame.

On Mon, 5 Jul 2004, Christoph Hanck wrote:
> Hello and thanks for your reply. Hopefully, my answer arrives at the correct place like that (if not, I am sorry for bothering you, but please let me know...) To sum up my procedure (sp is exactly the same thing as spr, I had just tinkered with the names while trying something to solve this problem):
>
>     sp <- read.table("c:/ratsdata/sp3.txt", col.names="sp")
>     xd <- density(sp)
>     Error in density(sp) : argument must be numeric
>
> The suggested remedies yield the following:
>
>     str(sp)
>     `data.frame': 195 obs. of 1 variable:
>      $ sp: int 11 10 10 12 25 22 12 23 13 15 ...
>
>     xd <- density(as.numeric(sp))
>     Error in as.double.default(sp) : (list) object cannot be coerced to double
>
> Hence, it does not seem to be a factor. Declaring it as numeric gives another error message, on which I haven't yet found any help in Google/the archive.

-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK. Tel: +44 1865 272861 (self), +44 1865 272866 (PA), Fax: +44 1865 272595
Re: [R] density(x)
Hello,

> OK, so sp is a data frame. Probably you want density(sp$sp) there, since the single column is already numeric.

Yes, that works just the way I hoped. So what I am essentially doing is selecting (just to know what I'm doing) the column that contains sp from the data frame sp?

Thank you very much!

Christoph
Re: [R] density(x)
Christoph Hanck wrote:
> Hello and thanks for your reply. Hopefully, my answer arrives at the correct place like that (if not, I am sorry for bothering you, but please let me know...) To sum up my procedure (sp is exactly the same thing as spr, I had just tinkered with the names while trying something to solve this problem):
>
>     sp <- read.table("c:/ratsdata/sp3.txt", col.names="sp")
>     xd <- density(sp)
>     Error in density(sp) : argument must be numeric
>
> The suggested remedies yield the following:
>
>     str(sp)
>     `data.frame': 195 obs. of 1 variable:
>      $ sp: int 11 10 10 12 25 22 12 23 13 15 ...
>
>     xd <- density(as.numeric(sp))
>     Error in as.double.default(sp) : (list) object cannot be coerced to double

It is telling you that it cannot convert a list into a numeric object. A data frame is a list, so it is telling you that you cannot convert the data frame into a numeric vector.

> Hence, it does not seem to be a factor. Declaring it as numeric gives another error message, on which I haven't yet found any help in Google/the archive.

You want the sp column of the data frame sp, not the data frame sp itself (perhaps you should choose a name for the data frame that is different from a column name):

    sp <- data.frame(sp = rnorm(100))
    density(sp)
    Error in density(sp) : argument must be numeric

    density(sp$sp)

    Call:
            density(x = sp$sp)

    Data: sp$sp (100 obs.);  Bandwidth 'bw' = 0.3007

           x                  y
     Min.   :-3.37457   Min.   :0.0001983
     1st Qu.:-1.73138   1st Qu.:0.0389884
     Median :-0.08819   Median :0.1157180
     Mean   :-0.08819   Mean   :0.1519886
     3rd Qu.: 1.55500   3rd Qu.:0.2227940
     Max.   : 3.19818   Max.   :0.4766640

Does this help? with(sp, density(sp)) would also do what you want, see ?with, and there are other ways.

Gavin

-- Gavin Simpson, ENSIS Research Fellow, ENSIS Ltd. ECRC, UCL Department of Geography, 26 Bedford Way, London. WC1H 0AP. [T] +44 (0)20 7679 5522 [F] +44 (0)20 7679 7565 [E] [EMAIL PROTECTED] [W] http://www.ucl.ac.uk/~ucfagls/cv/ [W] http://www.ucl.ac.uk/~ucfagls/
Re: [R] density(x)
On Mon, 2004-07-05 at 09:41, Christoph Hanck wrote:
> Hello and thanks for your reply. Hopefully, my answer arrives at the correct place like that (if not, I am sorry for bothering you, but please let me know...) To sum up my procedure (sp is exactly the same thing as spr, I had just tinkered with the names while trying something to solve this problem):
>
>     sp <- read.table("c:/ratsdata/sp3.txt", col.names="sp")
>     xd <- density(sp)
>     Error in density(sp) : argument must be numeric
>
> The suggested remedies yield the following:
>
>     str(sp)
>     `data.frame': 195 obs. of 1 variable:
>      $ sp: int 11 10 10 12 25 22 12 23 13 15 ...
>
>     xd <- density(as.numeric(sp))
>     Error in as.double.default(sp) : (list) object cannot be coerced to double
>
> Hence, it does not seem to be a factor. Declaring it as numeric gives another error message, on which I haven't yet found any help in Google/the archive.

In this case, you are trying to pass a data frame as an argument to density() rather than a single column vector. The same problem is the reason for the error in xd <- density(as.numeric(sp)): you are trying to coerce a data frame to a double.

Example:

    # create a data frame called 'sp' that has a column called 'sp'
    sp <- data.frame(sp = 1:195)
    str(sp)
    `data.frame': 195 obs. of 1 variable:
     $ sp: int 1 2 3 4 5 6 7 8 9 10 ...

    # Now try to use density()
    density(sp)
    Error in density(sp) : argument must be numeric

    # Now call density() properly with the column 'sp' as an argument,
    # using the data.frame$column notation:
    density(sp$sp)

    Call:
            density(x = sp$sp)

    Data: sp$sp (195 obs.);  Bandwidth 'bw' = 17.69

           x                y
     Min.   :-52.08   Min.   :7.688e-06
     1st Qu.: 22.96   1st Qu.:1.009e-03
     Median : 98.00   Median :4.600e-03
     Mean   : 98.00   Mean   :3.328e-03
     3rd Qu.:173.04   3rd Qu.:5.131e-03
     Max.   :248.08   Max.   :5.133e-03

Two other options in this case:

1. Use attach() to place the data frame 'sp' in the current search path. Now you do not need to explicitly use the data.frame$column notation. detach() is then used to clean up.

    attach(sp)
    density(sp)
    detach(sp)

2. Use with(), which is the preferred notation when dealing with data frames:

    with(sp, density(sp))

To avoid your own confusion in the future, it would be better not to give the data frame the same name as a vector. It also helps when others may need to review your code.

See ?with and ?attach for more information. Reading through "An Introduction to R", which is part of the default documentation set, would help you better understand data types and how to deal with data frame structures.

I see that Prof. Ripley has also replied regarding the nature of truehist(), so that helps to clear up that mystery :-)

HTH, Marc Schwartz
Re: [R] density(x)
Hello, thanks again.

> Reading through "An Introduction to R", which is part of the default documentation set, would help you better understand data types and how to deal with data frame structures.

I got the message! I admit that my systematic efforts into R may be considered wanting.

-- Christoph Hanck Wissenschaftliche Hilfskraft Lehrstuhl für Empirische Wirtschaftsforschung, Prof. Dr. Wilfling http://www.wiwi.uni-muenster.de/~05/ WWU Muenster Tel.: +49-251-83 25043 eMail: [EMAIL PROTECTED]
RE: [R] Density Estimation
help.search("kernel density") reports

    KernSec(GenKern)      Univariate kernel density estimate
    KernSur(GenKern)      Bivariate kernel density estimation
    bkde(KernSmooth)      Compute a Binned Kernel Density Estimate
    bkde2D(KernSmooth)    Compute a 2D Binned Kernel Density Estimate
    dpik(KernSmooth)      Select a Bandwidth for Kernel Density Estimation
    kde2d(MASS)           Two-Dimensional Kernel Density Estimation

amongst others, and package sm also has a user-friendly selection. So, apart from pointing out alternatives, I wanted to point out how easy it was to find the information originally requested.

On Sat, 10 Apr 2004, Ko-Kang Kevin Wang wrote:
> -Original Message- From: [EMAIL PROTECTED]
> > Dear Sir/Madam; Would you please tell me what is the command that allows the estimation of the Kernel Density for some data. Thanks,
>
> ?density

-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK. Tel: +44 1865 272861 (self), +44 1865 272866 (PA), Fax: +44 1865 272595
[R] Density Estimation
Dear Sir/Madam; Would you please tell me what is the command that allows the estimation of the Kernel Density for some data. Thanks, Thami Rachidi
RE: [R] Density Estimation
-Original Message- From: [EMAIL PROTECTED]
> Dear Sir/Madam; Would you please tell me what is the command that allows the estimation of the Kernel Density for some data. Thanks,

?density
[R] Density Plots
I am using an older version of R (1.6.2) to run a Monte Carlo simulation, generating 10,000 samples per 'run'. When I plot histograms I get the expected 'bins' on the x-axis and the frequency distribution on the y-axis. However when I ask R to plot the SAME data set with a density curve the x-axis remains the same but the y-axis can generate values of up to 1e8 etc. Can anyone (a) explain why this might be so and/or (b) suggest a fix? Many thanks David Tyler
Re: [R] Density Plots
David Tyler wrote (using an e-mail client that doesn't wrap lines):
> I am using an older version of R (1.6.2) to run a Monte Carlo simulation, generating 10,000 samples per 'run'. When I plot histograms I get the expected 'bins' on the x-axis and the frequency distribution on the y-axis. However when I ask R to plot the SAME data set with a density curve the x-axis remains the same but the y-axis can generate values of up to 1e8 etc. Can anyone (a) explain why this might be so and/or (b) suggest a fix?

try

    hist(..., freq=FALSE)

This should give the same numbers as the density plots' y-axes.

It sounds like you've got a narrow range of x-axis values (small numbers, or small differences between them, or both). The total area under a density estimate curve must equal 1 by definition, so nothing's really broken. The only fix is to re-scale the x axis to different units, or draw a different y-axis on after the fact. Something like...

    foo <- density(...)
    plot(foo, yaxt="n")
    axis(...)  # something that means something to you here.

Since this isn't a density plot any longer, it would help to be clear to your readers what's going on with the plots.

Hope that helps

Cheers

Jason

-- Indigo Industrial Controls Ltd. http://www.indigoindustrial.co.nz 64-21-343-545 [EMAIL PROTECTED]
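The scale effect described in the reply above can be reproduced with a small sketch. The data here are made up (a normal sample with a standard deviation of 1e-9 is an assumption chosen to give a very narrow x range); the point is only that the y values become large while the area stays close to 1, and that changing the units of x brings the y axis back to an ordinary scale.

```r
set.seed(2)
x <- rnorm(10000, sd = 1e-9)   # hypothetical data with a very narrow spread

d <- density(x)
peak <- max(d$y)               # very large, because the x range is tiny

# Area under the plotted curve (trapezoidal rule on the density grid)
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

# Re-expressing x in units of 1e-9 rescales the y axis by the same factor
d_rescaled <- density(x * 1e9)
peak_rescaled <- max(d_rescaled$y)

if (interactive()) {
  hist(x, freq = FALSE)        # same y scale as the density curve
  lines(d, col = 2)
}
```

With freq = FALSE the histogram bars are densities too, which is why the two plots then share a y scale.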
RE: [R] density plot for very large dataset
Have you tried the 'sm.density' function from the sm library? I used it for a dataset which 'only' had 13 points.

> I'm new to R and am trying to perform a simple, yet problematic task. I have two variables for which I would like to measure the correlation and plot versus each other. However, I have ~30 million data points measurements of each variable. I can read this into R from file and produce a plot with plot(x0, x1) but as you would expect, it's not pretty to look at and produces a postscript file of about 700MB.

Christophe Pallier http://www.pallier.org
RE: [R] density plot for very large dataset
You might want to try hexbin (hexagonal binning) in the BioConductor suite (see www.bioconductor.org).

HTH, Andy

From: Obi Griffith
> I'm new to R and am trying to perform a simple, yet problematic task. I have two variables for which I would like to measure the correlation and plot versus each other. However, I have ~30 million data points measurements of each variable. I can read this into R from file and produce a plot with plot(x0, x1) but as you would expect, it's not pretty to look at and produces a postscript file of about 700MB. A google search found a few mentions of doing density plots but they seemed to assume you already have the density matrix. Can anyone point me in the right direction, keeping in mind that I am a complete R newbie. Obi
[R] density plot for very large dataset
I'm new to R and am trying to perform a simple, yet problematic task. I have two variables for which I would like to measure the correlation and plot versus each other. However, I have ~30 million data points measurements of each variable. I can read this into R from file and produce a plot with plot(x0, x1) but as you would expect, it's not pretty to look at and produces a postscript file of about 700MB. A google search found a few mentions of doing density plots but they seemed to assume you already have the density matrix. Can anyone point me in the right direction, keeping in mind that I am a complete R newbie. Obi
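For the question above, one way to obtain a "density matrix" without any add-on package is to bin the two variables on a rectangular grid and plot the counts as an image. The sketch below uses base R only and simulated data; the sample size (1e5 as a stand-in for the ~30 million points), the grid size nb, and the simulated relationship between x0 and x1 are all assumptions made for the illustration. It uses square bins rather than hexagonal binning.

```r
set.seed(42)
n  <- 1e5                               # stand-in for ~30 million points
x0 <- rnorm(n)
x1 <- 0.6 * x0 + rnorm(n, sd = 0.8)     # made-up relationship

# The correlation does not need a plot at all
r <- cor(x0, x1)

# Bin both variables into nb equal-width intervals
nb <- 100
bx <- cut(x0, breaks = nb, labels = FALSE)
by <- cut(x1, breaks = nb, labels = FALSE)

# Count points per cell; rows index x0 bins, columns index x1 bins
counts <- matrix(tabulate((by - 1L) * nb + bx, nbins = nb * nb), nrow = nb)

if (interactive()) {
  # log1p() keeps sparsely populated cells visible
  image(log1p(counts), xlab = "x0 (binned)", ylab = "x1 (binned)")
}
```

The resulting plot contains nb * nb cells regardless of how many points went in, so the output file stays small.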
RE: [R] density() integrates to 1?
We can try to approximate the area under the curve using the trapezoidal rule on the plotting coordinates that density() produces.

    nbin <- 1024  # number of bins
    d <- density( rnorm(5), n=nbin)
    totalArea <- 0
    for(i in 1:(nbin-1) ){
      xxx <- d$x[i+1] - d$x[i]        # width of bin
      yyy <- (d$y[i+1] + d$y[i])/2    # average height of bin
      binArea <- xxx*yyy
      totalArea <- totalArea + binArea
    }
    print(totalArea)

We can see that the total area under the curve is close to 1 and the approximation gets better as nbin is increased (the trapezoidal rule underestimates where the curve is concave and overestimates where it is convex, so the sign of the error is not fixed).
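The loop in the message above can also be written in vectorised form, which makes it easy to compare several grid sizes. This is a sketch with assumed inputs (the sample of 500 normal values and the three grid sizes are choices made here); trap_area is a hypothetical helper name. Note that the trapezoidal sum is exactly the area under the straight-line segments that plot() draws between the grid points.

```r
# Area under the polyline through the points returned by density()
# (trap_area is a hypothetical helper name)
trap_area <- function(d) {
  sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
}

set.seed(3)
x <- rnorm(500)

grid_sizes <- c(64, 512, 4096)
areas <- sapply(grid_sizes, function(n) trap_area(density(x, n = n)))
names(areas) <- grid_sizes
areas
```

Each entry should be close to 1; small deviations come from the finite grid and from cutting the curve off a few bandwidths beyond the data.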
RE: [R] density() integrates to 1?
There was a related thread on R-help, probably last year. The question was getting density() to numerically integrate to 1. The answer is, yes. If you do fine enough partitions, you will see that it integrates to one. And yes, a kernel density estimate is theoretically a true density (assuming the kernel used is a pdf), because it is just an n-component mixture of the kernel.

Andy

-Original Message- From: Ross Boylan [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 24, 2003 5:36 PM To: r-help Subject: [R] density() integrates to 1?

> Visual inspection of the plot of a density() function vs a normal with the same mean and variance suggests the area under the density curve is bigger than under the normal curve. The two curves are very close over most of the domain. Assuming the normal curve does integrate to 1, this implies the area under density() is > 1. Is there any assurance that the density kernel smoother produces something that integrates to 1? Or am I seeing things? I suppose an additional complexity is that density() produces discrete output, but then I'm looking at the continuous curve plot produced.
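The mixture statement in the reply above can be checked numerically: with a Gaussian kernel, the estimate at a point t is the average of n normal densities centred at the data points, each with standard deviation equal to the bandwidth. The sketch below compares that direct formula with the values returned by density(); the data, the evaluation grid, and the helper name kde_at are assumptions made for the illustration.

```r
set.seed(4)
x  <- rnorm(200)
d  <- density(x)          # Gaussian kernel is the default
bw <- d$bw                # bandwidth actually used

# Direct evaluation of the n-component mixture at a single point t
# (kde_at is a hypothetical helper name)
kde_at <- function(t) mean(dnorm(t, mean = x, sd = bw))

grid      <- seq(-2, 2, by = 0.5)
direct    <- vapply(grid, kde_at, numeric(1))
from_grid <- approx(d$x, d$y, xout = grid)$y   # interpolate density() output

max_diff <- max(abs(direct - from_grid))
max_diff
```

The two sets of values should agree closely; density() uses a binned approximation, so a small difference is expected. Since every component integrates to 1, so does their average.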
RE: [R] density() integrates to 1?
On Wed, 2003-09-24 at 18:36, Liaw, Andy wrote:
> There was a related thread on R-help, probably last year. The question was getting density() to numerically integrate to 1. The answer is, yes. If you do fine enough partitions, you will see that it integrates to one. And yes, a kernel density estimate is theoretically a true density (assuming the kernel used is a pdf), because it is just an n-component mixture of the kernel. Andy

With this advice, and on reinspection, I think it's possible I was fooled in my visual integration. There is an area where the density() is under the normal. Vertically, it's actually quite a bit under, even though the two curves are horizontally very close. So perhaps that area is bigger than I thought, enough to account for the discrepancy.

The other possibility is that even though the points on density are OK, the curve created by plot putting a line through them really is not OK (in the sense of integrating to 1). The issue for this is not the behavior of density when one increases the number of partitions, but the behavior at a fixed partition (the default 512 in my case). Or rather, that behavior plus that of plot's line.
[R] density(): obtaining p-values
Dear R-List-Member,

is there a more elegant way to obtain p-values of a vector x, whose empirical density has been estimated with density(), than summing up the rectangles as an approximation of the area beneath the estimated density and interpolating the values of x by using approx()?

    pval.emp <- function(x) {
      df <- density(x, from=min(x), to=max(x), kernel="gaussian")
      width <- df$x[2] - df$x[1]
      rect <- df$y * width
      cdf.emp <- cumsum(rect)
      approx(df$x, cdf.emp, x)$y
    }

Many thanks in advance, Bernhard
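One possible alternative to the question above, sketched here under the assumption of a Gaussian kernel: because the kernel density estimate is an average of normal densities, its cumulative distribution function is the corresponding average of normal CDFs, so no grid or summation is needed. The helper name kde_cdf and the test data are hypothetical; bw.nrd0() is the default bandwidth rule of density(). Unlike pval.emp, this version does not truncate the estimate at min(x) and max(x), so the two will differ somewhat near the ends.

```r
# CDF of the Gaussian kernel density estimate of `data`, evaluated at q
# (kde_cdf is a hypothetical helper name)
kde_cdf <- function(q, data, bw = bw.nrd0(data)) {
  vapply(q, function(qq) mean(pnorm(qq, mean = data, sd = bw)), numeric(1))
}

set.seed(5)
x <- rnorm(1000)

p      <- kde_cdf(sort(x), x)   # CDF values at the ordered data points
p_zero <- kde_cdf(0, x)         # should be near 0.5 for this sample
```

Upper-tail p-values then follow as 1 - kde_cdf(q, x).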