Re: [R] Probability weights with density estimation
Charles C. Berry [EMAIL PROTECTED] wrote in news:[EMAIL PROTECTED]:

> On Wed, 16 Jan 2008, David Winsemius wrote:
>
>> I am a physician examining an NHANES dataset available at the NCHS
>> website:
>> http://www.cdc.gov/nchs/about/major/nhanes/nhanes2005-2006/demo_d.xpt

[snip]

>> library(MASS)  # provides kde2d()
>> TC.ran <- exp(rnorm(400, 1.5, .3))
>> HDL.ran <- exp(rnorm(400, .4, .3))
>> f1 <- kde2d(HDL.ran, TC.ran, n=25, lims=c(0,4,2,10))
>> contour(f1$x, f1$y, f1$z, ylim=c(0,8), xlim=c(0,3),
>>         ylab="TC mmol/L", xlab="HDL mmol/L")
>> lines(f1$x, 5*f1$x) # iso-ratio lines
>> lines(f1$x, 4*f1$x)
>> lines(f1$x, 3*f1$x)
>>
>> Two questions:
>> Is there a 2d density estimation function that has provision for
>> probability weights (or inverse sampling probabilities)?

[snip]

> It looks like you can use bkde2D from the KernSmooth package. You
> might look at the function sqlocpoly in surveyNG which uses the
> KernSmooth package for details.

The prospect of setting up an SQL database was rather daunting, so I continued my search. The documentation for the sql.. functions noted that they were providing the functions in the locfit package. locfit() itself turned out to provide the weighting options I needed. This is what I came up with:

library(locfit)
tc.hdl.fit <- with(small.nh.chol,
                   locfit(~LBDHDDSI+LBDTCSI, weights=WTMEC2YR,
                          xlim=c(0,0,4,10)))
plot(tc.hdl.fit)  # gives warnings but does work
title(main="Weighted", xlab="HDL", ylab="TC")  # add labels _after_ plotting.
# never could figure out how to get plot() to accept xlab or ylab
# when passing the locfit object to it.
with(tc.hdl.fit, lines(x, x*4))

-- 
Thanks; and thank you, Andy Liaw, for helpful earlier posts;
David Winsemius

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
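Editorial sketch: to make concrete what a probability-weighted counterpart of the kde2d() display involves, the following base-R code computes a weighted product-Gaussian kernel estimate on a grid. It is not the locfit() fit used above, and everything named here is an assumption for illustration: the helper kde2d_weighted, the simulated weights w.ran, and the choice of stats::bw.nrd() for the bandwidths. Note that h is used directly as the kernel standard deviation, which differs from the bandwidth convention in MASS::kde2d(). The weights are simply normalised and treated as fixed; no other feature of the survey design is taken into account.

```r
# Hypothetical helper (not part of MASS, KernSmooth, or locfit): weighted
# product-Gaussian kernel density estimate evaluated on an n x n grid.
kde2d_weighted <- function(x, y, w, h = c(bw.nrd(x), bw.nrd(y)),
                           n = 25, lims = c(range(x), range(y))) {
  stopifnot(length(x) == length(y), length(x) == length(w), all(w >= 0))
  w  <- w / sum(w)                               # weights now sum to 1
  gx <- seq(lims[1], lims[2], length.out = n)    # grid in x
  gy <- seq(lims[3], lims[4], length.out = n)    # grid in y
  kx <- dnorm(outer(gx, x, "-") / h[1]) / h[1]   # n x N kernel values in x
  ky <- dnorm(outer(gy, y, "-") / h[2]) / h[2]   # n x N kernel values in y
  # z[a, b] = sum over observations i of w[i] * kx[a, i] * ky[b, i]
  z  <- kx %*% (w * t(ky))
  list(x = gx, y = gy, z = z)
}

set.seed(1)
TC.ran  <- exp(rnorm(400, 1.5, .3))
HDL.ran <- exp(rnorm(400, .4, .3))
w.ran   <- runif(400, 0.5, 2)                    # made-up sampling weights

f2 <- kde2d_weighted(HDL.ran, TC.ran, w.ran, lims = c(0, 4, 0, 12))
contour(f2$x, f2$y, f2$z, xlab = "HDL mmol/L", ylab = "TC mmol/L")
lines(f2$x, 4 * f2$x)                            # TC/HDL = 4 iso-ratio line
```

With all weights equal this reduces to an ordinary unweighted Gaussian kernel estimate, which gives a quick sanity check against the unweighted plot.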
Re: [R] Probability weights with density estimation
On Wed, 16 Jan 2008, David Winsemius wrote:

> I am a physician examining an NHANES dataset available at the NCHS
> website:
>
> http://www.cdc.gov/nchs/about/major/nhanes/nhanes2005-2006/demo_d.xpt
> http://www.cdc.gov/nchs/about/major/nhanes/nhanes2005-2006/hdl_d.xpt
> http://www.cdc.gov/nchs/about/major/nhanes/nhanes2005-2006/tchol_d.xpt
>
> Thank you to the R authors and the foreign package authors in
> particular. Importing from the SAS export format file was a snap. It
> consists of demographic data linked to laboratory measurements. Each
> subject has an associated sampling weight. I have gotten informative
> displays following the examples using kde2d() in VR MASSe2 (more
> thanks), but these were unweighted analyses.
>
> The ratio of total cholesterol (TC) to HDL cholesterol is used
> clinically to estimate risk of future heart disease, and I am looking
> at how such ratios divide or intersect with the TC x HDL-C
> distribution. Rather than include all the real data, let me just post
> a simulation that shows a contour plot reasonably similar to what I am
> seeing.
>
> library(MASS)  # provides kde2d()
> TC.ran <- exp(rnorm(400, 1.5, .3))
> HDL.ran <- exp(rnorm(400, .4, .3))
> f1 <- kde2d(HDL.ran, TC.ran, n=25, lims=c(0,4,2,10))
> contour(f1$x, f1$y, f1$z, ylim=c(0,8), xlim=c(0,3),
>         ylab="TC mmol/L", xlab="HDL mmol/L")
> lines(f1$x, 5*f1$x) # iso-ratio lines
> lines(f1$x, 4*f1$x)
> lines(f1$x, 3*f1$x)
>
> Two questions:
> Is there a 2d density estimation function that has provision for
> probability weights (or inverse sampling probabilities)? I seem to
> remember a discussion on the list about whether such a procedure would
> be meaningful, but my searches cannot locate that thread or any worked
> examples that incorporate sampling weights.

It looks like you can use bkde2D from the KernSmooth package. You might look at the function sqlocpoly in surveyNG, which uses the KernSmooth package, for details.
> If there is such a function, would it be a simple matter to calculate
> the proportion of the total population that would be expected to have
> a ratio of y.ran/x.ran of less than a particular number, say 4.0?

Maybe my eyesight is failing, but I did not see where you define 'y.ran' and 'x.ran'.

If they, like 'TC.ran' and 'HDL.ran', are just variables that are directly measured in your survey, then estimating the proportion less than a given value for y.ran/x.ran is standard survey sampling fare and no density estimation is needed. In that case, the 'survey' package at CRAN is what you want.

HTH,

Chuck

> -- 
> Respectfully;
> David Winsemius

Charles C. Berry                        (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]              UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
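Editorial sketch: the point that no density estimation is needed can be illustrated with a weighted proportion computed in base R. The data and the weights w.ran below are simulated stand-ins, not NHANES values, and the code gives a point estimate only. For standard errors that respect the sampling design, the survey package's svydesign() and svymean() would be the usual route; that step is not shown here.

```r
set.seed(2)
# Simulated stand-ins for the measured variables (not real NHANES data)
TC.ran  <- exp(rnorm(400, 1.5, .3))
HDL.ran <- exp(rnorm(400, .4, .3))
w.ran   <- runif(400, 0.5, 2)        # made-up sampling weights

ratio <- TC.ran / HDL.ran

# Weighted estimate of the population proportion with TC/HDL below 4:
# each subject's indicator counts in proportion to the sampling weight.
p.lt4 <- sum(w.ran * (ratio < 4)) / sum(w.ran)

# The same quantity via weighted.mean() on the indicator, as a cross-check
p.chk <- weighted.mean(ratio < 4, w.ran)

# Compare with the unweighted proportion
c(weighted = p.lt4, unweighted = mean(ratio < 4))
```

Because the weights here are random and unrelated to the ratio, the weighted and unweighted figures will be close; with real sampling weights they can differ noticeably.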
[R] Probability weights with density estimation
I am a physician examining an NHANES dataset available at the NCHS website:

http://www.cdc.gov/nchs/about/major/nhanes/nhanes2005-2006/demo_d.xpt
http://www.cdc.gov/nchs/about/major/nhanes/nhanes2005-2006/hdl_d.xpt
http://www.cdc.gov/nchs/about/major/nhanes/nhanes2005-2006/tchol_d.xpt

Thank you to the R authors and the foreign package authors in particular. Importing from the SAS export format file was a snap. It consists of demographic data linked to laboratory measurements. Each subject has an associated sampling weight. I have gotten informative displays following the examples using kde2d() in VR MASSe2 (more thanks), but these were unweighted analyses.

The ratio of total cholesterol (TC) to HDL cholesterol is used clinically to estimate risk of future heart disease, and I am looking at how such ratios divide or intersect with the TC x HDL-C distribution. Rather than include all the real data, let me just post a simulation that shows a contour plot reasonably similar to what I am seeing.

library(MASS)  # provides kde2d()
TC.ran <- exp(rnorm(400, 1.5, .3))
HDL.ran <- exp(rnorm(400, .4, .3))
f1 <- kde2d(HDL.ran, TC.ran, n=25, lims=c(0,4,2,10))
contour(f1$x, f1$y, f1$z, ylim=c(0,8), xlim=c(0,3),
        ylab="TC mmol/L", xlab="HDL mmol/L")
lines(f1$x, 5*f1$x) # iso-ratio lines
lines(f1$x, 4*f1$x)
lines(f1$x, 3*f1$x)

Two questions:

Is there a 2d density estimation function that has provision for probability weights (or inverse sampling probabilities)? I seem to remember a discussion on the list about whether such a procedure would be meaningful, but my searches cannot locate that thread or any worked examples that incorporate sampling weights.

If there is such a function, would it be a simple matter to calculate the proportion of the total population that would be expected to have a ratio of y.ran/x.ran of less than a particular number, say 4.0?
-- 
Respectfully;
David Winsemius
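Editorial sketch: the message describes demographic data linked to laboratory measurements, with one sampling weight per subject. The linking step might look like the following, using small made-up data frames in place of the three imported files. Assumptions not stated in the thread: that the files share a subject identifier column named SEQN, and that WTMEC2YR comes from the demographic file while LBDHDDSI and LBDTCSI come from the HDL and total cholesterol files. With the real data, each frame would come from foreign::read.xport() on the downloaded .xpt file.

```r
# Toy stand-ins for the three imported tables (values are invented)
demo  <- data.frame(SEQN = 1:5,
                    WTMEC2YR = c(1200, 800, 1500, 950, 1100))
hdl   <- data.frame(SEQN = c(1, 2, 4, 5),
                    LBDHDDSI = c(1.2, 1.6, 0.9, 1.4))   # HDL, mmol/L
tchol <- data.frame(SEQN = c(1, 2, 3, 5),
                    LBDTCSI = c(5.1, 4.3, 6.0, 4.8))    # TC, mmol/L

# Inner joins on the subject identifier: only subjects present in all
# three tables are kept, each carrying its own sampling weight.
nh.chol <- merge(merge(demo, hdl, by = "SEQN"), tchol, by = "SEQN")

# TC/HDL ratio per subject
nh.chol$ratio <- nh.chol$LBDTCSI / nh.chol$LBDHDDSI
nh.chol
```

Subjects missing either laboratory value drop out of the merged frame, so the weights that remain no longer sum to the full population total; whether that matters depends on the analysis.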