Re: [R] Imputing data below detection limit
The below-detection-limit issue is similar to survival analysis with censoring (but with left rather than right censoring). So many survival estimation approaches are appropriate for analyses with values below detection limits (see the NADA package, censored quantile regression in the quantreg package, etc.).

Brian

Brian S. Cade, PhD
U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO 80526-8818
email: brian_c...@usgs.gov
tel: 970 226-9326

From: Bert Gunter
To: Jessica Streicher
Cc: r-help@r-project.org
Date: 08/13/2012 09:28 AM
Subject: Re: [R] Imputing data below detection limit

> The proper approach is to use the proper approach: model it as left-censored data.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
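To make the survival-analysis analogy concrete, here is a minimal sketch of the "flipping" idea: subtracting every value from a constant larger than the maximum turns left-censored nondetects into right-censored observations, which the ordinary Kaplan-Meier estimator in the survival package already handles. The data and the helper name `km_median_left` are hypothetical; packages such as NADA wrap estimators of this kind, but this is not their code.

```r
library(survival)

# Hypothetical sample: two results were reported only as "<2.5", so they
# are stored at their detection limit with detected = FALSE.
value    <- c(2.5, 2.5, 3, 4, 5, 6, 7)
detected <- c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE)

# Kaplan-Meier median for left-censored data via flipping.
km_median_left <- function(value, detected) {
  M <- max(value) + 1          # any constant larger than max(value) works
  flipped <- M - value         # left-censored values become right-censored
  fit <- survfit(Surv(flipped, detected) ~ 1)
  # Map the median on the flipped scale back to the original scale.
  M - unname(summary(fit)$table["median"])
}

km_median_left(value, detected)   # 4
```

Because a nondetect carries only an upper bound, this median is estimable only when enough values are detected above it; if more than half of the values are nondetects below the median, the flipped survival curve never drops to 0.5 and the median comes back as NA.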
Re: [R] Imputing data below detection limit
On Mon, 13 Aug 2012, Bert Gunter wrote:

> The proper approach is to use the proper approach: model it as left-censored data. The problem with that is:

>>> I'm trying to impute data below detection limit (with multiple detection limits), so I just need a method or some code for imputation, and then to extract the complete dataset to do the analyses. Is there any package that could do that simply, as I'm a beginner in R?

This is the purpose of the NADA package. The package is based on Dennis Helsel's "Statistics for Censored Environmental Data Using Minitab and R, Second Edition."

Rich
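One of the methods NADA provides, and the one that returns imputed values, is regression on order statistics (ros()). As a rough illustration of the idea only, here is a base-R sketch simplified to a single detection limit, assuming lognormal data and that every detected value is at or above that limit. NADA's ros() also handles multiple detection limits, so for real data the package itself is the better choice; the function name `ros_single_dl` is hypothetical.

```r
# Simplified ROS for ONE detection limit (illustration, not NADA's code).
ros_single_dl <- function(value, detected) {
  n <- length(value)
  k <- sum(!detected)                  # number of nondetects
  p_below <- k / n                     # estimated P(X < detection limit)

  # Plotting positions: nondetects spread over (0, p_below),
  # detects spread over (p_below, 1) in rank order.
  pp_nd  <- p_below * seq_len(k) / (k + 1)
  det    <- sort(value[detected])
  pp_det <- p_below + (1 - p_below) * seq_len(n - k) / (n - k + 1)

  # Regress log(detected values) on their normal quantiles.
  b <- unname(coef(lm(log(det) ~ qnorm(pp_det))))

  # Impute nondetects from the fitted line and back-transform; their
  # order among themselves is arbitrary, since all are "< limit".
  out <- value
  out[!detected] <- exp(b[1] + b[2] * qnorm(pp_nd))
  out
}

# Hypothetical data with detection limit 1.
value    <- c(1, 1, 1, 1.3, 2.1, 2.8, 4.0, 6.5, 9.9, 15)
detected <- c(rep(FALSE, 3), rep(TRUE, 7))
filled   <- ros_single_dl(value, detected)
```

The detected values pass through unchanged; only the nondetects are replaced by values predicted from the fitted lognormal line.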
Re: [R] Imputing data below detection limit
Yes, Jessica, the practice (of which I also have been, and continue to be, guilty) does not really make a lot of sense. It usually doesn't affect estimation all that much, but it can certainly mess up inference.

The proper approach is to use the proper approach: model it as left-censored data. The problem with that is:

1. It's complicated and beyond the statistical background of most folks who deal with such data; it's a ubiquitous issue in science and engineering, after all.

2. Typically, the LOD isn't: that is, there often is no well-defined value, and the one that is chosen is both arbitrary and inaccurate. What one often sees is an increasing loss of relative precision as one "approaches" the designated value. Modeling this effectively gets even more complicated. David Rocke and colleagues have published methodology on this, mostly in Technometrics, if memory serves.

3. So, as in other situations, we muddle along with rather crude statistical approaches and hope that they are adequate. Probably in most circumstances they are, but ...

Cheers,
Bert

On Mon, Aug 13, 2012 at 1:15 AM, Jessica Streicher wrote:
> I find that kind of weird, though: if you haven't detected much, you haven't detected much. It's part of the data, so why impute?

-- 
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
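Bert's point that the substituted value is arbitrary can be checked numerically. The sketch below builds an idealized lognormal sample (exact quantiles, so the true meanlog is 0), censors it at a hypothetical detection limit of 1 so that half the values are nondetects, and compares the log-scale mean obtained by substituting DL/2 or DL with a left-censored maximum-likelihood fit from survival::survreg(). The heavy censoring is deliberate; with few nondetects the differences would be smaller.

```r
library(survival)

# Idealized sample: exact quantiles of lognormal(meanlog = 0, sdlog = 1).
x  <- qlnorm(ppoints(100), meanlog = 0, sdlog = 1)
dl <- 1                                 # hypothetical detection limit
detected <- x >= dl                     # the lower half become nondetects

# Substitution: replace nondetects by an arbitrary fraction of the DL.
subst_meanlog <- function(frac) mean(log(ifelse(detected, x, frac * dl)))
sub_half <- subst_meanlog(0.5)
sub_full <- subst_meanlog(1)

# Censored MLE: record nondetects at the DL, flagged as left-censored.
obs <- ifelse(detected, x, dl)
fit <- survreg(Surv(obs, detected, type = "left") ~ 1, dist = "lognormal")
mle_meanlog <- unname(coef(fit)[1])

c(substitute_half_dl = sub_half, substitute_dl = sub_full,
  censored_mle = mle_meanlog)
```

The two substitution rules give different answers from the same data purely because of the chosen constant, while the censored fit uses only the information that each nondetect lies below the limit.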
Re: [R] Imputing data below detection limit
Tempted to reply with a "Let me Google that for you" link...

Anyway, there's a package called Imputation. I myself used the zoo package. There are probably lots of others, since it's a really common problem. They usually fill in places in your data that are designated as NA.

I do not completely understand what you mean by detection limit. If you do not have NAs, but rather some kind of threshold, I'd suggest going over the data and replacing any applicable values with NAs, then using the library of your choice. I find that kind of weird, though: if you haven't detected much, you haven't detected much. It's part of the data, so why impute?

On 11.08.2012, at 23:01, aynumazi wrote:
> I'm trying to impute data below detection limit (with multiple detection limits)
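When nondetects arrive as text such as "<0.5" (a common lab-report convention, assumed here rather than taken from the original post), a first step is to split each entry into a numeric value and a censoring flag. The sketch below does that and then applies Jessica's suggestion of blanking the nondetects out as NA. Keeping the flag alongside the limit matters: the censored-data methods mentioned elsewhere in this thread need to know that each nondetect lies below its own limit, which NA alone does not record. The function name `parse_nondetects` is hypothetical.

```r
# Hypothetical lab export: nondetects written as "<" plus the detection
# limit that applied to that sample (limits may differ between samples).
raw <- c("1.2", "<0.5", "3.4", "<1", "0.8", "<0.5")

# Split into a numeric value and a logical censoring flag.
parse_nondetects <- function(raw) {
  raw <- trimws(raw)
  censored <- startsWith(raw, "<")
  value <- as.numeric(sub("^<", "", raw))   # the DL for nondetects
  data.frame(value = value, censored = censored)
}

d <- parse_nondetects(raw)

# Jessica's suggestion: mark nondetects as NA before imputing.
# This discards the fact that each value lies below its own DL.
d$value_na <- ifelse(d$censored, NA, d$value)
d
```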
[R] Imputing data below detection limit
Hello,

I'm trying to impute data below detection limit (with multiple detection limits), so I just need a method or some code for imputation, and then to extract the complete dataset to do the analyses. Is there any package that could do that simply, as I'm a beginner in R?

Thank you

-- 
View this message in context: http://r.789695.n4.nabble.com/Imputing-data-below-detection-limit-tp4640057.html
Sent from the R help mailing list archive at Nabble.com.
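One simple way to produce the complete dataset asked for here, with several detection limits, is to fit a lognormal model by censored maximum likelihood and then draw each nondetect at random from the fitted distribution truncated above at that sample's own detection limit. The sketch below does this with survival::survreg() and inverse-CDF sampling; the data, the lognormal assumption, and the function name `impute_below_dl` are all illustrative, and this is not the method of any particular package named in the thread.

```r
library(survival)

# Hypothetical data with two detection limits (0.5 and 1); nondetects
# are stored at their own limit with detected = FALSE.
value    <- c(0.5, 1, 0.7, 1.4, 0.5, 2.2, 3.1, 1, 0.9, 5.0, 1.8, 0.6)
detected <- c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE,
              TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)

impute_below_dl <- function(value, detected, seed = 1) {
  # 1. Censored MLE of a lognormal: nondetects are left-censored.
  fit <- survreg(Surv(value, detected, type = "left") ~ 1,
                 dist = "lognormal")
  mu    <- unname(coef(fit)[1])   # meanlog
  sigma <- fit$scale              # sdlog

  # 2. Inverse-CDF draw from the fitted lognormal truncated to (0, DL):
  #    u ~ Uniform(0, F(DL)), then x = F^{-1}(u).
  set.seed(seed)                  # fixed seed for reproducibility
  nd <- which(!detected)
  u  <- runif(length(nd), min = 0, max = plnorm(value[nd], mu, sigma))
  out <- value
  out[nd] <- qlnorm(u, mu, sigma)
  out
}

completed <- impute_below_dl(value, detected)
```

A single draw treats the fitted meanlog and sdlog as known, so standard errors computed from one completed dataset will be too small. Repeating the imputation with different seeds and combining the analyses (multiple imputation) is the usual remedy.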
Re: [R] Imputing data
Hi,

For imputation using the randomForest package, check ?rfImpute.

Weidong

On Fri, Dec 2, 2011 at 6:00 PM, Peter Langfelder wrote:
> There are several imputation functions available in the various packages. For example, packages Hmisc and e1071 both contain a function called impute, and the package impute contains the function impute.knn for nearest-neighbor imputation.
Re: [R] Imputing data
On Fri, Dec 2, 2011 at 2:16 PM, khlam wrote:
> Presently, I want to replace the NAs with, say, the mean of the rows or columns, or some type of correlation.

There are several imputation functions available in the various packages. For example, packages Hmisc and e1071 both contain a function called impute, and the package impute contains the function impute.knn for nearest-neighbor imputation.

HTH,
Peter
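For the nearest-neighbor route Peter mentions, the sketch below shows the basic idea in base R: for each row with missing entries, find the k most similar rows over the columns it does have, and fill each gap with those neighbors' average. It is a simplified stand-in written for illustration (the name `knn_impute_rows` is hypothetical), not the algorithm that impute.knn() actually implements; see ?impute.knn for that.

```r
# Simplified row-wise k-nearest-neighbor imputation (illustration only).
knn_impute_rows <- function(m, k = 3) {
  out <- m
  for (i in which(rowSums(is.na(m)) > 0)) {
    obs <- !is.na(m[i, ])
    if (!any(obs)) next              # nothing to measure distance with
    # Root-mean-square distance from row i to every row, over the columns
    # row i has observed, skipping NAs in the other row.
    d <- apply(m[, obs, drop = FALSE], 1, function(r)
      sqrt(mean((r - m[i, obs])^2, na.rm = TRUE)))
    d[i] <- Inf                      # a row is not its own neighbor
    for (j in which(!obs)) {
      # Neighbors must have column j observed and a finite distance.
      cand <- which(!is.na(m[, j]) & is.finite(d))
      nn <- cand[order(d[cand])][seq_len(min(k, length(cand)))]
      out[i, j] <- mean(m[nn, j])    # NaN if no candidate exists
    }
  }
  out
}

m <- rbind(c(1, 2, 3), c(1.1, 2.1, NA), c(10, 20, 30), c(1.2, 2.2, 3.2))
knn_impute_rows(m, k = 2)   # row 2, column 3 filled from rows 1 and 4
```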
[R] Imputing data
I have a very big matrix, about 900 by 400, and there are a couple of NAs in it. I have used the following functions to impute the missing data:

    library(randomForest)  # provides na.roughfix() and randomForest()
    data(pc)
    pc.na <- pc
    pc.roughfix <- na.roughfix(pc.na)
    pc.narf <- randomForest(pc.na, na.action = na.roughfix)

yet the NAs in the matrix are not replaced. What I want is to replace each NA with, say, the mean of its row or column, or some type of correlation.

Any help would be appreciated.

-- 
View this message in context: http://r.789695.n4.nabble.com/Imputing-data-tp4150041p4150041.html
Sent from the R help mailing list archive at Nabble.com.
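In R, na.roughfix(pc.na) returns a filled-in copy rather than modifying pc.na in place, so the result of the first call in the post ends up in pc.roughfix; for numeric columns, na.roughfix() fills with column medians. For the row- or column-mean replacement described in the post, a minimal base-R sketch (the function name `impute_means` is hypothetical) is:

```r
# Replace each NA with the mean of its column (margin = 2) or its row
# (margin = 1), ignoring missing entries when computing the means.
# A column or row that is entirely NA yields NaN.
impute_means <- function(m, margin = 2) {
  means <- if (margin == 2) colMeans(m, na.rm = TRUE) else rowMeans(m, na.rm = TRUE)
  idx <- which(is.na(m), arr.ind = TRUE)   # matrix with columns "row", "col"
  m[idx] <- means[idx[, if (margin == 2) "col" else "row"]]
  m
}

m <- matrix(c(1, NA, 3, 4, 5, NA), nrow = 2)
impute_means(m)      # column means
impute_means(m, 1)   # row means
```

Mean imputation shrinks the spread of each filled column, so it is best kept for the "couple of NAs" situation described in the post.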