Folks: A comment ... subject to vigorous refutation, since it's jmui (just my uninformed opinion).
It strikes me that this is a case where one may need to own up to the limitations of the data and be transparent about the tentativeness of the statistical approaches. I say this because the statistical literature and popular perception often seem to hold that statistical methodology can overcome these limitations and produce definitive answers in spite of them. And, of course, statistical researchers tend to be enamored with their clever methodology and gloss over the inevitable fact that their proofs begin with "assume that ..." (reminding me of the old saw that "assume" can make an ass out of u and me).

Perhaps a useful approach is sensitivity analysis: try several quite different approaches, each consistent with one reasonable set of assumptions, and see how they compare. Not a new idea, of course, but perhaps one worth being reminded of in such situations.

As always, thanks for **your** knowledgeable summary of exactly these matters, Frank.

-- Bert Gunter
Genentech

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Frank E Harrell Jr
Sent: Saturday, April 25, 2009 3:38 PM
To: David Winsemius
Cc: Emmanuel Charpentier; r-h...@stat.math.ethz.ch
Subject: Re: [R] Multiple Imputation in mice/norm

David Winsemius wrote:
>
> On Apr 25, 2009, at 9:25 AM, Frank E Harrell Jr wrote:
>
>> Emmanuel Charpentier wrote:
>>> On Friday, April 24, 2009 at 14:11 -0700, ToddPW wrote:
>>>> I'm trying to use either mice or norm to perform multiple imputation
>>>> to fill in some missing values in my data. The data have some missing
>>>> values because of a chemical detection limit (so they are left
>>>> censored). I'd like to use MI because I have several variables that
>>>> are highly correlated. In SAS's proc MI, there is an option with which
>>>> you can limit the imputed values that are returned to some range of
>>>> specified values. Is there a way to limit the values in mice?
>>> You may do that by writing your own imputation function and assigning
>>> it to the imputation of particular variables (see the argument
>>> "imputationMethod" and the details in the man page for "mice").
>>>> If not, is there another MI tool in R that will allow me to
>>>> specify a range of acceptable values for my imputed data?
>>> In the function amelia (package "Amelia"), you may specify a "bounds"
>>> argument, which allows for such a limitation. However, be aware that
>>> this might destroy the basic assumption of Amelia, which is that your
>>> data are multivariate normal. Maybe a change of variable is in order
>>> (e.g. log(concentration) usually has much better statistical
>>> properties than concentration).
>>> Frank Harrell's aregImpute (package Hmisc) has the "curtail" argument
>>> (TRUE by default), which limits imputations to the range of observed
>>> values.
>>> But if your left-censored variables are your dependent variables (not
>>> covariates), may I suggest analyzing these data as censored data, as
>>> allowed by Terry Therneau's "coxph" function (package "survival")?
>>> Code your "missing" data as such a variable (use
>>> Surv(ifelse(is.na(x), <yourlimit>, x), !is.na(x), type = "left") ~ <Yourmodel>
>>> to do this on the fly).
>>> Another possible idea is to split your (supposedly x) variable in two:
>>> observed (logical), and value (the observed value if observed,
>>> <detection limit> if not), and include both in your model. You will
>>> probably run into numerical difficulties due to the built-in total
>>> separation.
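Emmanuel's `Surv(..., type = "left")` construction can be sketched concretely. Below is a hedged illustration on simulated data (the names `x`, `z`, and `lim` are illustrative, not from the thread). Note that for a left-censored (Tobit-style) response the parametric `survreg` fitter in the survival package is the usual route, since `coxph` accepts right-censored and counting-process data:

```r
# Hedged sketch: modelling a left-censored (detection-limit) outcome
# with the survival package. All object names here are illustrative.
library(survival)

set.seed(1)
z   <- rnorm(200)                  # a covariate
x   <- exp(0.5 * z + rnorm(200))   # simulated concentrations
lim <- 0.5                         # detection limit
x[x < lim] <- NA                   # below-limit values recorded as NA

# Left-censored response: the observed value where present, the detection
# limit otherwise; status is TRUE (observed) / FALSE (censored below lim).
y <- Surv(ifelse(is.na(x), lim, x), !is.na(x), type = "left")

# Tobit-style parametric fit; a log-normal scale often suits concentrations.
fit <- survreg(y ~ z, dist = "lognormal")
summary(fit)
```

The `ifelse` call is the elementwise version of the intent in the quoted snippet: each missing value contributes the detection limit as its (left-)censoring time, while observed values enter as-is.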
>>> HTH,
>>> Emmanuel Charpentier
>>>> Thanks for the help,
>>>> Todd
>>
>> All see
>>
>> @Article{zha09non,
>>   author  = {Zhang, Donghui and Fan, Chunpeng and Zhang, Juan and Zhang, Cun-Hui},
>>   title   = {Nonparametric methods for measurements below detection limit},
>>   journal = {Statistics in Medicine},
>>   year    = 2009,
>>   volume  = 28,
>>   pages   = {700--715},
>>   annote  = {lower limit of detection; left censoring; Tobit model;
>>              Gehan test; Peto-Peto test; log-rank test; Wilcoxon test;
>>              location shift model; superiority of nonparametric methods}
>> }
>>
>> --
>> Frank E Harrell Jr   Professor and Chair   School of Medicine
>>                      Department of Biostatistics   Vanderbilt University
>
> It appears they were dealing with outcomes possibly censored at a limit
> of detection. At least that was the example they used to illustrate.
>
> Is there a message that can be inferred about what to do with covariates
> with values below the limit of detection? And can someone translate for
> a non-statistician what the operational process was on the values below
> the limit of detection in the Wilcoxon approach that they endorsed? They
> transformed the left-censored situation into a right-censored one and
> then they do ... what?
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT

Yes, it's easier to handle censoring in the dependent variable. For
independent variables below the limit of detection we are left with
model-based extrapolation for multiple imputation, with no way to check the
imputation model's regression assumptions. Predictive mean matching can't be
used.

Frank

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
                     Department of Biostatistics   Vanderbilt University

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.