Scot

Murray Aitkin and I have implemented a nonparametric maximum
likelihood (ML) method for regression models that uses the empirical
joint distribution of the covariates, in a more general version of
the approach in Ibrahim's (1990) JASA paper.  It doesn't assume a
parametric model for the distribution of the covariates.  Currently I
have R code for the normal linear regression model with any number of
covariates (using the EM algorithm).  Code for distributions from the
exponential family is still being developed, so unfortunately it is
not yet available (except for the single-covariate logistic-binomial
and log-Poisson cases).
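
To make the idea concrete, here is a rough Python sketch (not the R
code mentioned above) of EM for a normal linear regression with a
single covariate that is missing for some cases.  It assumes the
covariate distribution is a set of point masses on the observed
covariate values, that missingness is at random, and that missing
values are coded as NaN.  The function name and its arguments are
made up for illustration.

```python
import numpy as np

def em_empirical_covariate(y, x, n_iter=500, tol=1e-8):
    """EM for y ~ N(b0 + b1*x, s2) when some x are missing (NaN).

    The covariate distribution is left nonparametric: point masses pi
    on the distinct observed values z of x (an illustrative assumption).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    miss = np.isnan(x)
    n, n_obs = y.size, int((~miss).sum())
    x_obs, y_obs, y_mis = x[~miss], y[~miss], y[miss]

    # Support points and starting masses from the observed covariates.
    z, counts = np.unique(x_obs, return_counts=True)
    pi = counts / n_obs

    # Starting values: complete-case least squares.
    X_obs = np.column_stack([np.ones(n_obs), x_obs])
    beta, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)
    s2 = np.mean((y_obs - X_obs @ beta) ** 2)

    # Expanded data: each incomplete case is paired with every support point.
    xs = np.concatenate([x_obs, np.tile(z, y_mis.size)])
    ys = np.concatenate([y_obs, np.repeat(y_mis, z.size)])
    Xs = np.column_stack([np.ones(xs.size), xs])

    for _ in range(n_iter):
        # E-step: posterior weight of each support point given y.
        mu = beta[0] + beta[1] * z
        logw = np.log(pi) - 0.5 * (y_mis[:, None] - mu) ** 2 / s2
        logw -= logw.max(axis=1, keepdims=True)  # numerical stability
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)

        # M-step: weighted least squares on the expanded data.
        ws = np.concatenate([np.ones(n_obs), w.ravel()])
        sw = np.sqrt(ws)
        beta_new, *_ = np.linalg.lstsq(Xs * sw[:, None], ys * sw, rcond=None)
        s2 = np.sum(ws * (ys - Xs @ beta_new) ** 2) / n
        pi = (counts + w.sum(axis=0)) / n  # updated covariate masses

        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, s2, z, pi
```

Calling em_empirical_covariate(y, x) returns the coefficient
estimates, the residual variance, the support points and their
estimated masses.  Standard errors are not computed in this sketch.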

Regards
Ross Darnell

Frank E Harrell Jr <[email protected]> writes:

> Scot - To add a bit to Rod's excellent advice, the S-Plus transcan
> function can use Fisher's optimum scoring algorithm to score
> categorical variables.  The approximate Bayesian bootstrap is used
> to sample residuals from these scores, and the result is then
> back-computed to estimate the categories "closest" to the predicted,
> randomly drawn scores.  See
> http://hesweb1.med.virginia.edu/biostat/s/help/transcan.html
> 
> transcan can also use recursive partitioning to impute categorical
> predictors but at present this only works for single "best guess"
> imputation.  I need to implement Rod's advice to add random sampling
> from the estimated multinomial distribution.
> 
> -Frank Harrell
> 
> 
> Rod Little wrote:
> > 
> > Scot: you might take a look at my review paper on missing data in
> > regression in JASA (ref below). An improvement of your method would be to
> > compute conditional probabilities of being in the categories given the
> > other covariates and Y, and draw the category using these conditional
> > probabilities. This could be done multiply to incorporate imputation error
> > (see the section on multiple imputation in that paper).
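
A minimal Python sketch of the suggestion quoted just above, under
simplifying assumptions that are mine and not Rod's: one categorical
covariate c (missing coded as -1), one continuous covariate x, an
ANCOVA model y = mu[c] + gamma*x + error fitted to the complete
cases, c independent of x, and every level observed at least once.
Parameter uncertainty is ignored, so this is not a fully proper
multiple imputation.  The function name is hypothetical.

```python
import numpy as np

def draw_categories(y, x, c, n_levels, n_imputations=5, seed=0):
    """Draw missing categories from P(category | y, x), several times.

    Returns (list of completed copies of c, matrix of probabilities).
    Assumes missing entries of c are coded -1 (illustrative convention).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    c = np.asarray(c, dtype=int)
    obs = c >= 0
    n_obs = int(obs.sum())

    # Fit the ANCOVA on complete cases: dummy columns plus the covariate.
    D = np.zeros((n_obs, n_levels))
    D[np.arange(n_obs), c[obs]] = 1.0
    X = np.column_stack([D, x[obs]])
    coef, *_ = np.linalg.lstsq(X, y[obs], rcond=None)
    mu, gamma = coef[:n_levels], coef[n_levels]
    resid = y[obs] - X @ coef
    s2 = resid @ resid / (n_obs - n_levels - 1)

    # Marginal category proportions among the complete cases.
    prior = np.bincount(c[obs], minlength=n_levels) / n_obs

    # Bayes' rule: P(c = k | y, x) is proportional to prior_k * N(y | mu_k + gamma*x, s2).
    ym, xm = y[~obs], x[~obs]
    logp = np.log(prior) - 0.5 * (ym[:, None] - mu - gamma * xm[:, None]) ** 2 / s2
    logp -= logp.max(axis=1, keepdims=True)
    p = np.exp(logp)
    p /= p.sum(axis=1, keepdims=True)

    # Draw each missing category from its conditional distribution.
    cdf = np.cumsum(p, axis=1)
    completed = []
    for _ in range(n_imputations):
        u = rng.random(p.shape[0])
        draws = (u[:, None] > cdf).sum(axis=1)
        draws = np.minimum(draws, n_levels - 1)  # guard against rounding
        filled = c.copy()
        filled[~obs] = draws
        completed.append(filled)
    return completed, p
```

Each completed copy would then be analysed separately and the results
combined across imputations, as in the multiple imputation
literature.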
> > 
> > This method assumes the missing data are missing at random, which may
> > not be appropriate in your setting. The simple approach of deleting the
> > incomplete cases (complete-case analysis) is valid if missingness depends
> > on the value of the covariates but not the outcome, which may be plausible
> > here; one might consider both multiple imputation (as outlined above) and
> > complete-case analysis and see if the substantive findings differ, as a
> > form of sensitivity analysis. Best, Rod Little.
> > 
> > Reference: Little, R.J.A. (1992). Regression with missing X's: a
> > review. Journal of the American Statistical Association, 87, 1227-1237.
> > 
> > On Mon, 26 Feb 2001, Scot W McNary wrote:
> > 
> > >
> > > Hi,
> > >
> > > I'm working with three ANCOVAs with categorical covariates.  The variables
> > > of interest are continuous as are the DVs and all of these variables are
> > > completely observed.  The missing data exist for the categorical
> > > predictors.  There are three of them:
> > >
> > > 1) four level predictor, 17% missing data
> > > 2) four level predictor, 7% missing data
> > > 3) two level predictor, 3% missing data
> > >
> > > The investigators I'm working for have good reason to believe that these
> > > data are unavailable rather than not applicable.  They are items which ask about
> > > different mutually exclusive/exhaustive aspects of abuse experienced by
> > > individuals.  It's reasonable to expect that there is some (unobserved)
> > > response to these items since individuals were selected into the study
> > > based on their exposure to abuse.  It's likely that some individuals
> > > refused to answer these items.  Unfortunately, the original data coders
> > > are not available to ask about the proportion of refused vs. don't know
> > > responses in each of these cases.
> > >
> > > My simple minded approach was to collapse these individuals into one of
> > > the existing categories.  To do this I found the outcome means for each
> > > level of the predictors and collapsed the missing value cases into the
> > > category with the most similar outcome means.
> > >
> > > I understand these missing data to be non-ignorable.  But since the
> > > function of imputation for this analysis is to maximize the N for the
> > > covariates and not the primary focus of the study, I initially thought
> > > that a simple-minded, ad hoc approach would suffice.  However, a
> > > reviewer's question has caused me to rethink that.
> > >
> > > The reviewer believes that we have used an illegitimate method that has
> > > overly favored our hypotheses by "imputing" data in this way.  I disagree
> > > that we have biased our data in favor of our hypotheses, first, since we
> > > had no hypotheses about the covariates per se, and second, since only one of the
> > > 21 contrasts implied by the levels of the covariates was significant.
> > >
> > > The reviewer's general point that my ad hoc method is not standard has
> > > caused me to consider asking for advice from others with more experience.
> > > Should I be engaging in a more formal imputation procedure (e.g., multiple
> > > imputation), for these covariates?  Are there problems with my approach I
> > > haven't foreseen?  Any suggestions welcomed.
> > >
> > > Thanks in advance,
> > >
> > > Scot McNary
> > >
> > >
> > > --
> > >   Scot W. McNary  email:[email protected]
> > >
> > >
> > >
> > 
> > ___________________________________________________________________________________
> > Roderick Little
> > Chair, Department of Biostatistics                    (734) 936-1003
> > U-M School of Public Health                     Fax:  (734) 763-2215
> > M4208 SPH II                                       [email protected]
> > 1420 Washington Hgts               http://www.sph.umich.edu/~rlittle/
> > Ann Arbor, MI 48109-2029
> 
> -- 
> Frank E Harrell Jr              Prof. of Biostatistics & Statistics
> Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
> U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
