Maria da Conceicao-Saraiva wrote:
> 
> 
> 
> Sorry about this question,
> 
> I have been discussing with some of the people I work with the need for 
> imputation in some of our data. What some of the analysts are doing is 
> just creating a category for missing values inside some variables; 
> they argue this is enough. It has been hard to convince them that this 
> is not the best way to do it. Especially in our income variable, we have 
> about 30% missing values.
> Does anybody know of references discussing this approach of just 
> creating a category for missing values inside a variable?
> 
> Maria
> 

Maria,

That approach is a disaster, failing even if the values are missing 
completely at random.  There are several papers on the subject, 
referenced in 
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/rms.pdf

See http://biostat.mc.vanderbilt.edu/rms for more information.

One easy way to see that this approach is a disaster is to realize that 
a new category changes the definition of the variable, and a test of 
association between the new variable and the dependent variable is a 
joint test of (effect of original variable, missingness of the variable) 
being associated with Y.
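
A small simulation can make the failure under completely random
missingness concrete. The sketch below is only an illustration under
assumed settings, not taken from the references above: a binary
variable x (think high vs. low income), a correlated covariate z, an
outcome y = 2*x + z + noise, and 30% of x deleted completely at random.
All names (ols, simulate), coefficients, and the sample size are
hypothetical choices. It compares the estimated coefficient of z when
missing x is given its own category against a complete-case fit.

```python
import random

def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

def simulate(n=20000, p_missing=0.3, seed=1):
    """Return (z coefficient with missing category, z coefficient complete-case).

    Assumed data-generating model (illustrative only):
    x ~ Bernoulli(0.5), z = 2*x + N(0,1), y = 2*x + 1*z + N(0,1).
    The true coefficient of z is 1.
    """
    rng = random.Random(seed)
    full, y_full = [], []   # design with an extra "x missing" category
    cc, y_cc = [], []       # complete cases only
    for _ in range(n):
        x = 1.0 if rng.random() < 0.5 else 0.0
        z = 2.0 * x + rng.gauss(0, 1)
        y = 2.0 * x + z + rng.gauss(0, 1)
        if rng.random() < p_missing:          # missing completely at random
            full.append([1.0, z, 0.0, 1.0])   # intercept, z, x=1, x missing
        else:
            full.append([1.0, z, x, 0.0])
            cc.append([1.0, z, x])
            y_cc.append(y)
        y_full.append(y)
    return ols(full, y_full)[1], ols(cc, y_cc)[1]

if __name__ == "__main__":
    missing_cat, complete_case = simulate()
    print("z coefficient, missing-category approach:", round(missing_cat, 3))
    print("z coefficient, complete cases only:      ", round(complete_case, 3))
```

In this setup the complete-case estimate should land close to the true
value of 1, while the missing-category estimate is noticeably above 1:
for the records placed in the "missing" category x is no longer
adjusted for, so z absorbs part of its effect. Complete-case analysis
is used here only as a simple comparison; it is not being recommended
over imputation.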

It is amazing how many analysts create a method of analyzing data 
without ever even being tempted to study the performance of the method.

Frank
-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
