Seppo,

It seems to me that if adding a new category for missings is wrong for 
most purposes, having more reasons for missing will not help.  And if we 
use multiple imputation, we don't know how to make use of these 
different reasons for missing, despite Rubin's admonition to do so.
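
A minimal sketch of the two encodings under discussion, assuming a handful of hypothetical survey records (the income bands, reason codes, and function names are invented for illustration and come from no real dataset or library). It only shows how a single "missing" level and several reason-specific levels turn into indicator columns; it says nothing about which encoding gives better estimates.

```python
# Hypothetical survey records: an income band, or None plus a reason code.
records = [
    {"income": "low", "reason": None},
    {"income": "high", "reason": None},
    {"income": None, "reason": "refusal"},
    {"income": None, "reason": "non-contact"},
    {"income": None, "reason": "does not know"},
    {"income": "low", "reason": None},
]


def single_missing_category(rec):
    """Collapse every missing value into one 'missing' level."""
    return rec["income"] if rec["income"] is not None else "missing"


def reason_coded_category(rec):
    """Keep one level per recorded reason for missingness."""
    if rec["income"] is not None:
        return rec["income"]
    return "missing: " + (rec["reason"] or "unknown")


def dummy_columns(labels):
    """One-hot encode a list of category labels into 0/1 columns."""
    levels = sorted(set(labels))
    rows = [[int(label == level) for level in levels] for label in labels]
    return levels, rows


single = [single_missing_category(r) for r in records]
coded = [reason_coded_category(r) for r in records]

single_levels, single_X = dummy_columns(single)
coded_levels, coded_X = dummy_columns(coded)

print(single_levels)
print(coded_levels)
```

With reason codes, the single "missing" column is split into one column per reason, so each reason gets its own coefficient in a regression, at the cost of fewer observations per level.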

Frank


Laaksonen Seppo wrote:
> Just reading your comments after being on holiday.
> 
> I agree with Paul that Maria's approach is OK but not the best in several
> cases. I have even called this solution a starting imputation method that
> should always be done. If possible, it is useful to create several
> missingness codes, since there is often some information behind the
> missingness (such as refusal, non-contact, does not know, does not want to
> answer, does not know well but approximately, missing for unknown reasons).
> After this kind of coding, it is of course good to try to get better
> 'imputed values.'
> In an exploratory analysis with a number of explanatory (independent)
> variables, I have not tried this approach for a dependent variable (has
> anyone?), but it is not bad when the missingness occurs in explanatory
> variables. If the missingness has been coded ('imputed') well, you can
> interpret the results quite well. For example, when I have been explaining
> happiness by a number of variables, with income partially missing under
> several codes, it has fortunately not been difficult to see that people in
> some of the missing-income categories are unhappier than average. Note that
> income is not necessarily my main explanatory variable but an important
> control variable; imputing these missing categories could be a dramatic
> step, since it would be a hard job for something that is still not a main
> issue, and it may lead to bias. As in my example above, this kind of
> variable can also be a control variable. Note also that this is very often
> more advantageous than the two major alternatives: (i) data deletion, which
> loses a lot of observations (too often used in econometrics, for example),
> and (ii) unsuccessful imputation (it is hard to know how well your
> imputation has succeeded, and your results can be problematic).
> 
> Always keep in mind what your main estimates are. Be realistic, and do not
> always use MI.
> 
> Seppo
> University of Helsinki
> 
>>> Nevertheless, I still believe that this method may be useful in two
>>> situations:
>>>
>>> 1. Data are "missing" because a variable doesn't apply or is undefined
>>> for some fraction of cases.  For example, suppose you have a measure
>>> of marital happiness, dichotomized as high or low, but your sample
>>> contains some unmarried people. Then it is entirely appropriate to
>>> have a 3-category variable with values high, low, and unmarried.
>> Nice example Paul.  I've added that to my notes and book, with
>> attribution.
>>
>>> 2. The goal is to build a forecasting model, and it is anticipated
>>> that a substantial fraction of the new cases to be forecast will have
>>> missing data on one or more variables. Here, the goal is not to get
>>> unbiased estimates of population parameters but to minimize some
>>> function of prediction errors. A workable forecasting model must have
>>> some way of dealing with the cases that have missing data. Maybe there
>>> are better ways, but I've found almost no literature on this topic
>>> (with the exception of Warren Sarle's unpublished paper).
>> My colleagues Janssen, Donders, and Moons in The Netherlands are working
>> on that.  Averaging predictions over multiple imputations is one
>> approach but there are others.  There are some logistical problems to
>> imputing especially with regard to updating the imputation rules.
>>
>> Cheers,
>> Frank
>>
>>> -----------------------------------------------------------------
>>> Paul D. Allison, Professor
>>> Department of Sociology
>>> University of Pennsylvania
>>> 581 McNeil Building
>>> 3718 Locust Walk
>>> Philadelphia, PA  19104-6299
>>> 215-898-6717
>>> 215-573-2081 (fax)
>>> http://www.ssc.upenn.edu/~allison
>>>
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Maria da
>>> Conceicao-Saraiva
>>> Sent: Saturday, July 04, 2009 9:19 AM
>>> To: [email protected]
>>> Subject: [Impute] weird question
>>>
>>>
>>>
>>>
>>> Sorry about this question,
>>>
>>> I have been discussing with some people I work with the need for
>>> imputation in some of our data. What some of the analysts are doing is
>>> just creating a category for missing values inside some variables;
>>> they argue this is enough. It has been hard to convince them that
>>> this is not the best approach, especially for our income variable,
>>> where about 30% of values are missing.
>>> Does anybody know of references discussing this approach of just
>>> creating a category for missing values inside a variable?
>>>
>>> Maria
>>>
>>>
>>>
>>>
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> ~~~~~~
>>> ~~
>>> Maria da Conceicao P. Saraiva DDS, MSc, Ph.D Departamento de Clinica
>>> Infantil e Odontologia Social e Preventiva Faculdade de Odontologia de
>>> Ribeirao Preto-Universidade de Sao Paulo
>>>
>>>
>> --
>> Frank E Harrell Jr   Professor and Chair           School of Medicine
>>                      Department of Biostatistics   Vanderbilt University
>>
>> _______________________________________________
>> Impute mailing list
>> [email protected]
>> http://lists.utsouthwestern.edu/mailman/listinfo/impute
>>
> 
> 
> 


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
