Craig,

Thanks for your suggestions. I will try coding the largest category with 
the highest number. The method with my own dummies worked well overall, 
except for the last category. To avoid multiple positive values, I imputed 
one category after the other (starting with the largest). If an observation 
had a 1 for one dummy, I made no further imputations for the other dummies 
for this observation (RESTRICT statement in IVEware; further imputations 
only if all previous dummies are zero).
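
The one-after-the-other scheme described above can be sketched in Python. 
This is only an illustration, not IVEware code: it ignores covariates and 
draws each dummy from observed frequencies alone, and the function name 
sequential_dummy_draw and the example counts are made up for this sketch. 
The point it shows is that each dummy must be drawn with the probability 
conditional on all previous dummies being zero, and that the last category 
then receives every record that is still unassigned.

```python
import random

def sequential_dummy_draw(counts, rng):
    """Draw one category by imputing 0/1 dummies one after the other.

    counts: dict mapping category -> observed frequency (hypothetical input).
    Categories are visited from largest to smallest; once a dummy is 1,
    no further dummies are drawn for that record.
    """
    order = sorted(counts, key=counts.get, reverse=True)
    remaining = sum(counts.values())
    for cat in order[:-1]:
        # Probability of this category given that all previous dummies are 0.
        p = counts[cat] / remaining
        if rng.random() < p:
            return cat
        remaining -= counts[cat]
    # All previous dummies are 0, so the last category is the only one left.
    return order[-1]

rng = random.Random(2009)                   # fixed seed, for reproducibility only
counts = {"A": 600, "B": 300, "C": 100}     # made-up frequencies
draws = [sequential_dummy_draw(counts, rng) for _ in range(10000)]
shares = {c: draws.count(c) / len(draws) for c in counts}
print(shares)
```

If the later dummies were instead drawn with their unconditional shares 
(for example 0.1 for "C" among the records still left over), the small 
categories at the end of the sequence would not be reproduced correctly.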

Hans-Peter




Craig Newgard <[email protected]> 
22.09.2009 05:21

To
"[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>
Cc

Subject
RE: [Impute] IVEware: Imputation of categorical variables with many 
categories





Hans-Peter,
Not sure if you've found a response to your question below yet, but I have 
been through similar scenarios with IVEware before.  My suggestion would 
be to keep the primary (polytomous) categorical terms in the MI code, as 
this allows IVEware to create dummies, while still recognizing that they 
are mutually exclusive categories.  If you create your own dummies, you 
run the risk of imputing positive values for multiple dummies on the same 
observation.  A few other things you could try to improve efficiency 
include: making sure the largest category (reference) is coded with the 
highest number in the term (default reference in IVEware), and examining 
small categories within the term and collapsing them where 
appropriate.  I've found that many IVEware MI models routinely require 24+ 
hours to run with such terms included.  If these suggestions still fail to 
increase the MI efficiency, you could also consider running parallel 
chains of MI, using separate MI models for subjects within each category 
of the original term (this is commonly used for interaction terms and 
works best if each category has an adequate number of observations and 
minimal missing data).
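
As a rough illustration of the first two suggestions (largest category 
gets the highest code, small categories are collapsed), the recoding could 
be prepared outside IVEware along these lines. This is a sketch under 
stated assumptions: the function name recode_largest_last, the pooled label 
"OTHER", the threshold min_count and the example values are all 
hypothetical, and whether collapsing is appropriate remains a substantive 
decision.

```python
from collections import Counter

def recode_largest_last(values, min_count=0):
    """Return (recoded, mapping) with integer codes 1..K in ascending frequency.

    Categories with fewer than min_count records are pooled into "OTHER"
    (a hypothetical label) before coding. Missing values (None) stay None.
    """
    observed = [v for v in values if v is not None]
    freq = Counter(observed)

    def pooled(v):
        # Keep a category if it is frequent enough, otherwise pool it.
        return v if freq[v] >= min_count else "OTHER"

    pooled_freq = Counter(pooled(v) for v in observed)
    # Ascending frequency; ties are broken by label so the result is stable.
    order = sorted(pooled_freq, key=lambda c: (pooled_freq[c], str(c)))
    mapping = {cat: code for code, cat in enumerate(order, start=1)}
    recoded = [None if v is None else mapping[pooled(v)] for v in values]
    return recoded, mapping

# Made-up example: two states with a single record each are pooled.
states = ["HE"] * 5 + ["BY"] * 3 + ["HB", "SL", None]
recoded, mapping = recode_largest_last(states, min_count=2)
print(mapping)
```

The largest category ends up with the highest code, which is the default 
reference mentioned above.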

Craig
________________________________________
From: [email protected] 
[[email protected]] On Behalf Of 
[email protected] [[email protected]]
Sent: Monday, September 21, 2009 1:45 AM
To: [email protected]
Subject: [Impute] IVEware: Imputation of categorical variables with many 
categories

Hello,

I have a dataset with about 50,000 records and 30 variables. Among the 
variables are 2 categorical ones with many categories: federal state (16 
categories) and branch of economic activity (80-100 categories). Since I 
want to produce a synthetic dataset, I double the dataset, replacing all 
values of one variable with missing values in the copy.

Now to my problem with IVEware: if I want to impute, for example, the 
federal state, the first iteration is still running after 5-6 hours, so it 
takes too long.
My second attempt: I computed dummies for the 16 federal states. First I 
impute the state with the most units, then the one with the second most 
units, and so on. All in all this works well, but for the last state only 
20-30 units remain (original data: 358 units). I tried swapping the order 
of the smallest and the second smallest state, but this didn't solve the 
problem: now the second smallest state has far too few units in the 
synthetic dataset. Does anyone have further suggestions for handling 
categorical variables with many values in IVEware?


Kind regards
Hans-Peter Hafner

STATISTIK HESSEN

-----------
Hessisches Statistisches Landesamt
Rheinstraße 35/37
65175 Wiesbaden
Internet: http://www.statistik-hessen.de

Telefon: 0611 3802-815
Telefax: 0611 3802-890
E-Mail: [email protected]

From allison <@t> soc.upenn.edu  Tue Sep 29 13:05:42 2009
From: allison <@t> soc.upenn.edu (Paul Allison)
Date: Fri Oct  9 16:51:05 2009
Subject: [Impute] 2-day missing data seminar in November
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

On November 13-14 in Atlanta, I will present my two-day seminar on Missing
Data. Early-bird discounted registration is available until October 1.
 
This course provides an in-depth look at modern methods for handling missing
data, with particular emphasis on maximum likelihood and multiple
imputation. Although the course is applications oriented, it also covers the
conceptual underpinnings of these new methods in considerable detail.
Maximum likelihood is illustrated with two software programs, Mplus and LEM.
Multiple imputation is demonstrated with two SAS procedures (MI and
MIANALYZE) and two Stata commands (mi and ice).
 
The course will be held at the Hampton Inn and Suites, 161 Spring St. NW,
Atlanta, GA. A block of sleeping rooms has been reserved at the hotel at a
reduced rate. 
 
You can get more information about this course at
 
   www.PaulDAllison.com 

-----------------------------------------------------------------
Paul D. Allison, Professor
Department of Sociology
University of Pennsylvania
581 McNeil Building
3718 Locust Walk
Philadelphia, PA  19104-6299
215-898-6717
215-573-2081 (fax)
http://www.pauldallison.com
 
