Hi Rod,

Thanks for your input. Since I first wrote, I've had a couple of ideas.

The first one is similar to the duration variable approach you suggested. The idea would be to introduce a duration variable into the imputation dataset, calculated from cases with complete data for start and stop. The stop date could then be constrained to equal start + duration. Or possibly the stop could just be calculated directly as start + duration.
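As a rough sketch of the "calculate stop directly as start + duration" option, assuming the dates are held as Python date objects and that the duration itself has already been imputed elsewhere. The function names and the sample records are hypothetical, and none of this is IVEware syntax:

```python
from datetime import date, timedelta

def duration_days(start, stop):
    """Duration in days for a case with both dates observed, else None."""
    if start is None or stop is None:
        return None
    return (stop - start).days

def stop_from_duration(start, duration):
    """Derive the stop date as start + duration (duration in whole days)."""
    if duration < 0:
        # A negative duration would put the stop date before the start date.
        raise ValueError("duration must be non-negative")
    return start + timedelta(days=duration)

# Hypothetical records: (start, stop); None marks a missing stop date.
records = [
    (date(2004, 3, 1), date(2004, 9, 1)),
    (date(2005, 1, 15), None),
]

# Durations are only defined for the complete cases.
durations = [duration_days(start, stop) for start, stop in records]

# Suppose a duration of 184 days was imputed for the second record.
derived_stop = stop_from_duration(records[1][0], 184)
```

Deriving the stop date this way guarantees it never precedes the start date, provided the imputed duration is itself kept non-negative.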
The second idea involves creating a set of 4 variables: MIN_START, MAX_START, MIN_STOP, and MAX_STOP. These can generally be created from the limited date information that I have available. If, for example, I know that a person started taking a drug in 2004 but nothing else, I can set the minimum start to 01/01/04 and the maximum start to 12/31/04. Then I can tell IVEware to constrain the imputed value to lie between these two dates.

I've been playing with this approach a little earlier today and, so far, it seems to be working quite well. So now I'm just hoping that the duration approach can also be implemented successfully.

Thanks,

Paul

Paul J. Miller, Ph.D.
Research Scientist and Statistician
Ontario HIV Treatment Network
1300 Yonge St., Suite 308
Toronto, Ontario M4T 1X3
Phone: (416) 642-6486 ext 232
Fax: (416) 640-4245

-----Original Message-----
From: Roderick A. Rose [mailto:[email protected]]
Sent: Thursday, August 31, 2006 11:48 AM
To: Paul Miller; [email protected]
Subject: Re: [Impute] Imputing "Plausible" Start and Stop Dates for HIV Antiretroviral Drugs

Paul,

My recommended solution is made under the (perhaps incorrect) assumption that what you are mainly interested in is the interval between the start and stop dates, not the actual start and stop dates themselves. Let the start date equal zero in every case (so it doesn't have to be imputed), and let the interval be a count of days (or another unit) between zero and the stop date. You impute this interval. I've not used IVEware, so I'm not sure this will completely eliminate the problem (e.g., you might end up with negative intervals if the bounds statement really doesn't work well).

Regarding the second issue of plausibility, I am curious whether it is necessary to have precision in days; if you know it happened in May 1998, you can err on the side of the least undesirable bias (by making it either May 31 or May 1). This is an alternative to ignoring the known value and letting it impute a completely new and possibly unrelated value. (Or do both and see what happens, as many of us probably do.)

Best,
Rod
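For reference, the bounds idea discussed in this thread (a known year, or a known year and month, giving a minimum and maximum possible date) might be sketched as follows. This is only an illustration in Python, not IVEware syntax; date_bounds and within_bounds are hypothetical helper names.

```python
import calendar
from datetime import date

def date_bounds(year, month=None):
    """Return (earliest, latest) possible date given a known year and optional month."""
    if month is None:
        # Only the year is known: the date can fall anywhere in that year.
        return date(year, 1, 1), date(year, 12, 31)
    # Year and month are known: the date can fall anywhere in that month.
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)

def within_bounds(value, bounds):
    """Check that a candidate (e.g. imputed) date lies inside the bounds, inclusive."""
    earliest, latest = bounds
    return earliest <= value <= latest

# Started some time in 2004, nothing else known.
min_start, max_start = date_bounds(2004)

# Event known to have happened in May 1998.
may_bounds = date_bounds(1998, 5)
```

Taking the first or the second element of the returned pair corresponds to choosing May 1 or May 31 in the example above.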
