Hi Jan,
you could try the following:
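dat <- data.frame(Price  = c(10, 12, NA, 8, 7, 9, NA, 9, NA),
                  Crop   = c(rep("Rice", 5), rep("Wheat", 4)),
                  Season = c(rep("Summer", 3), rep("Winter", 4), rep("Summer", 2)))
######
# sort by Season, then Crop, so the rows match the column-major order
# in which unlist() flattens the Crop x Season result of tapply()
dat <- dat[order(dat$Season, dat$Crop), ]
dat$Price.imp <- unlist(tapply(dat$Price, list(dat$Crop, dat$Season), function(x) {
  mx <- mean(x, na.rm = TRUE)  # mean of the non-NA prices in this cell
  ifelse(is.na(x), mx, x)      # replace only the NAs
}))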
dat
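An alternative sketch that does not depend on sorting the rows first uses base R's ave(), which applies FUN within each Crop-Season group and returns the results in the original row order. The toy data are re-created so the snippet runs on its own; note that a group whose prices are all NA would get NaN.

```r
dat <- data.frame(Price  = c(10, 12, NA, 8, 7, 9, NA, 9, NA),
                  Crop   = c(rep("Rice", 5), rep("Wheat", 4)),
                  Season = c(rep("Summer", 3), rep("Winter", 4), rep("Summer", 2)))

# ave() splits Price by every Crop-Season combination, applies FUN to each
# group and writes the results back in the original row order
dat$Price.imp <- ave(dat$Price, dat$Crop, dat$Season,
                     FUN = function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))
dat
```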
However, you should be careful with this imputation technique, since it treats the imputed values as if they had been observed and so ignores the extra variability that imputation introduces; standard errors computed from the completed data set will tend to be too small. I don't know what analysis you are planning to do, but in any case I would recommend reading some standard references on missing values, e.g., Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data, 2nd ed. New York: Wiley.
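A small illustration of that caveat, using the Rice/Summer cell of the toy data (10, 12, NA): filling the NA with the cell mean leaves the cell's sum of squares unchanged while increasing n, so the sample variance shrinks. This only demonstrates the effect; it is not a remedy for it.

```r
observed <- c(10, 12)                    # non-missing Rice/Summer prices
imputed  <- c(observed, mean(observed))  # the NA replaced by the cell mean (11)

var(observed)  # sum of squares 2 divided by n - 1 = 1
var(imputed)   # same sum of squares 2, now divided by n - 1 = 2
```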
I hope this helps.
Best, Dimitris
----
Dimitris Rizopoulos
Doctoral Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/396887
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm
----- Original Message ----- From: "Jan Smit" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 01, 2004 10:43 AM
Subject: [R] Imputing missing values
Dear all,
Apologies for this beginner's question. I have a variable Price, which is associated with the factors Season and Crop, each of which has several levels. The Price variable contains missing values (NA), which I want to replace with the mean of the remaining (non-NA) Price values for the same Season-Crop combination.
Price Crop  Season
   10 Rice  Summer
   12 Rice  Summer
   NA Rice  Summer
    8 Rice  Winter
    9 Wheat Summer
Price[is.na(Price)] gives me the missing values, and by(Price, list(Crop, Season), mean, na.rm = T) the values I want to impute. What I've not been able to figure out, by looking at by and the various incarnations of apply, is how to do the actual substitution.
Any help would be much appreciated.
Jan Smit
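For the substitution step itself, one possible sketch keeps the cell means as a Crop x Season matrix (tapply() returns one, whereas by() returns a "by" object) and indexes it with a two-column character matrix holding each missing row's Crop and Season; R matches those strings against the matrix's dimnames, so no sorting is needed. It assumes Price, Crop and Season are plain vectors of equal length, here filled with the five rows from the question.

```r
Price  <- c(10, 12, NA, 8, 9)
Crop   <- c("Rice", "Rice", "Rice", "Rice", "Wheat")
Season <- c("Summer", "Summer", "Summer", "Winter", "Summer")

# cell means of the non-missing prices as a Crop x Season matrix
# (the empty Wheat/Winter cell is NA)
means <- tapply(Price, list(Crop, Season), mean, na.rm = TRUE)

miss <- is.na(Price)
# each row of the index matrix is a (Crop, Season) pair looked up by dimnames
Price[miss] <- means[cbind(as.character(Crop[miss]), as.character(Season[miss]))]
Price
```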
Or see the impute function in the Hmisc package; Hmisc also provides more general solutions.
--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
______________________________________________ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html