Dimitris Rizopoulos wrote:
Hi Jan,

you could try the following:

dat <- data.frame(Price=c(10,12,NA,8,7,9,NA,9,NA),
                  Crop=c(rep("Rice", 5), rep("Wheat", 4)),
                  Season=c(rep("Summer", 3), rep("Winter", 4),
                           rep("Summer", 2)))
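######
# Sort so that the row order matches the order in which unlist()
# returns the tapply() cells (Crop varies fastest within Season).
dat <- dat[order(dat$Season, dat$Crop),]
# Within each Crop-Season cell, replace NA by the mean of that cell.
dat$Price.imp <- unlist(tapply(dat$Price, list(dat$Crop, dat$Season),
                               function(x){
                                 mx <- mean(x, na.rm=TRUE)
                                 ifelse(is.na(x), mx, x)
                               }))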

dat
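
If the sort step is unwanted, the same per-group substitution can be
sketched with base R's ave(), which returns its result in the original
row order. This is only an alternative sketch, not a replacement for
the approach above; the name dat2 is illustrative, and the example data
are rebuilt so that the block runs on its own.

```r
# Same example data as above, rebuilt so this block is self-contained.
dat2 <- data.frame(Price  = c(10, 12, NA, 8, 7, 9, NA, 9, NA),
                   Crop   = c(rep("Rice", 5), rep("Wheat", 4)),
                   Season = c(rep("Summer", 3), rep("Winter", 4),
                              rep("Summer", 2)))

# ave() applies FUN within each Crop-Season combination and returns a
# vector in the original row order, so no sorting is needed.
dat2$Price.imp <- ave(dat2$Price, dat2$Crop, dat2$Season,
                      FUN = function(x) {
                        x[is.na(x)] <- mean(x, na.rm = TRUE)
                        x
                      })

dat2
```

A combination with no observed prices at all would receive NaN here,
which is.na() still reports as missing.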

However, you should be careful with this imputation technique: the
imputed values are then treated as if they had been observed, so the
extra variability due to imputation is not taken into account. I don't
know what analysis you are planning to do, but in any case I would
recommend reading some standard references on missing values, e.g.,
Little, R. and Rubin, D. (2002). Statistical Analysis with Missing
Data, New York: Wiley.
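
The point about variability can be seen in a small sketch: filling
every gap with the mean leaves the mean unchanged but shrinks the
sample variance, so standard errors computed from the completed data
will tend to look too small. The vector price below is made up purely
for illustration.

```r
# Hypothetical prices with two missing values (illustration only).
price <- c(10, 12, NA, 8, 7, NA)

# Mean imputation: fill each NA with the mean of the observed values.
price.imp <- ifelse(is.na(price), mean(price, na.rm = TRUE), price)

var(price, na.rm = TRUE)  # variance of the four observed values
var(price.imp)            # smaller: the imputed values add no spread
```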

I hope this helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Doctoral Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/396887
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm


----- Original Message -----
From: "Jan Smit" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 01, 2004 10:43 AM
Subject: [R] Imputing missing values




Dear all,

Apologies for this beginner's question. I have a
variable Price, which is associated with factors
Season and Crop, each of which has several levels.
The Price variable contains missing values (NA), which
I want to replace with the mean of the remaining
(non-NA) Price values of the same Season-Crop
combination of levels.

Price     Crop     Season
10        Rice     Summer
12        Rice     Summer
NA        Rice     Summer
8         Rice     Winter
9         Wheat    Summer

Price[is.na(Price)] gives me the missing values, and
by(Price, list(Crop, Season), mean, na.rm = TRUE) the
values I want to impute. What I've not been able to
figure out, by looking at by and the various
incarnations of apply, is how to do the actual
substitution.

Any help would be much appreciated.

Jan Smit

Or see the impute function in the Hmisc package, which also offers more general solutions.



--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
