Excerpts from Zahra via R-help's message of 2015-11-02 17:49:01 -0200:
> Hi there,
>
> I am looking for some help replacing missing values in R with the row mean.
> This is survey data and I am trying to impute values for missing variables in
> each set of questions separately using the mean of the scores for the other
> questions within that set.
>
> I have a dataset that looks like this
>
> ID A1 A2 A3 B1 B2 B3 C1 C2 C3 C4
> b   4  5 NA  2 NA  4  5  1  3 NA
> c   4  5  1 NA  3  4  5  1  3  2
> d  NA  5  1  1 NA  4  5  1  3  2
> e   4  5  4  5 NA  4  5  1  3  2
>
> I want to replace any NA's in columns A1:A3 with the row mean for those
> columns only. So for ID=b, I want the NA in A3[ID=b] to be (4+5)/2 which is
> the average of the values in A1 and A2 for that row.
> Same thing for columns B1:B3 - I want the NA in B2[ID=b] to be the mean of
> the values of B1 and B3 in row ID=b so that B2[ID=b] becomes 3 which is
> (2+4)/2. And same in C1:C4, I want C4[ID=b] to become (5+1+3)/3 which is the
> mean of C1:C3.
>
> Then I want to go to row ID=c and do the same thing and so on.
>
> Can anybody help me do this? I have tried using rowMeans and subsetting but
> can't figure out the right code to do it.
>
> Thanks so much.
> Zahra
>
use

    row <- which(df$ID == 'b')
    df[row, col_selection][is.na(df[row, col_selection])] <- fmean(...)

where fmean depends on the column selection (Axx, Byy, etc.) and on the row id
itself (so consider passing the left-hand side of the assignment entirely).
I would use:

    fmean <- function(row, col_selection) {
      # homework for you here
    }

Best Regards,
--
Marco Arthur @ (M)arco Creatives

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
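For completeness, here is one way the "homework" part could be filled in. This is a minimal sketch, not the per-row fmean(row, col_selection) design described in the reply: it works on one question block at a time, for all rows at once, using rowMeans(..., na.rm = TRUE), which the original poster mentioned trying. It assumes every question column is numeric and that the blocks are exactly A1:A3, B1:B3 and C1:C4; the names impute_row_means and groups are hypothetical, chosen only for illustration.

```r
# Example data from the question (ID plus three blocks of survey items)
df <- data.frame(
  ID = c("b", "c", "d", "e"),
  A1 = c(4, 4, NA, 4), A2 = c(5, 5, 5, 5), A3 = c(NA, 1, 1, 4),
  B1 = c(2, NA, 1, 5), B2 = c(NA, 3, NA, NA), B3 = c(4, 4, 4, 4),
  C1 = c(5, 5, 5, 5), C2 = c(1, 1, 1, 1), C3 = c(3, 3, 3, 3),
  C4 = c(NA, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace NAs in `cols` with each row's mean over `cols`
impute_row_means <- function(data, cols) {
  m <- as.matrix(data[, cols, drop = FALSE])
  # Row means over the non-missing items of this block only
  means <- rowMeans(m, na.rm = TRUE)
  # Two-column matrix of (row, col) positions of the missing cells
  idx <- which(is.na(m), arr.ind = TRUE)
  # Each missing cell gets the mean of its own row (first column = row index)
  m[idx] <- means[idx[, 1]]
  data[, cols] <- m
  data
}

# Hypothetical grouping of items into question blocks
groups <- list(
  A = c("A1", "A2", "A3"),
  B = c("B1", "B2", "B3"),
  C = c("C1", "C2", "C3", "C4")
)

# Impute each block separately, so only same-block items feed the mean
for (cols in groups) {
  df <- impute_row_means(df, cols)
}

print(df)
```

With the example data, row b ends up with A3 = 4.5, B2 = 3 and C4 = 3, matching the arithmetic in the question. If a respondent skipped every question in a block, rowMeans(..., na.rm = TRUE) returns NaN for that row, so those cells stay missing (as NaN) rather than being filled.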