On Sep 1, 2010, at 9:20 AM, Chris Howden wrote:

Hi everyone,

I’m looking for a clever bit of code to replace NAs with a specific score
depending on an indicator variable.

I can see how to do it using lots of if statements, but I’m sure there must
be a neater, better way of doing it.

Any ideas at all will be much appreciated; I’m dreading coding up all those
if statements!

My problem is as follows:

I have a data set with lots of missing data:

EG Raw Data Set

Category variable1 variable2 variable3
       1         5        NA        NA
       1        NA         3         4
       2        NA         7        NA

This does not work by category (I got tired of fixing mangled, HTML-ized datasets), but it seems to me that a tapply "wrap" could do either of these operations within categories:


> egraw
  Category variable1 variable2 variable3
1        1         5        NA        NA
2        1        NA         3         4
3        2        NA         7        NA

> lapply(egraw, function(x) {
    mnx <- mean(x, na.rm = TRUE)
    sapply(x, function(z) if (is.na(z)) mnx else z)
  })
$Category
[1] 1 1 2

$variable1
[1] 5 5 5

$variable2
[1] 5 3 7

$variable3
[1] 4 4 4

> sapply(egraw, function(x) {
    mnx <- mean(x, na.rm = TRUE)
    sapply(x, function(z) if (is.na(z)) mnx else z)
  })
     Category variable1 variable2 variable3
[1,]        1         5         5         4
[2,]        1         5         3         4
[3,]        2         5         7         4
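A minimal sketch of that tapply "wrap", assuming the egraw data frame printed above and that every column other than Category should be imputed. tapply() computes the mean of each variable within each category (ignoring NAs), and only the missing cells are then filled from that table. The names cat_means and imputed are illustrative. Note that a category with no observed values for a variable (category 2 for variable1 and variable3 here) has a mean of NaN, so those cells come back as NaN rather than a number.

```r
# The example data as printed above
egraw <- data.frame(Category  = c(1, 1, 2),
                    variable1 = c(5, NA, NA),
                    variable2 = c(NA, 3, 7),
                    variable3 = c(NA, 4, NA))

# Per-category means of each variable, ignoring NAs.
# Rows are named by category ("1", "2"), columns by variable.
cat_means <- sapply(egraw[-1], function(x)
  tapply(x, egraw$Category, mean, na.rm = TRUE))

imputed <- egraw
for (v in names(egraw)[-1]) {
  miss <- is.na(imputed[[v]])
  # Look up the mean for each missing row's category; observed values are untouched
  imputed[[v]][miss] <- cat_means[as.character(egraw$Category[miss]), v]
}
imputed
```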


   etc

Now I want to replace the NAs with the average for each category. For
example, if these averages were:

EG Averages

Category variable1 variable2 variable3
       1       4.5       3.2       2.5
       2       3.5       7.4       5.9

So I’d like my data set to look like the following once I’ve replaced the
NAs with the appropriate category average:

EG Imputed Data Set

Category variable1 variable2 variable3
       1         5       3.2       2.5
       1       4.5         3         4
       2       3.5         7       5.9

   etc
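When the category averages are already known, as in the example tables above, one way to do the replacement without any if statements is to look up each row's category with match() and fill only the NA cells. This is a sketch under the assumption that the averages sit in a data frame with one row per category and the same column names as the data; the names dat, avgs, row_idx and imputed are illustrative, and the averages are the illustrative values from the question, not values computed from the data.

```r
# Raw data from the question
dat <- data.frame(Category  = c(1, 1, 2),
                  variable1 = c(5, NA, NA),
                  variable2 = c(NA, 3, 7),
                  variable3 = c(NA, 4, NA))

# Illustrative category averages from the question (one row per category)
avgs <- data.frame(Category  = c(1, 2),
                   variable1 = c(4.5, 3.5),
                   variable2 = c(3.2, 7.4),
                   variable3 = c(2.5, 5.9))

# For each row of dat, the matching row of avgs
row_idx <- match(dat$Category, avgs$Category)

imputed <- dat
for (v in setdiff(names(dat), "Category")) {
  fill <- avgs[[v]][row_idx]        # this row's category average for v
  miss <- is.na(imputed[[v]])
  imputed[[v]][miss] <- fill[miss]  # only the NAs are replaced
}
imputed
```

With these inputs the result matches the "EG Imputed Data Set" table above.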

Any ideas would be very much appreciated!

You might add reading the Posting Guide and setting up your reader to post in plain text to your TODO list.
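In that spirit, a minimal sketch of how the example data could be shared in a plain-text post so that readers can rebuild it exactly: create it with data.frame() and print it with dput(), which writes out R code that recreates the object. The values are copied from the raw data table above; the name egraw is the one used in the reply.

```r
# Build the example data explicitly rather than pasting a printed table
egraw <- data.frame(Category  = c(1, 1, 2),
                    variable1 = c(5, NA, NA),
                    variable2 = c(NA, 3, 7),
                    variable3 = c(NA, 4, NA))

# dput() prints R code that recreates egraw; that text can be
# pasted into a plain-text message and run by anyone reading it
dput(egraw)
```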

Thank you

Chris Howden


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
