Re: [R] how to replace NA with a specific score that is dependant on another indicator variable

David Winsemius Wed, 01 Sep 2010 06:56:23 -0700


On Sep 1, 2010, at 9:20 AM, Chris Howden wrote:

Hi everyone,
I’m looking for a clever bit of code to replace NAs with a specific score
depending on an indicator variable.
I can see how to do it using lots of if statements, but I’m sure there must
be a neater, better way of doing it.
Any ideas at all will be much appreciated; I’m dreading coding up all those
if statements!!!!!

My problem is as follows:

I have a data set with lots of missing data:

EG Raw Data Set
Category variable1 variable2 variable3
       1         5        NA        NA
       1        NA         3         4
       2        NA         7        NA

This does not do its work by category (since I got tired of fixing mangled HTML-ized datasets), but it seems to me that a tapply "wrap" could do either of these operations within categories:



> egraw
  Category variable1 variable2 variable3
1        1         5        NA        NA
2        1        NA         3         4
3        2        NA         7        NA

> lapply(egraw, function(x) {
      mnx <- mean(x, na.rm = TRUE)   # column mean, ignoring NAs
      sapply(x, function(z) if (is.na(z)) mnx else z)   # swap each NA for it
  })
$Category
[1] 1 1 2

$variable1
[1] 5 5 5

$variable2
[1] 5 3 7

$variable3
[1] 4 4 4

> sapply(egraw, function(x) {
      mnx <- mean(x, na.rm = TRUE)
      sapply(x, function(z) if (is.na(z)) mnx else z)
  })
     Category variable1 variable2 variable3
[1,]        1         5         5         4
[2,]        1         5         3         4
[3,]        2         5         7         4
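The within-category version suggested above might be sketched as follows. It swaps in base R's ave(), which applies a function within groups much like a tapply() wrapper but returns the results aligned with the original rows. The egraw construction and the fill_mean() helper are hypothetical, typed in from the printed data.

```r
# Hypothetical reconstruction of the example data printed above
egraw <- data.frame(Category  = c(1, 1, 2),
                    variable1 = c(5, NA, NA),
                    variable2 = c(NA, 3, 7),
                    variable3 = c(NA, 4, NA))

# Hypothetical helper: replace NAs in v with the mean of v's non-missing values
fill_mean <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}

# Apply fill_mean() within each Category; ave() returns results in the
# original row order, so they can be written straight back into the columns
vars <- setdiff(names(egraw), "Category")
imputed <- egraw
imputed[vars] <- lapply(egraw[vars], function(x)
  ave(x, egraw$Category, FUN = fill_mean))
imputed
```

In this toy data Category 2 has a single row whose variable1 and variable3 are missing, so no within-category mean exists and those cells come out as NaN; some fallback (the overall column mean, say) would be needed for such cases.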

   etc
Now I want to replace the NAs with the average for each category, so if
these averages were:

EG Averages
Category variable1 variable2 variable3
       1       4.5       3.2       2.5
       2       3.5       7.4       5.9
So I’d like my data set to look like the following once I’ve replaced the
NAs with the appropriate category average:

EG Imputed Data Set
Category variable1 variable2 variable3
       1         5       3.2       2.5
       1       4.5         3         4
       2       3.5         7       5.9

   etc
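When the per-category averages are already available, as in the "EG Averages" table, the NAs can be filled by lookup instead of recomputation. A minimal sketch, assuming the averages sit in a data frame (hypothetically named avgs) with one row per Category; match() finds each data row's category in that table, and the loop fills only the missing cells.

```r
# Raw data from the example above (hypothetical hand-typed reconstruction)
egraw <- data.frame(Category  = c(1, 1, 2),
                    variable1 = c(5, NA, NA),
                    variable2 = c(NA, 3, 7),
                    variable3 = c(NA, 4, NA))

# Per-category averages copied from the "EG Averages" table
avgs <- data.frame(Category  = c(1, 2),
                   variable1 = c(4.5, 3.5),
                   variable2 = c(3.2, 7.4),
                   variable3 = c(2.5, 5.9))

# For each row of egraw, the row of avgs holding its category's averages
idx <- match(egraw$Category, avgs$Category)

imputed <- egraw
for (v in setdiff(names(egraw), "Category")) {
  miss <- is.na(imputed[[v]])
  # Fill only the missing cells, each with its own category's average
  imputed[[v]][miss] <- avgs[[v]][idx[miss]]
}
imputed
```

With these inputs the result matches the "EG Imputed Data Set" table, and no if statements are needed.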

Any ideas would be very much appreciated!!!!!

You might add reading the Posting Guide and setting up your reader to post in plain text to your TODO list.
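One way to post data in plain text that others can recreate exactly is base R's dput(), which prints an R expression for an object. A sketch using the example data frame, typed in by hand here as a hypothetical reconstruction:

```r
# Example data typed in by hand (hypothetical reconstruction of egraw)
egraw <- data.frame(Category  = c(1, 1, 2),
                    variable1 = c(5, NA, NA),
                    variable2 = c(NA, 3, 7),
                    variable3 = c(NA, 4, NA))

# dput() prints a structure() call that rebuilds egraw; pasting that text
# into a plain-text post lets readers reproduce the data exactly
dput(egraw)
```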


thank you

Chris Howden


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
