Just want to add that if you want to clean out the NA rows in a matrix or data frame, take a look at ?complete.cases. Can be handy to use with big datasets. I got curious, so I just ran the codes given here on a big dataset, before and after removing NA rows. I have to be honest, this is surely an illustration of the power of rowMeans. I'm amazed myself.
DF <- data.frame( A=rep(DF$A,10000), B=rep(DF$B,10000) ) > system.time(apply(DF,1,mean,na.rm=TRUE)) user system elapsed 13.26 0.06 13.46 > system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1)) user system elapsed 0.03 0.00 0.03 > system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean, + na.rm=TRUE)[,-1])) + ) Timing stopped at: 227.84 1.03 249.31 -- I got impatient and pressed the escape > DF <- DF[complete.cases(DF),] > system.time(apply(DF,1,mean,na.rm=TRUE)) user system elapsed 0.39 0.00 0.39 > system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1)) user system elapsed 0.01 0.00 0.02 > system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean, + na.rm=TRUE)[,-1])) + ) user system elapsed 10.01 0.07 13.40 Cheers Joris On Sat, Jun 26, 2010 at 1:08 AM, emorway <emor...@engr.colostate.edu> wrote: > > Forum, > > Using the following data: > > DF<-read.table(textConnection("A B > 22.60 NA > NA NA > NA NA > NA NA > NA NA > NA NA > NA NA > NA NA > 102.00 NA > 19.20 NA > 19.20 NA > NA NA > NA NA > NA NA > 11.80 NA > 7.62 NA > NA NA > NA NA > NA NA > NA NA > NA NA > 75.00 NA > NA NA > 18.30 18.2 > NA NA > NA NA > 8.44 NA > 18.00 NA > NA NA > 12.90 NA"),header=T) > closeAllConnections() > > The second column is a duplicate reading of the first column, and when two > values are available, I would like to average column 1 and 2 (example code > below). But if there is only one reading, I would like to retain it, but I > haven't found a good way to exclude NA's using the following code: > > t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean)[,-1])) > > Currently, row 24 is the only row with a returned value. I'd like the > result to return column "A" if it is the only available value, and average > where possible. Of course, if both columns are NA, NA is the only possible > result. > > The result I'm after would look like this (row 24 is an avg): > > 22.60 > NA > NA > NA > NA > NA > NA > NA > 102.00 > 19.20 > 19.20 > NA > NA > NA > 11.80 > 7.62 > NA > NA > NA > NA > NA > 75.00 > NA > 18.25 > NA > NA > 8.44 > 18.00 > NA > 12.90 > > This is a small example from a much larger data frame, so if you're > wondering what the deal is with list(), that will come into play for the > larger problem I'm trying to solve. > > Respectfully, > Eric > -- > View this message in context: > http://r.789695.n4.nabble.com/Average-2-Columns-when-possible-or-return-available-value-tp2269049p2269049.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be ------------------------------- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.