Hi Dylan,

You might want to have a look at the plyr package, which is designed to
make these sorts of tasks easier: http://had.co.nz/plyr. The site
includes a ~20 page introductory PDF.
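For reference, here is a minimal sketch of the kind of call this points to. It is an illustration only, not code from the original message. The x values are fixed to the sample data printed in the quoted post (the post drew them with sample()), and the names avg and avg_plyr are made up for the example. The base R aggregate() call needs no extra packages; the plyr call runs only if plyr is installed.

```r
# Sample data modelled on the quoted post. x is hard-coded (an assumption)
# so that the result is reproducible instead of depending on sample().
dum <- data.frame(
  id   = c(2, 2, 2, 3, 4, 4, 4, 4, 5, 5),
  x    = c(21, 22, 23, 17, 18, 27, 25, 24, 26, 20),
  date = as.Date(c("12/15/2005", "1/13/2006", "1/13/2006", "4/5/2000",
                   "5/23/2003", "7/8/2003", "11/30/2003", "11/30/2003",
                   "4/19/2001", "4/19/2001"), format = "%m/%d/%Y")
)

# Base R: one row per (id, date) combination that actually occurs in the
# data, so no id-by-date table is built. The date column stays a Date.
avg <- aggregate(x ~ id + date, data = dum, FUN = mean)
avg <- avg[order(avg$id, avg$date), ]
rownames(avg) <- NULL
print(avg)

# plyr version of the same split-apply-combine step, run only when the
# package is available.
if (requireNamespace("plyr", quietly = TRUE)) {
  avg_plyr <- plyr::ddply(dum, c("id", "date"), plyr::summarise, x = mean(x))
  print(avg_plyr)
}
```

Either way the result is a data frame with the same three columns as the input, so no substr() or strsplit() step on pasted keys is needed.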
Hadley

On Wed, Oct 15, 2008 at 3:45 PM, dylan boyd <[EMAIL PROTECTED]> wrote:
> Another request for help implementing the 'apply' functions to avoid a
> loop structure...
>
> I am working with a data set that includes lab measurements taken at
> different dates for the subjects, with some subjects having more
> results than others. I would like to average lab results for each
> subject that were taken on the same day. I can do this using a for
> loop, but would like to know how to efficiently accomplish the same
> thing without looping as I will likely have to do the same with a much
> larger data set.
>
> At the end of this post are examples of what I'm starting with and
> what I want the result to look like:
>
> I tried another suggestion I saw on this list using a list object for
> the index of a call to 'tapply' as in:
>
>> new.x <- tapply(x, list(id, date), mean)
>
> but this produced a table-like object referencing every subject id
> with every date in the dataset - too large for the full data set and
> also would require serious re-working (at least with the tools I know)
> to return to the original dataframe structure.
>
> Another attempt was pasting the id and date together to create a
> single indexing vector.
> I could get this to work, but it seems clumsy
> to be substring'ing the names attribute of the resulting dataframe and
> implementing this with id's that range from 1 to 3 digits further
> complicates things:
>
>> new.x <- tapply(x, paste(id, date),mean)
>> data.frame(
> + id = substr(names(new.x),start=1,stop=1),
> + x = new.x,
> + date = as.Date(substr(names(new.x),start=3,stop=100)))
>              id    x       date
> 2 2005-12-15  2 21.0 2005-12-15
> 2 2006-01-13  2 22.5 2006-01-13
> 3 2000-04-05  3 17.0 2000-04-05
> 4 2003-05-23  4 18.0 2003-05-23
> 4 2003-07-08  4 27.0 2003-07-08
> 4 2003-11-30  4 24.5 2003-11-30
> 5 2001-04-19  5 23.0 2001-04-19
>
> Also, the full data set has subject id's that range from 1 to 3 digits,
> which further complicates the 'substr' call (although it just
> occurred to me that I could use strsplit as well..).
>
> It may be irrelevant, but the 'date' variable is a Date class object.
> I've tried first converting this to a character object but didn't get
> anywhere. Further, I'll use the dates later with difftime to figure
> the subjects' age at the onset of their condition, so I'd like to
> avoid converting between classes too much.
>
> Any advice would be greatly appreciated.
> Here is the code to build
> the sample data and the working for loop as well:
>
>> dum <- data.frame(
> + id = c(2,2,2,3,4,4,4,4,5,5),
> + x = sample(15:30,10),
> + date = as.Date(c("12/15/2005","1/13/2006","1/13/2006","4/5/2000","5/23/2003",
> + "7/8/2003","11/30/2003","11/30/2003","4/19/2001","4/19/2001"),format="%m/%d/%Y")
> + )
>> id.list <- unique(dum$id)
>> dum
>    id  x       date
> 1   2 21 2005-12-15
> 2   2 22 2006-01-13
> 3   2 23 2006-01-13
> 4   3 17 2000-04-05
> 5   4 18 2003-05-23
> 6   4 27 2003-07-08
> 7   4 25 2003-11-30
> 8   4 24 2003-11-30
> 9   5 26 2001-04-19
> 10  5 20 2001-04-19
>
>> output <- NULL
>> for (i in seq(along=id.list)) {
> + sel <- dum$id==id.list[i]
> + x.averaged <- tapply(dum$x[sel], dum$date[sel], mean, na.rm=TRUE)
> + dat <- data.frame(id.list[i], x.averaged, names(x.averaged))
> + output <- rbind(output, dat)
> + }
>> names(output) <- names(dum)
>> rownames(output) <- NULL
>> output
>   id    x       date
> 1  2 24.0 2005-12-15
> 2  2 22.0 2006-01-13
> 3  3 19.0 2000-04-05
> 4  4 22.0 2003-05-23
> 5  4 26.0 2003-07-08
> 6  4 28.5 2003-11-30
> 7  5 21.0 2001-04-19
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
http://had.co.nz/