On Sun, Aug 30, 2009 at 5:08 PM, Noah Silverman<n...@smartmediacorp.com> wrote:
> Hi,
>
> I need a bit of guidance with the sapply function.  I've read the help page,
> but am still a bit unsure how to use it.
>
> I have a large data frame with about 100 columns and 30,000 rows.  One of
> the columns is "group" of which there are about 2,000 distinct "groups".
>
> I want to normalize (sum to 1) one of my variables per-group.
>
> Normally, I would just write a huge "for each" loop, but have read that is
> hugely inefficient with R.
>
> The old way would be (just an example, syntax might not be perfect):
>
> for (group in data$group){
>     for (score in data[data$group == group]){
>         new_score <- score / sum(data$score[data$group==group])
>     }
> }

It might be easier to use ddply from the plyr package.  The command you
want would be:

  library(plyr)
  data <- ddply(data, "group", transform, score = score / sum(score))

More information at http://had.co.nz/plyr.

Hadley

-- 
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
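
For anyone who would rather stay in base R, the same per-group normalization can probably be done without plyr. The sketch below is not from the thread: the toy data frame and the new_score / new_score2 column names are made up for illustration, and it assumes "group" and "score" are ordinary columns with no missing values. It shows one version with ave() and one with sapply(), since that was the function originally asked about.

```r
# Toy data standing in for the real 30,000-row data frame (hypothetical values).
data <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  score = c(1, 3, 2, 2, 4)
)

# Version 1: ave() applies FUN within each group and returns a vector
# aligned with the original rows, so the division is element-wise.
data$new_score <- data$score / ave(data$score, data$group, FUN = sum)

# Version 2: sapply() over the split scores gives one sum per group,
# which is then looked up by group name for every row.
group_sums <- sapply(split(data$score, data$group), sum)
data$new_score2 <- unname(data$score / group_sums[as.character(data$group)])

# Sanity check: the normalized scores should sum to 1 within each group.
print(tapply(data$new_score, data$group, sum))
```

Both versions avoid the explicit double loop. With about 2,000 groups either should be fast enough, though the ddply call above is arguably easier to read.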