Oops! My caveat about untested code was certainly appropriate. The normalization code below will not work.
Here is probably what I was thinking of doing:

    data <- within(data, norm <- value / tapply(value, group, sum)[group])

The same caveats apply here as below!

________________________________________
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of bill.venab...@csiro.au [bill.venab...@csiro.au]
Sent: 01 March 2010 17:18
To: n...@smartmediacorp.com; r-help@r-project.org
Subject: [ExternalEmail] Re: [R] lapply with data frame

Data frames are lists. Each column of the data frame is a component of the list. So in, e.g.

    lapply(data, function(x) x)

the function would receive each column of the data frame in turn.

To apply a function to each row of the data frame (which may need some care) one tool you can use is apply(...)

    apply(data, 1, function(x) ...)

The form of the result will depend on the value of the function. If the function returns a vector, the result of apply is a matrix in which those vectors form the *columns*, not the rows.

For the normalization problem, here is one way to do it:

    data <- within(data, norm <- tapply(value, group, function(x) x/sum(x))[group])

Warning 1: the second of these assignment operators may not be replaced by '='.
Warning 2: untested code!

________________________________________
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Noah Silverman [n...@smartmediacorp.com]
Sent: 28 February 2010 12:37
To: r-help@r-project.org
Subject: [R] lapply with data frame

I'm a bit confused on how to use lapply with a data.frame. For example:

    lapply(data, function(x) print(x))

WHAT exactly is passed to the function? Is it each ROW in the data frame, one by one, or each column, or the entire frame in one shot?

What I want to do is apply a function to each row in the data frame. Is lapply the right way?

A second application is to normalize a column value by group.
For example, if I have the following table:

    id  group  value  norm
    1   A      3.2
    2   A      3.0
    3   A      3.1
    4   B      5.5
    5   B      6.0
    6   B      6.2
    etc...

The long version would be:

    for (g in unique(data$group)) {
        # rows belonging to the current group
        idx <- data$group == g
        data$norm[idx] <- data$value[idx] / sum(data$value[idx])
    }

There must be a faster way to do this with lapply. (Ideally, I'd then use mclapply to run on multi-cores and really crank up the speed.)

Any suggestions?

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
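
The points made in this thread can be tried out on the six-row table from the original question. What follows is only a minimal sketch: the variable names (col_classes, row_sums, group_sums, norm_ave) are illustrative and do not come from the thread, and the ave() variant is an alternative that nobody in the thread proposed. The as.vector() call is likewise an addition here, used to drop the names and dim attribute that indexing a tapply() result carries along.

```r
# Toy data matching the table in the original question.
data <- data.frame(
  id    = 1:6,
  group = c("A", "A", "A", "B", "B", "B"),
  value = c(3.2, 3.0, 3.1, 5.5, 6.0, 6.2),
  stringsAsFactors = FALSE
)

# lapply() walks the columns: the function sees one column at a time.
col_classes <- lapply(data, class)

# apply(X, 1, f) walks the rows. apply() first converts the data frame
# with as.matrix(), so with a character column present every row would
# arrive as a character vector. Restricting to numeric columns avoids that.
row_sums <- apply(data[, c("id", "value")], 1, sum)

# Corrected normalization from the top of the thread: per-group sums,
# expanded back to one entry per row by indexing with the group labels.
group_sums <- tapply(data$value, data$group, sum)
data$norm  <- data$value / as.vector(group_sums[data$group])

# Alternative (not from the thread): ave() applies FUN within each group
# and returns the results in the original row order.
data$norm_ave <- ave(data$value, data$group, FUN = function(x) x / sum(x))
```

Both normalized columns should sum to 1 within each group, which is a quick way to check the result. The as.matrix() conversion inside apply() is presumably the "care" that row-wise use was said to need.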