Oops!  My caveat about untested code was certainly appropriate.  The 
normalization code below will not work.  

Here is probably what I was thinking of doing:

data <- within(data, norm <- value / tapply(value, group, sum)[group])

The same caveats apply here as below!
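
As a quick check of the corrected line, here is a small sketch run on the six-row example from the original question. The data values come from that table; the as.character() and as.vector() calls and the comparison with ave() are additions of mine, made on the assumption that group could be either a factor or a character vector.

```r
# Example data taken from the table in the original question
data <- data.frame(id    = 1:6,
                   group = c("A", "A", "A", "B", "B", "B"),
                   value = c(3.2, 3.0, 3.1, 5.5, 6.0, 6.2))

# Group totals, looked up by group name for every row.
# as.character() forces a lookup by name even when group is a factor;
# as.vector() drops the names and dim attributes that tapply() leaves behind.
data <- within(data,
               norm <- value / as.vector(tapply(value, group, sum)[as.character(group)]))

# ave() computes the same thing without the explicit lookup
norm2 <- with(data, ave(value, group, FUN = function(x) x / sum(x)))
```

Within each group the norm values should sum to 1, and norm2 should agree with data$norm.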


________________________________________
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of 
bill.venab...@csiro.au [bill.venab...@csiro.au]
Sent: 01 March 2010 17:18
To: n...@smartmediacorp.com; r-help@r-project.org
Subject: [ExternalEmail] Re: [R] lapply with data frame

Data frames are lists.  Each column of the data frame is a component of the 
list.  So in, e.g.

lapply(data, function(x) x)

the function would receive each column of the data frame in turn.
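
A minimal sketch of this, using a small made-up data frame (the column names id and value are only for illustration):

```r
# Hypothetical three-row data frame
data <- data.frame(id = 1:3, value = c(3.2, 3.0, 3.1))

# The function is called once per column; each x is a whole column vector
col_classes <- lapply(data, function(x) class(x))
col_lengths <- lapply(data, length)

# The result is a list with one component per column, named after the columns
names(col_classes)
```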

To apply a function to each row of the data frame (which may need some care, 
since apply() first coerces the data frame to a matrix), one tool you can use 
is apply(...)

apply(data, 1, function(x) ...)

The form of the result will depend on the value of the function.  If the 
function returns a vector (of length greater than one), the result of apply is 
a matrix, and the returned vectors form its *columns*, not its rows.
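
The shape of the result can be seen in a small sketch. The data frame m and the names total and biggest are made up for illustration. Note that apply() converts the data frame to a matrix first, so a frame with mixed column types would be passed to the function as character vectors.

```r
# Hypothetical all-numeric data frame with 3 rows
m <- data.frame(a = 1:3, b = 4:6)

# Each row is passed to the function as a vector;
# here the function returns a vector of length 2 for each row
res <- apply(m, 1, function(x) c(total = sum(x), biggest = max(x)))

dim(res)        # one row per returned element, one column per input row
res_t <- t(res) # transpose if you want one row per input row
```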

For the normalization problem, here is one way to do it:

data <- within(data, norm <- tapply(value, group, function(x) x/sum(x))[group])


Warning 1: the second of these assignment operators may not be replaced by '='.
Warning 2: untested code!
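
Since the question asked about lapply specifically, here is a separate sketch of the split / lapply / unsplit idiom, independent of the within()/tapply() line above. The data values are taken from the table in the question; the names pieces and normed are mine. In principle parallel::mclapply could stand in for lapply here, though for a per-group division this cheap I would not assume it is any faster.

```r
# Example data taken from the table in the question
data <- data.frame(id    = 1:6,
                   group = c("A", "A", "A", "B", "B", "B"),
                   value = c(3.2, 3.0, 3.1, 5.5, 6.0, 6.2))

# Break the value column into one vector per group
pieces <- split(data$value, data$group)

# Normalize each group's values by that group's total
normed <- lapply(pieces, function(x) x / sum(x))

# Put the pieces back in the original row order
data$norm <- unsplit(normed, data$group)
```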

________________________________________
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of 
Noah Silverman [n...@smartmediacorp.com]
Sent: 28 February 2010 12:37
To: r-help@r-project.org
Subject: [R] lapply with data frame

I'm a bit confused on how to use lapply with a data.frame.

For example:

lapply(data, function(x) print(x))

WHAT exactly is passed to the function?  Is it each ROW in the data
frame, one by one, or each column, or the entire frame in one shot?

What I want to do is apply a function to each row in the data frame.  Is
lapply the right way?

A second application is to normalize a column value by group.  For
example, if I have the following table:
id    group    value    norm
1     A        3.2
2     A        3.0
3     A        3.1
4     B        5.5
5     B        6.0
6     B        6.2
etc...

The long version would be:
for (g in unique(data$group)) {
     idx <- data$group == g    # rows belonging to the current group
     data$norm[idx] <- data$value[idx] / sum(data$value[idx])
}

There must be a faster way to do this with lapply.  (Ideally, I'd then
use mclapply to run on multi-cores and really crank up the speed.)

Any suggestions?

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
