Your request for a more general approach is precisely the reason that Hadley Wickham wrote the plyr package. He describes a split-apply-combine strategy for a variety of data structures, and the tools that implement it, here:

http://had.co.nz/plyr/plyr-intro-090510.pdf

The argument to the "by" step is a column name rather than a list or object as it would be in tapply or split. I() is essentially the identity function, which stands in for function(x) return(x) in your code.

> library(plyr)
> ddply(y, "month", fun=I)
      suid month esr
1  1074034     1   2
2  1074034     1   1
3  1074034     1   2
4  1074034     1   9
5  1123003     1   2
6  1074034     2   2
7  1074034     2   1
8  1074034     2   2
9  1074034     2   9
10 1123003     2   2
11 1074034     3   2
12 1074034     3   1
13 1074034     3   2
14 1074034     3   9
15 1123003     3   2
16 1074034    12   6
17 1074034    12   1
18 1074034    12   2
19 1074034    12   9
20 1123003    12   2
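
If you would rather stay in base R, the pieces that by() returns can usually be stacked with do.call() and rbind(). A minimal sketch using your example data (the names pieces and stacked are just illustrative, not anything from your script):

```r
# Same example data as in the original post
y <- data.frame(suid  = c(rep(1074034, 16), rep(1123003, 4)),
                month = rep(c(12, 1, 2, 3), 5),
                esr   = c(6,2,2,2,1,1,1,1,2,2,2,2,9,9,9,9,2,2,2,2))

# by() returns a list-like object holding one data frame per month
pieces <- by(y, y$month, function(x) x)

# rbind() needs each piece as a separate argument, hence do.call()
stacked <- do.call(rbind, pieces)

# Row names are built from the group label and the original row name;
# reset them if they are not wanted
rownames(stacked) <- NULL
```

Calling rbind() directly on the by object fails because the whole object is passed as a single argument; do.call() hands each sub-data frame to rbind() separately. The rows come back grouped by month, as in the ddply() output above.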

On Jun 24, 2009, at 11:34 PM, Stephan Lindner wrote:

Dear all,


I have code in which I subset a data frame to match entries within
levels of a factor (actually, the full script uses three different
factors to do that). I'm very happy with the precision with which I can
work with R, but since I loop over factor levels, and the data frame is
big, the process is slow. So I've been trying to speed up the process
using by(), but I got stuck at the point where I want to stack back
the sub-data frames, and I was wondering whether someone could help me
out.

Here is an example:

<--

y <- data.frame(suid  = c(rep(1074034,16),rep(1123003,4)),
                month = rep(c(12,1,2,3),5),
                esr   = c(6,2,2,2,1,1,1,1,2,2,2,2,9,9,9,9,2,2,2,2))


by(y,y$month,function(x)return(x))

y$month: 1
     suid month esr
2  1074034     1   2
6  1074034     1   1
10 1074034     1   2
14 1074034     1   9
18 1123003     1   2
------------------------------------------------------------
y$month: 2
     suid month esr
3  1074034     2   2
7  1074034     2   1
11 1074034     2   2
15 1074034     2   9
19 1123003     2   2
------------------------------------------------------------
y$month: 3
     suid month esr
4  1074034     3   2
8  1074034     3   1
12 1074034     3   2
16 1074034     3   9
20 1123003     3   2
------------------------------------------------------------
y$month: 12
     suid month esr
1  1074034    12   6
5  1074034    12   1
9  1074034    12   2
13 1074034    12   9
17 1123003    12   2

-->

What I would like to do is stack these four data frames back into one
data frame, which in this simple example would just be y. I tried
unlist(), unclass() and rbind(), but none of them worked.


Thanks a lot,

Stephan

--
-----------------------
Stephan Lindner
University of Michigan

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
