[R] Help with ddply to eliminate a for..loop

2010-08-26 Thread Bos, Roger
I created a small example to show something that I do a lot: scale
data by month and return a data.frame with the output.  id represents
repeated observations over time, and I want to scale the slope
variable within each time period.  The out variable shows the output
I want.  My for loop does the job but is probably very slow compared
with other methods.  ddply seems ideal, but despite playing with the
baseball examples quite a bit I can't figure out how to get it to work
with my sample dataset.

TIA for any help, Roger

Here is the sample code:

dat <- data.frame(id=rep(letters[1:5],3),
                  time=c(rep(1,5),rep(2,5),rep(3,5)), slope=1:15)
dat

for (i in 1:3) {
    mat <- dat[dat$time==i, ]
    outi <- data.frame(mat$time, mat$id, slope=scale(mat$slope))
    if (i==1) {
        out <- outi
    } else {
        out <- rbind(out, outi)
    }
}
out

Here is the sample output:

> dat <- data.frame(id=rep(letters[1:5],3),
+ time=c(rep(1,5),rep(2,5),rep(3,5)), slope=1:15)

> dat
   id time slope
1   a    1     1
2   b    1     2
3   c    1     3
4   d    1     4
5   e    1     5
6   a    2     6
7   b    2     7
8   c    2     8
9   d    2     9
10  e    2    10
11  a    3    11
12  b    3    12
13  c    3    13
14  d    3    14
15  e    3    15

> for (i in 1:3) {
+     mat <- dat[dat$time==i, ]
+     outi <- data.frame(mat$time, mat$id, slope=scale(mat$slope))
+     if (i==1) {
+         out   [TRUNCATED]

> out
   mat.time mat.id      slope
1         1      a -1.2649111
2         1      b -0.6324555
3         1      c  0.0000000
4         1      d  0.6324555
5         1      e  1.2649111
6         2      a -1.2649111
7         2      b -0.6324555
8         2      c  0.0000000
9         2      d  0.6324555
10        2      e  1.2649111
11        3      a -1.2649111
12        3      b -0.6324555
13        3      c  0.0000000
14        3      d  0.6324555
15        3      e  1.2649111
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with ddply to eliminate a for..loop

2010-08-26 Thread Marc Schwartz

Roger, it seems like you might want ave(). See ?ave

> cbind(dat, slope = ave(dat$slope, list(dat$time), FUN = scale))
   id time slope      slope
1   a    1     1 -1.2649111
2   b    1     2 -0.6324555
3   c    1     3  0.0000000
4   d    1     4  0.6324555
5   e    1     5  1.2649111
6   a    2     6 -1.2649111
7   b    2     7 -0.6324555
8   c    2     8  0.0000000
9   d    2     9  0.6324555
10  e    2    10  1.2649111
11  a    3    11 -1.2649111
12  b    3    12 -0.6324555
13  c    3    13  0.0000000
14  d    3    14  0.6324555
15  e    3    15  1.2649111
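
For anyone unfamiliar with ave(): it splits its first argument by the grouping variable(s), applies FUN to each piece, and puts the results back in the original row order, so the result lines up with the rows of dat. A minimal sketch of that idea follows. The helper zscore is hypothetical (it is not from this thread); it is assumed here only to show the same calculation returning a plain vector, since scale() itself returns a one-column matrix.

```r
dat <- data.frame(id = rep(letters[1:5], 3),
                  time = rep(1:3, each = 5),
                  slope = 1:15)

# Hypothetical helper: centre and scale a single vector, returning a
# plain numeric vector rather than the one-column matrix scale() gives.
zscore <- function(x) (x - mean(x)) / sd(x)

# ave() applies zscore within each level of time and keeps row order.
scaled <- ave(as.numeric(dat$slope), dat$time, FUN = zscore)

# Compare with the scale() version; they should agree up to rounding.
via_scale <- ave(as.numeric(dat$slope), dat$time, FUN = scale)
all.equal(scaled, via_scale)
```

Either form should work here; the helper only makes explicit what is computed per group.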


HTH,

Marc Schwartz



Re: [R] Help with ddply to eliminate a for..loop

2010-08-26 Thread Marc Schwartz

Quick fine tune, as I forgot to remove the original 'slope' column above.

> cbind(dat[, -3], slope = ave(dat$slope, list(dat$time), FUN = scale))
   id time      slope
1   a    1 -1.2649111
2   b    1 -0.6324555
3   c    1  0.0000000
4   d    1  0.6324555
5   e    1  1.2649111
6   a    2 -1.2649111
7   b    2 -0.6324555
8   c    2  0.0000000
9   d    2  0.6324555
10  e    2  1.2649111
11  a    3 -1.2649111
12  b    3 -0.6324555
13  c    3  0.0000000
14  d    3  0.6324555
15  e    3  1.2649111


Marc



Re: [R] Help with ddply to eliminate a for..loop

2010-08-26 Thread Gabor Grothendieck

Try ave:

transform(dat, slope = ave(slope, time, FUN = scale))
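
A quick way to check that this one-liner gives the same numbers as a group-by-group calculation like the original loop is sketched below. The names res, pieces and out are just illustrative, and the reference calculation uses split() and lapply() rather than the exact loop from the question, so treat it as an assumption-laden sanity check rather than a definitive test.

```r
dat <- data.frame(id = rep(letters[1:5], 3),
                  time = rep(1:3, each = 5),
                  slope = 1:15)

# The one-liner: overwrite slope with its within-time scaled value.
res <- transform(dat, slope = ave(slope, time, FUN = scale))

# Reference: scale each time period separately, then stack the pieces.
# The pieces are collected in a list and bound once at the end.
pieces <- lapply(split(dat, dat$time), function(mat) {
    data.frame(time = mat$time, id = mat$id,
               slope = as.vector(scale(mat$slope)))
})
out <- do.call(rbind, pieces)

# The scaled values should agree row for row.
all.equal(res$slope, out$slope)
```

Note that transform() keeps the original column order and names (id, time, slope), whereas the loop in the question produces mat.time and mat.id.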


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] Help with ddply to eliminate a for..loop

2010-08-26 Thread Ista Zahn
A ddply solution is

library(plyr)
dat.out <- ddply(dat, .(time), transform, slope = scale(slope))

but this is not faster than the loop, and slower than the ave() solution:

> system.time(
+ for (i in 1:3) {
+     mat <- dat[dat$time==i, ]
+     outi <- data.frame(mat$time, mat$id, slope=scale(mat$slope))
+     if (i==1) {
+         out <- outi
+     } else {
+         out <- rbind(out, outi)
+     }
+ }
+ )
   user  system elapsed
  0.024   0.000   0.025

> system.time(
+ dat.out <- ddply(dat, .(time), transform, slope = scale(slope))
+ )
   user  system elapsed
  0.032   0.000   0.031


> system.time(
+ cbind(dat, slope = ave(dat$slope, list(dat$time), FUN = scale))
+ )
   user  system elapsed
  0.008   0.000   0.007
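
Timings on a 15-row data frame are mostly overhead, so it may be worth repeating the comparison on something larger before drawing conclusions. Below is a rough sketch using base R only. The data frame big, its size (100 time periods of 200 ids each), and the wrapper names loop_scale and ave_scale are all assumptions made up for illustration; the numbers you get will depend on your machine and R version.

```r
set.seed(1)
n_groups <- 100   # assumed number of time periods
n_ids    <- 200   # assumed number of ids per period
big <- data.frame(id = rep(seq_len(n_ids), n_groups),
                  time = rep(seq_len(n_groups), each = n_ids),
                  slope = rnorm(n_groups * n_ids))

# Loop in the style of the original question: subset, scale, rbind.
loop_scale <- function(d) {
    out <- NULL
    for (i in unique(d$time)) {
        mat <- d[d$time == i, ]
        outi <- data.frame(time = mat$time, id = mat$id,
                           slope = as.vector(scale(mat$slope)))
        out <- rbind(out, outi)
    }
    out
}

# ave() version, wrapped in a function for timing.
ave_scale <- function(d) transform(d, slope = ave(slope, time, FUN = scale))

t_loop <- system.time(res_loop <- loop_scale(big))
t_ave  <- system.time(res_ave  <- ave_scale(big))

# Elapsed seconds for each approach.
print(c(loop = t_loop[["elapsed"]], ave = t_ave[["elapsed"]]))

# Both approaches should produce the same scaled values.
all.equal(res_loop$slope, res_ave$slope)
```

The ddply call could be timed the same way if plyr is installed; it is left out here so the sketch runs without extra packages.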

