[R] Help with ddply to eliminate a for..loop
I created a small example to show something that I do a lot of: scale data by month and return a data.frame with the output. id represents repeated observations over time, and I want to scale the slope variable. The out variable shows the output I want. My for..loop does the job but is probably very slow compared with other methods. ddply seems ideal, but despite playing with the baseball examples quite a bit, I can't figure out how to get it to work with my sample dataset.

TIA for any help,
Roger

Here is the sample code:

    dat <- data.frame(id=rep(letters[1:5],3), time=c(rep(1,5),rep(2,5),rep(3,5)), slope=1:15)
    dat
    for (i in 1:3) {
      mat <- dat[dat$time==i, ]
      outi <- data.frame(mat$time, mat$id, slope=scale(mat$slope))
      if (i==1) {
        out <- outi
      } else {
        out <- rbind(out, outi)
      }
    }
    out

Here is the sample output:

    > dat
       id time slope
    1   a    1     1
    2   b    1     2
    3   c    1     3
    4   d    1     4
    5   e    1     5
    6   a    2     6
    7   b    2     7
    8   c    2     8
    9   d    2     9
    10  e    2    10
    11  a    3    11
    12  b    3    12
    13  c    3    13
    14  d    3    14
    15  e    3    15

    > out
       mat.time mat.id      slope
    1         1      a -1.2649111
    2         1      b -0.6324555
    3         1      c  0.0000000
    4         1      d  0.6324555
    5         1      e  1.2649111
    6         2      a -1.2649111
    7         2      b -0.6324555
    8         2      c  0.0000000
    9         2      d  0.6324555
    10        2      e  1.2649111
    11        3      a -1.2649111
    12        3      b -0.6324555
    13        3      c  0.0000000
    14        3      d  0.6324555
    15        3      e  1.2649111

*** This message is for the named person's use only. It may ...

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
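For reference, scale() applied to a plain numeric vector subtracts the mean and divides by the sample standard deviation. The sketch below uses base R only and reproduces the first month's values by hand. Each month's slopes are five consecutive integers, so every month scales to the same five numbers. That is why the pattern in out repeats. The variable names x, manual and builtin are illustrative only.

```r
# Month 1 in the sample data holds slope values 1 to 5.
x <- 1:5

# scale() centres on the mean and divides by the sample standard
# deviation (sd() uses the n - 1 denominator).
manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix carrying "scaled:center" and
# "scaled:scale" attributes; as.vector() drops them for comparison.
builtin <- as.vector(scale(x))

print(manual)
all.equal(manual, builtin)

# Months 2 and 3 (6:10 and 11:15) shift every value by a constant,
# so they scale to the same numbers as month 1.
all.equal(builtin, as.vector(scale(6:10)))
```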
Re: [R] Help with ddply to eliminate a for..loop
On Aug 26, 2010, at 3:33 PM, Bos, Roger wrote:

> I created a small example to show something that I do a lot of: scale data by month and return a data.frame with the output. [...]

Roger, seems like you might want the following (see ?ave):

    > cbind(dat, slope = ave(dat$slope, list(dat$time), FUN = scale))
       id time slope      slope
    1   a    1     1 -1.2649111
    2   b    1     2 -0.6324555
    3   c    1     3  0.0000000
    4   d    1     4  0.6324555
    5   e    1     5  1.2649111
    6   a    2     6 -1.2649111
    7   b    2     7 -0.6324555
    8   c    2     8  0.0000000
    9   d    2     9  0.6324555
    10  e    2    10  1.2649111
    11  a    3    11 -1.2649111
    12  b    3    12 -0.6324555
    13  c    3    13  0.0000000
    14  d    3    14  0.6324555
    15  e    3    15  1.2649111

HTH,

Marc Schwartz
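One way to picture what ave() does here is split, apply, and reassemble in the original row order. The sketch below performs those steps by hand with split(), lapply() and unsplit(). It illustrates the idea and is not the actual source of ave(). The names pieces, scaled, by_hand and via_ave are illustrative only.

```r
dat <- data.frame(id = rep(letters[1:5], 3),
                  time = c(rep(1, 5), rep(2, 5), rep(3, 5)),
                  slope = 1:15)

# Split the slope values into one piece per month.
pieces <- split(dat$slope, dat$time)

# Scale each piece; as.vector() drops the matrix attributes scale() adds.
scaled <- lapply(pieces, function(v) as.vector(scale(v)))

# Put the scaled values back into the original row order.
by_hand <- unsplit(scaled, dat$time)

# Marc's call, for comparison.
via_ave <- ave(dat$slope, list(dat$time), FUN = scale)

all.equal(by_hand, via_ave)
```

Because ave() returns a vector aligned with the input rows, it can be bound straight back onto dat. No reordering or rbind() step is needed.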
Re: [R] Help with ddply to eliminate a for..loop
On Aug 26, 2010, at 3:40 PM, Marc Schwartz wrote:

> Roger, seems like you might want the following (see ?ave):
>
> cbind(dat, slope = ave(dat$slope, list(dat$time), FUN = scale))
> [...]

Quick fine-tune, as I forgot to remove the original 'slope' column above:

    > cbind(dat[, -3], slope = ave(dat$slope, list(dat$time), FUN = scale))
       id time      slope
    1   a    1 -1.2649111
    2   b    1 -0.6324555
    3   c    1  0.0000000
    4   d    1  0.6324555
    5   e    1  1.2649111
    6   a    2 -1.2649111
    7   b    2 -0.6324555
    8   c    2  0.0000000
    9   d    2  0.6324555
    10  e    2  1.2649111
    11  a    3 -1.2649111
    12  b    3 -0.6324555
    13  c    3  0.0000000
    14  d    3  0.6324555
    15  e    3  1.2649111

Marc
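dat[, -3] relies on slope being the third column. The sketch below is a name-based variant, assuming the same dat as in the original post. The helper variables keep and res are hypothetical. drop = FALSE keeps the selection a data frame even if only one column were left.

```r
dat <- data.frame(id = rep(letters[1:5], 3),
                  time = c(rep(1, 5), rep(2, 5), rep(3, 5)),
                  slope = 1:15)

# Select every column except "slope" by name rather than by position,
# so the call keeps working if columns are added or reordered.
keep <- setdiff(names(dat), "slope")
res  <- cbind(dat[, keep, drop = FALSE],
              slope = ave(dat$slope, list(dat$time), FUN = scale))

res
```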
Re: [R] Help with ddply to eliminate a for..loop
On Thu, Aug 26, 2010 at 4:33 PM, Bos, Roger roger@rothschild.com wrote:

> I created a small example to show something that I do a lot of: scale data by month and return a data.frame with the output. [...]

Try ave:

    transform(dat, slope = ave(slope, time, FUN = scale))

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
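The check below is a quick sketch that assumes the same dat as in the original post. It confirms that the transform()/ave() one-liner gives the same scaled values as Roger's loop, with the <- arrows restored. The name res is illustrative only.

```r
dat <- data.frame(id = rep(letters[1:5], 3),
                  time = c(rep(1, 5), rep(2, 5), rep(3, 5)),
                  slope = 1:15)

# Roger's loop: subset each month, scale it, and grow `out` with rbind().
for (i in 1:3) {
  mat  <- dat[dat$time == i, ]
  outi <- data.frame(mat$time, mat$id, slope = scale(mat$slope))
  if (i == 1) out <- outi else out <- rbind(out, outi)
}

# The one-liner: transform() evaluates `slope` and `time` inside dat,
# and ave() returns the scaled values in the original row order.
res <- transform(dat, slope = ave(slope, time, FUN = scale))

# dat is already sorted by time, so the two slope columns line up.
all.equal(as.vector(out$slope), res$slope)
```

Unlike the loop, the one-liner also keeps the original column names (id, time, slope) instead of mat.time and mat.id.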
Re: [R] Help with ddply to eliminate a for..loop
A ddply solution is

    library(plyr)  # provides ddply() and .()
    dat.out <- ddply(dat, .(time), transform, slope = scale(slope))

but this is not faster than the loop, and it is slower than the ave() solution:

    > system.time(
    +   for (i in 1:3) {
    +     mat <- dat[dat$time==i, ]
    +     outi <- data.frame(mat$time, mat$id, slope=scale(mat$slope))
    +     if (i==1) {
    +       out <- outi
    +     } else {
    +       out <- rbind(out, outi)
    +     }
    +   }
    + )
       user  system elapsed
      0.024   0.000   0.025
    > system.time(
    +   dat.out <- ddply(dat, .(time), transform, slope = scale(slope))
    + )
       user  system elapsed
      0.032   0.000   0.031
    > system.time(
    +   cbind(dat, slope = ave(dat$slope, list(dat$time), FUN = scale))
    + )
       user  system elapsed
      0.008   0.000   0.007

On Thu, Aug 26, 2010 at 4:33 PM, Bos, Roger roger@rothschild.com wrote:

> I created a small example to show something that I do a lot of: scale data by month and return a data.frame with the output. [...]
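With only 15 rows, the timings above are probably dominated by fixed per-call overhead rather than by the grouping strategy itself. The sketch below sets up a larger comparison. The panel size (500 ids by 60 months), the seed and the function names loop_version and ave_version are arbitrary assumptions, and ddply is timed only if plyr is installed. No results are claimed here, because the numbers will vary with the machine and package versions.

```r
# Hypothetical larger panel: 500 ids observed over 60 months.
set.seed(1)
n_id   <- 500
n_time <- 60
big <- data.frame(id    = rep(seq_len(n_id), n_time),
                  time  = rep(seq_len(n_time), each = n_id),
                  slope = rnorm(n_id * n_time))

# The loop approach: subset each month, scale it, grow `out` with rbind().
loop_version <- function(d) {
  out <- NULL
  for (i in unique(d$time)) {
    mat  <- d[d$time == i, ]
    outi <- data.frame(time = mat$time, id = mat$id,
                       slope = as.vector(scale(mat$slope)))
    out  <- rbind(out, outi)  # rbind(NULL, df) simply returns df
  }
  out
}

# The ave() approach, as in the transform() one-liner.
ave_version <- function(d) transform(d, slope = ave(slope, time, FUN = scale))

t_loop <- system.time(r_loop <- loop_version(big))
t_ave  <- system.time(r_ave  <- ave_version(big))
print(rbind(loop = t_loop[1:3], ave = t_ave[1:3]))

# Rows of `big` are already ordered by time, so the results line up.
all.equal(r_loop$slope, r_ave$slope)

# Time ddply only when plyr is available.
if (requireNamespace("plyr", quietly = TRUE)) {
  t_ddply <- system.time(
    r_ddply <- plyr::ddply(big, "time", transform,
                           slope = as.vector(scale(slope)))
  )
  print(t_ddply[1:3])
}
```

Growing out with rbind() copies every accumulated row on each pass. That cost tends to grow with the number of months, whereas ave() writes each group's result into a preallocated vector.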