[R] speed up process
Dear users,

I have a double for loop that does exactly what I want, but it is quite slow. The simplified example below is not a problem, but with my real data it is slow. Can anyone help me improve it?

The data and the code for foo_reg() are at the end of this email; I preferred to go directly to the problematic part.
Here is the code (I tried to simplify it, but I cannot simplify it much further or it would no longer represent my problem). It might also look too complex for what it is intended to do, but my colleagues who are also supposed to use it don't know much about R, so I wrote it so that they don't have to modify the critical parts to run the script for their needs.

#column indexes for function
ind.xvar <- 2
seq.yvar <- 3:4
#position vector for legend(), stupid positioning but it doesn't matter here
mypos <- c("topleft", "topright", "bottomleft")

#run the function for columns 3 and 4 as y (seq.yvar) with column 2 as x (ind.xvar) for all 3 datasets (mydata_list)
par(mfrow=c(2,1))
for (i in seq_along(seq.yvar)){
  k <- seq.yvar[i]
  plot(mydata1[[k]]~mydata1[[ind.xvar]], type="p", xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
  for (j in seq_along(mydata_list)){
    foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j, pos=mypos[j], name.dat=names(mydata_list)[j])
  }
}

I tried with lapply() or mapply() but couldn't manage to pass the arguments for names() and col= correctly, e.g. for the 2nd loop:

lapply(mydata_list, FUN=function(x){foo_reg(dat=x, xvar=ind.xvar, yvar=k, col1=1:3, pos=mypos[1:3], name.dat=names(x)[1:3])})
mapply(FUN=function(x) {foo_reg(dat=x, name.dat=names(x)[1:3])}, mydata_list, col1=1:3, pos=mypos, MoreArgs=list(xvar=ind.xvar, yvar=k))

Thanks in advance for any hints.
Ivan

#create data (it looks horrible with these datasets but it doesn't matter here)
mydata1 <- structure(list(species = structure(1:8, .Label = c("alsen", "gogor", "loalb", "mafas", "pacyn", "patro", "poabe", "thgel"), class = "factor"),
  fruit = c(0.52, 0.45, 0.43, 0.82, 0.35, 0.9, 0.68, 0),
  Asfc = c(207.463765, 138.5533755, 70.4391735, 160.9742745, 41.455809, 119.155109, 26.241441, 148.337377),
  Tfv = c(47068.1437773483, 43743.8087431582, 40323.5209129239, 23420.9455581495, 29382.6947428651, 50460.2202192311, 21810.1456510625, 41747.6053810881)),
  .Names = c("species", "fruit", "Asfc", "Tfv"), row.names = c(NA, 8L), class = "data.frame")
mydata2 <- mydata1[!(mydata1$species %in% c("thgel","alsen")),]
mydata3 <- mydata1[!(mydata1$species %in% c("thgel","alsen","poabe")),]
mydata_list <- list(mydata1=mydata1, mydata2=mydata2, mydata3=mydata3)

#function for regression
library(WRS)
foo_reg <- function(dat, xvar, yvar, mycol, pos, name.dat){
  tsts <- tstsreg(dat[[xvar]], dat[[yvar]])
  tsts_inter <- signif(tsts$coef[1], digits=3)
  tsts_slope <- signif(tsts$coef[2], digits=3)
  abline(tsts$coef, lty=1, col=mycol)
  legend(x=pos, legend=c(paste("TSTS ", name.dat, ": Y=", tsts_inter, "+", tsts_slope, "X", sep="")), lty=1, col=mycol)
}

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
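One possible way to get the mapply() call working is to hand it every per-dataset argument as its own parallel vector: the list of data frames, names() of the list (not of each data frame), the colour indexes and the legend positions. Inside lapply(mydata_list, ...), names(x) returns the column names of a single data frame, and col1=1:3 passes all three colours to every call under a name that foo_reg() does not have, which is probably why the two attempts fail. The sketch below is only an illustration under stated assumptions: make_df(), dat_list and pos_vec are made-up stand-ins for the real data, fit_label() is a hypothetical stand-in for foo_reg(), and lm() replaces tstsreg() so that the example runs without the WRS package. It returns the legend text instead of drawing it.

```r
# Made-up stand-in data: a named list of three small data frames.
set.seed(1)
make_df <- function(n) {
  data.frame(id = seq_len(n), x = seq_len(n), y = 2 * seq_len(n) + rnorm(n))
}
dat_list <- list(d1 = make_df(8), d2 = make_df(6), d3 = make_df(5))
pos_vec <- c("topleft", "topright", "bottomleft")

# Hypothetical stand-in for foo_reg(): lm() is used instead of tstsreg(),
# and the legend text is returned rather than drawn.
fit_label <- function(dat, xvar, yvar, mycol, pos, name.dat) {
  cf <- signif(coef(lm(dat[[yvar]] ~ dat[[xvar]])), digits = 3)
  paste0(name.dat, " [col ", mycol, ", ", pos, "]: Y=", cf[1], "+", cf[2], "X")
}

# Arguments given in '...' are walked through in parallel, one element per
# call; MoreArgs holds whatever is identical for every call.
labels <- mapply(FUN = fit_label,
                 dat = dat_list,
                 name.dat = names(dat_list),
                 mycol = seq_along(dat_list),
                 pos = pos_vec,
                 MoreArgs = list(xvar = 2, yvar = 3))
print(labels)
```

Applied to the inner loop of the original script, the same pattern would presumably read mapply(foo_reg, dat=mydata_list, name.dat=names(mydata_list), mycol=seq_along(mydata_list), pos=mypos, MoreArgs=list(xvar=ind.xvar, yvar=k)), wrapped in invisible() to hide the returned legend() values. That call is untested here because it needs WRS, and it changes how the arguments are passed rather than how long each fit takes.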
Re: [R] speed up process
Simply avoid the for loops by using lapply (I may have missed a bracket here or there because I did this without opening R). I haven't checked the speed-up, though.

lapply(seq.yvar, function(k){
  plot(mydata1[[k]]~mydata1[[ind.xvar]], type="p", xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
  lapply(seq_along(mydata_list), function(j){
    foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j, pos=mypos[j], name.dat=names(mydata_list)[j])
    return(NULL)
  })
  invisible(NULL)
})

HTH,

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ivan Calandra
Sent: Friday, 25 February 2011 11:20
To: r-help
Subject: [R] speed up process

Dear users, I have a double for loop that does exactly what I want, but is quite slow. [...]
Re: [R] speed up process
Thanks Nick for your quick answer.

It does work (no missed bracket!) but unfortunately it doesn't really speed anything up: with my real data, it takes 82.78 seconds with the double lapply() instead of 83.59 seconds with the double loop (a gain of only about 0.8 s). It looks like my double loop was not that bad.

Does anyone know a faster way to do this?

Thanks again in advance,
Ivan

On 2/25/2011 11:41, Nick Sabbe wrote:
> Simply avoiding the for loops by using lapply (I may have missed a bracket here or there cause I did this without opening R)... [...]
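The near-identical timings are what one would expect when the work done inside each iteration dominates: for and lapply() differ only in a small per-iteration overhead. The following sketch illustrates this under assumptions: slow_step() is a hypothetical stand-in for an expensive call such as tstsreg(), with Sys.sleep() simulating the work, and the number of iterations is arbitrary.

```r
# Hypothetical stand-in for an expensive call such as tstsreg():
# Sys.sleep() simulates work that dominates each iteration.
slow_step <- function(i) {
  Sys.sleep(0.01)
  i^2
}
n <- 20

# Version 1: explicit for loop with a preallocated result vector.
t_for <- system.time({
  res_for <- numeric(n)
  for (i in seq_len(n)) res_for[i] <- slow_step(i)
})

# Version 2: the same work through lapply().
t_lapply <- system.time({
  res_lapply <- unlist(lapply(seq_len(n), slow_step))
})

# Both versions give the same result; the elapsed times are expected to be
# close, because almost all the time is spent inside slow_step().
print(identical(res_for, res_lapply))
print(c(loop = t_for[["elapsed"]], lapply = t_lapply[["elapsed"]]))
```

Swapping the loop construct is therefore mostly a matter of style here; any real gain has to come from the function called inside the loop.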
Re: [R] speed up process
Use Rprof to find where the time is being spent. It is probably in 'plot', which would imply that the 'for' loop is not the problem and that the time is therefore beyond your control.

Sent from my iPad

On Feb 25, 2011, at 6:19, Ivan Calandra ivan.calan...@uni-hamburg.de wrote:
> Thanks Nick for your quick answer. It does work (no missed bracket!) but unfortunately doesn't really speed up anything [...]
Re: [R] speed up process
Dear Jim,

I've tried to use Rprof() as you advised, but I don't understand how it works. I've done this:

Rprof(for (i in seq_along(seq.yvar)){
  all_my_commands
})
summaryRprof()

But I got this error:

Error in summaryRprof() : no lines found in ‘Rprof.out’

I couldn't really understand from the help page what I should do.

In any case, it is certain that the function tstsreg() is what takes the most computing time. But I wanted to optimize the rest of the code to gain as much speed as possible.

Ivan

On 2/25/2011 12:30, Jim Holtman wrote:
> use Rprof to find where time is being spent. probably in 'plot' which might imply it is not the 'for' loop and therefore beyond your control. [...]
Re: [R] speed up process
You invoke Rprof, run your code and then terminate it:

Rprof()
# ... code you want to profile
Rprof(NULL)
# generate output
summaryRprof()

example:

Rprof()
for (i in 1:1e6) sin(i) + cos(i) + sqrt(i)
Rprof(NULL)
summaryRprof()

$by.self
       self.time self.pct total.time total.pct
"sin"       0.24    30.77       0.24     30.77
"sqrt"      0.22    28.21       0.22     28.21
"cos"       0.16    20.51       0.16     20.51
"+"         0.14    17.95       0.14     17.95
":"         0.02     2.56       0.02      2.56

$by.total
       total.time total.pct self.time self.pct
"sin"        0.24     30.77      0.24    30.77
"sqrt"       0.22     28.21      0.22    28.21
"cos"        0.16     20.51      0.16    20.51
"+"          0.14     17.95      0.14    17.95
":"          0.02      2.56      0.02     2.56

$sample.interval
[1] 0.02

$sampling.time
[1] 0.78

On Fri, Feb 25, 2011 at 6:57 AM, Ivan Calandra ivan.calan...@uni-hamburg.de wrote:
> Dear Jim, I've tried to use Rprof() as you advised me, but I don't understand how it works. [...]
Re: [R] speed up process
Ha... it was way too simple! I thought it would work like system.time()... my bad.
Thanks for the tip!

As we thought, foo_reg() takes most of the computing time, and I cannot improve that. Any ideas on how to improve the rest?

Thanks again for your help
Ivan

On 2/25/2011 14:29, jim holtman wrote:
> You invoke Rprof, run your code and then terminate it: [...]
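For readers who also expect Rprof() to behave like system.time(): its first argument is a file name, not code. As far as I can tell, Rprof(for (...) {...}) runs the loop while evaluating that argument and then, because a for loop returns NULL, ends up calling Rprof(NULL), which switches profiling off, so nothing is ever recorded. A small wrapper can give the system.time()-like usage. The sketch below is an assumption-laden illustration, not part of base R: with_rprof() and busy() are hypothetical names, and busy() is a made-up workload standing in for the real plotting and regression code.

```r
# Hypothetical helper (not part of base R): profile an expression the way
# system.time() times one. 'expr' is evaluated lazily, so it only runs after
# Rprof() has been switched on.
with_rprof <- function(expr, interval = 0.02) {
  out <- tempfile(fileext = ".out")
  on.exit(unlink(out), add = TRUE)
  Rprof(out, interval = interval)               # start sampling into a temp file
  tryCatch(force(expr), finally = Rprof(NULL))  # always stop sampling
  summaryRprof(out)
}

# Made-up CPU-bound workload that runs for roughly 'seconds' seconds.
busy <- function(seconds) {
  t0 <- proc.time()[["elapsed"]]
  x <- 0
  while (proc.time()[["elapsed"]] - t0 < seconds) {
    x <- x + sum(sqrt(1:1000))
  }
  x
}

prof <- with_rprof(busy(0.5))
print(head(prof$by.total))
print(prof$sampling.time)
```

With the real script, the call would be something like with_rprof({ the double loop }), and the by.total table then shows how much of the run sits under foo_reg() compared with plot() and legend(). Very short expressions may finish before a single sample is taken, in which case summaryRprof() reports that no lines were found.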