[R] speed up process

2011-02-25 Thread Ivan Calandra

Dear users,

I have a double for loop that does exactly what I want, but is quite 
slow. It is not so much with this simplified example, but IRL it is slow.

Can anyone help me improve it?

The data and code for foo_reg() are available at the end of the email; I 
preferred going directly into the problematic part.
Here is the code (I tried to simplify it but I cannot do it too much or 
else it wouldn't represent my problem). It might also look too complex 
for what it is intended to do, but my colleagues who are also supposed 
to use it don't know much about R. So I wrote it so that they don't have 
to modify the critical parts to run the script for their needs.


#column indexes for function
ind.xvar - 2
seq.yvar - 3:4
#position vector for legend(), stupid positioning but it doesn't matter here
mypos - c(topleft, topright,bottomleft)

#run the function for columns 34 as y (seq.yvar) with column 2 as x 
(ind.xvar) for all 3 datasets (mydata_list)

par(mfrow=c(2,1))
for (i in seq_along(seq.yvar)){
  k - seq.yvar[i]
  plot(mydata1[[k]]~mydata1[[ind.xvar]], type=p, 
xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])

  for (j in seq_along(mydata_list)){
foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j, 
pos=mypos[j], name.dat=names(mydata_list)[j])

  }
}

I tried with lapply() or mapply() but couldn't manage to pass the 
arguments for names() and col= correctly, e.g. for the 2nd loop:
lapply(mydata_list, FUN=function(x){foo_reg(dat=x, xvar=ind.xvar, 
yvar=k, col1=1:3, pos=mypos[1:3], name.dat=names(x)[1:3])})
mapply(FUN=function(x) {foo_reg(dat=x, name.dat=names(x)[1:3])}, 
mydata_list, col1=1:3, pos=mypos, MoreArgs=list(xvar=ind.xvar, yvar=k))


Thanks in advance for any hints.
Ivan




#create data (it looks horrible with these datasets but it doesn't 
matter here)
mydata1 - structure(list(species = structure(1:8, .Label = c(alsen, 
gogor, loalb, mafas, pacyn, patro, poabe, thgel), class = 
factor), fruit = c(0.52, 0.45, 0.43, 0.82, 0.35, 0.9, 0.68, 0), Asfc = 
c(207.463765, 138.5533755, 70.4391735, 160.9742745, 41.455809, 
119.155109, 26.241441, 148.337377), Tfv = c(47068.1437773483, 
43743.8087431582, 40323.5209129239, 23420.9455581495, 29382.6947428651, 
50460.2202192311, 21810.1456510625, 41747.6053810881)), .Names = 
c(species, fruit, Asfc, Tfv), row.names = c(NA, 8L), class = 
data.frame)


mydata2 - mydata1[!(mydata1$species %in% c(thgel,alsen)),]
mydata3 - mydata1[!(mydata1$species %in% c(thgel,alsen,poabe)),]
mydata_list - list(mydata1=mydata1, mydata2=mydata2, mydata3=mydata3)

#function for regression
library(WRS)
foo_reg - function(dat, xvar, yvar, mycol, pos, name.dat){
 tsts - tstsreg(dat[[xvar]], dat[[yvar]])
 tsts_inter - signif(tsts$coef[1], digits=3)
 tsts_slope - signif(tsts$coef[2], digits=3)
 abline(tsts$coef, lty=1, col=mycol)
 legend(x=pos, legend=c(paste(TSTS ,name.dat,: 
Y=,tsts_inter,+,tsts_slope,X,sep=)), lty=1, col=mycol)

}

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] speed up process

2011-02-25 Thread Nick Sabbe
Simply avoiding the for loops by using lapply (I may have missed a bracket
here or there cause I did this without opening R)...
Haven't checked the speed up, though.

lapply(seq.yvar, function(k){
   plot(mydata1[[k]]~mydata1[[ind.xvar]], type=p,
xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
   lapply(seq_along(mydata_list), function(j){
 foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
pos=mypos[j], name.dat=names(mydata_list)[j])
 return(NULL)
   })
   invisible(NULL)
})

HTH,

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove




-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Ivan Calandra
Sent: vrijdag 25 februari 2011 11:20
To: r-help
Subject: [R] speed up process

Dear users,

I have a double for loop that does exactly what I want, but is quite 
slow. It is not so much with this simplified example, but IRL it is slow.
Can anyone help me improve it?

The data and code for foo_reg() are available at the end of the email; I 
preferred going directly into the problematic part.
Here is the code (I tried to simplify it but I cannot do it too much or 
else it wouldn't represent my problem). It might also look too complex 
for what it is intended to do, but my colleagues who are also supposed 
to use it don't know much about R. So I wrote it so that they don't have 
to modify the critical parts to run the script for their needs.

#column indexes for function
ind.xvar - 2
seq.yvar - 3:4
#position vector for legend(), stupid positioning but it doesn't matter here
mypos - c(topleft, topright,bottomleft)

#run the function for columns 34 as y (seq.yvar) with column 2 as x 
(ind.xvar) for all 3 datasets (mydata_list)
par(mfrow=c(2,1))
for (i in seq_along(seq.yvar)){
   k - seq.yvar[i]
   plot(mydata1[[k]]~mydata1[[ind.xvar]], type=p, 
xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
   for (j in seq_along(mydata_list)){
 foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j, 
pos=mypos[j], name.dat=names(mydata_list)[j])
   }
}

I tried with lapply() or mapply() but couldn't manage to pass the 
arguments for names() and col= correctly, e.g. for the 2nd loop:
lapply(mydata_list, FUN=function(x){foo_reg(dat=x, xvar=ind.xvar, 
yvar=k, col1=1:3, pos=mypos[1:3], name.dat=names(x)[1:3])})
mapply(FUN=function(x) {foo_reg(dat=x, name.dat=names(x)[1:3])}, 
mydata_list, col1=1:3, pos=mypos, MoreArgs=list(xvar=ind.xvar, yvar=k))

Thanks in advance for any hints.
Ivan




#create data (it looks horrible with these datasets but it doesn't 
matter here)
mydata1 - structure(list(species = structure(1:8, .Label = c(alsen, 
gogor, loalb, mafas, pacyn, patro, poabe, thgel), class = 
factor), fruit = c(0.52, 0.45, 0.43, 0.82, 0.35, 0.9, 0.68, 0), Asfc = 
c(207.463765, 138.5533755, 70.4391735, 160.9742745, 41.455809, 
119.155109, 26.241441, 148.337377), Tfv = c(47068.1437773483, 
43743.8087431582, 40323.5209129239, 23420.9455581495, 29382.6947428651, 
50460.2202192311, 21810.1456510625, 41747.6053810881)), .Names = 
c(species, fruit, Asfc, Tfv), row.names = c(NA, 8L), class = 
data.frame)

mydata2 - mydata1[!(mydata1$species %in% c(thgel,alsen)),]
mydata3 - mydata1[!(mydata1$species %in% c(thgel,alsen,poabe)),]
mydata_list - list(mydata1=mydata1, mydata2=mydata2, mydata3=mydata3)

#function for regression
library(WRS)
foo_reg - function(dat, xvar, yvar, mycol, pos, name.dat){
  tsts - tstsreg(dat[[xvar]], dat[[yvar]])
  tsts_inter - signif(tsts$coef[1], digits=3)
  tsts_slope - signif(tsts$coef[2], digits=3)
  abline(tsts$coef, lty=1, col=mycol)
  legend(x=pos, legend=c(paste(TSTS ,name.dat,: 
Y=,tsts_inter,+,tsts_slope,X,sep=)), lty=1, col=mycol)
}

-- 
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] speed up process

2011-02-25 Thread Ivan Calandra

Thanks Nick for your quick answer.
It does work (no missed bracket!) but unfortunately doesn't really speed 
up anything: with my real data, it takes 82.78 seconds with the double 
lapply() instead of 83.59s with the double loop (about 0.8 s).


It looks like my double loop was not that bad. Does anyone know another 
faster way to do this?


Thanks again in advance,
Ivan

Le 2/25/2011 11:41, Nick Sabbe a écrit :

Simply avoiding the for loops by using lapply (I may have missed a bracket
here or there cause I did this without opening R)...
Haven't checked the speed up, though.

lapply(seq.yvar, function(k){
plot(mydata1[[k]]~mydata1[[ind.xvar]], type=p,
xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
lapply(seq_along(mydata_list), function(j){
  foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
pos=mypos[j], name.dat=names(mydata_list)[j])
  return(NULL)
})
invisible(NULL)
})

HTH,

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove




-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Ivan Calandra
Sent: vrijdag 25 februari 2011 11:20
To: r-help
Subject: [R] speed up process

Dear users,

I have a double for loop that does exactly what I want, but is quite
slow. It is not so much with this simplified example, but IRL it is slow.
Can anyone help me improve it?

The data and code for foo_reg() are available at the end of the email; I
preferred going directly into the problematic part.
Here is the code (I tried to simplify it but I cannot do it too much or
else it wouldn't represent my problem). It might also look too complex
for what it is intended to do, but my colleagues who are also supposed
to use it don't know much about R. So I wrote it so that they don't have
to modify the critical parts to run the script for their needs.

#column indexes for function
ind.xvar- 2
seq.yvar- 3:4
#position vector for legend(), stupid positioning but it doesn't matter here
mypos- c(topleft, topright,bottomleft)

#run the function for columns 34 as y (seq.yvar) with column 2 as x
(ind.xvar) for all 3 datasets (mydata_list)
par(mfrow=c(2,1))
for (i in seq_along(seq.yvar)){
k- seq.yvar[i]
plot(mydata1[[k]]~mydata1[[ind.xvar]], type=p,
xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
for (j in seq_along(mydata_list)){
  foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
pos=mypos[j], name.dat=names(mydata_list)[j])
}
}

I tried with lapply() or mapply() but couldn't manage to pass the
arguments for names() and col= correctly, e.g. for the 2nd loop:
lapply(mydata_list, FUN=function(x){foo_reg(dat=x, xvar=ind.xvar,
yvar=k, col1=1:3, pos=mypos[1:3], name.dat=names(x)[1:3])})
mapply(FUN=function(x) {foo_reg(dat=x, name.dat=names(x)[1:3])},
mydata_list, col1=1:3, pos=mypos, MoreArgs=list(xvar=ind.xvar, yvar=k))

Thanks in advance for any hints.
Ivan




#create data (it looks horrible with these datasets but it doesn't
matter here)
mydata1- structure(list(species = structure(1:8, .Label = c(alsen,
gogor, loalb, mafas, pacyn, patro, poabe, thgel), class =
factor), fruit = c(0.52, 0.45, 0.43, 0.82, 0.35, 0.9, 0.68, 0), Asfc =
c(207.463765, 138.5533755, 70.4391735, 160.9742745, 41.455809,
119.155109, 26.241441, 148.337377), Tfv = c(47068.1437773483,
43743.8087431582, 40323.5209129239, 23420.9455581495, 29382.6947428651,
50460.2202192311, 21810.1456510625, 41747.6053810881)), .Names =
c(species, fruit, Asfc, Tfv), row.names = c(NA, 8L), class =
data.frame)

mydata2- mydata1[!(mydata1$species %in% c(thgel,alsen)),]
mydata3- mydata1[!(mydata1$species %in% c(thgel,alsen,poabe)),]
mydata_list- list(mydata1=mydata1, mydata2=mydata2, mydata3=mydata3)

#function for regression
library(WRS)
foo_reg- function(dat, xvar, yvar, mycol, pos, name.dat){
   tsts- tstsreg(dat[[xvar]], dat[[yvar]])
   tsts_inter- signif(tsts$coef[1], digits=3)
   tsts_slope- signif(tsts$coef[2], digits=3)
   abline(tsts$coef, lty=1, col=mycol)
   legend(x=pos, legend=c(paste(TSTS ,name.dat,:
Y=,tsts_inter,+,tsts_slope,X,sep=)), lty=1, col=mycol)
}



--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] speed up process

2011-02-25 Thread Jim Holtman
use Rprof to find where time is being spent.  probably in 'plot' which might 
imply it is not the 'for' loop and therefore beyond your control.

Sent from my iPad

On Feb 25, 2011, at 6:19, Ivan Calandra ivan.calan...@uni-hamburg.de wrote:

 Thanks Nick for your quick answer.
 It does work (no missed bracket!) but unfortunately doesn't really speed up 
 anything: with my real data, it takes 82.78 seconds with the double lapply() 
 instead of 83.59s with the double loop (about 0.8 s).
 
 It looks like my double loop was not that bad. Does anyone know another 
 faster way to do this?
 
 Thanks again in advance,
 Ivan
 
 Le 2/25/2011 11:41, Nick Sabbe a écrit :
 Simply avoiding the for loops by using lapply (I may have missed a bracket
 here or there cause I did this without opening R)...
 Haven't checked the speed up, though.
 
 lapply(seq.yvar, function(k){
plot(mydata1[[k]]~mydata1[[ind.xvar]], type=p,
 xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
lapply(seq_along(mydata_list), function(j){
  foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
 pos=mypos[j], name.dat=names(mydata_list)[j])
  return(NULL)
})
invisible(NULL)
 })
 
 HTH,
 
 Nick Sabbe
 --
 ping: nick.sa...@ugent.be
 link: http://biomath.ugent.be
 wink: A1.056, Coupure Links 653, 9000 Gent
 ring: 09/264.59.36
 
 -- Do Not Disapprove
 
 
 
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf Of Ivan Calandra
 Sent: vrijdag 25 februari 2011 11:20
 To: r-help
 Subject: [R] speed up process
 
 Dear users,
 
 I have a double for loop that does exactly what I want, but is quite
 slow. It is not so much with this simplified example, but IRL it is slow.
 Can anyone help me improve it?
 
 The data and code for foo_reg() are available at the end of the email; I
 preferred going directly into the problematic part.
 Here is the code (I tried to simplify it but I cannot do it too much or
 else it wouldn't represent my problem). It might also look too complex
 for what it is intended to do, but my colleagues who are also supposed
 to use it don't know much about R. So I wrote it so that they don't have
 to modify the critical parts to run the script for their needs.
 
 #column indexes for function
 ind.xvar- 2
 seq.yvar- 3:4
 #position vector for legend(), stupid positioning but it doesn't matter here
 mypos- c(topleft, topright,bottomleft)
 
 #run the function for columns 34 as y (seq.yvar) with column 2 as x
 (ind.xvar) for all 3 datasets (mydata_list)
 par(mfrow=c(2,1))
 for (i in seq_along(seq.yvar)){
k- seq.yvar[i]
plot(mydata1[[k]]~mydata1[[ind.xvar]], type=p,
 xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
for (j in seq_along(mydata_list)){
  foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
 pos=mypos[j], name.dat=names(mydata_list)[j])
}
 }
 
 I tried with lapply() or mapply() but couldn't manage to pass the
 arguments for names() and col= correctly, e.g. for the 2nd loop:
 lapply(mydata_list, FUN=function(x){foo_reg(dat=x, xvar=ind.xvar,
 yvar=k, col1=1:3, pos=mypos[1:3], name.dat=names(x)[1:3])})
 mapply(FUN=function(x) {foo_reg(dat=x, name.dat=names(x)[1:3])},
 mydata_list, col1=1:3, pos=mypos, MoreArgs=list(xvar=ind.xvar, yvar=k))
 
 Thanks in advance for any hints.
 Ivan
 
 
 
 
 #create data (it looks horrible with these datasets but it doesn't
 matter here)
 mydata1- structure(list(species = structure(1:8, .Label = c(alsen,
 gogor, loalb, mafas, pacyn, patro, poabe, thgel), class =
 factor), fruit = c(0.52, 0.45, 0.43, 0.82, 0.35, 0.9, 0.68, 0), Asfc =
 c(207.463765, 138.5533755, 70.4391735, 160.9742745, 41.455809,
 119.155109, 26.241441, 148.337377), Tfv = c(47068.1437773483,
 43743.8087431582, 40323.5209129239, 23420.9455581495, 29382.6947428651,
 50460.2202192311, 21810.1456510625, 41747.6053810881)), .Names =
 c(species, fruit, Asfc, Tfv), row.names = c(NA, 8L), class =
 data.frame)
 
 mydata2- mydata1[!(mydata1$species %in% c(thgel,alsen)),]
 mydata3- mydata1[!(mydata1$species %in% c(thgel,alsen,poabe)),]
 mydata_list- list(mydata1=mydata1, mydata2=mydata2, mydata3=mydata3)
 
 #function for regression
 library(WRS)
 foo_reg- function(dat, xvar, yvar, mycol, pos, name.dat){
   tsts- tstsreg(dat[[xvar]], dat[[yvar]])
   tsts_inter- signif(tsts$coef[1], digits=3)
   tsts_slope- signif(tsts$coef[2], digits=3)
   abline(tsts$coef, lty=1, col=mycol)
   legend(x=pos, legend=c(paste(TSTS ,name.dat,:
 Y=,tsts_inter,+,tsts_slope,X,sep=)), lty=1, col=mycol)
 }
 
 
 -- 
 Ivan CALANDRA
 PhD Student
 University of Hamburg
 Biozentrum Grindel und Zoologisches Museum
 Abt. Säugetiere
 Martin-Luther-King-Platz 3
 D-20146 Hamburg, GERMANY
 +49(0)40 42838 6231
 ivan.calan...@uni-hamburg.de
 
 **
 http://www.for771.uni-bonn.de
 http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman

Re: [R] speed up process

2011-02-25 Thread Ivan Calandra

Dear Jim,

I've tried to use Rprof() as you advised me, but I don't understand how 
it works.

I've done this:
Rprof(for (i in seq_along(seq.yvar)){
  all_my_commands
})
summaryRprof()

But I got this error:
Error in summaryRprof() : no lines found in ‘Rprof.out’

I couldn't really understand from the help page what I should do.

In any case, it's sure that the function tstsreg(), is what takes the 
most computing time. But I wanted to optimize the rest of the code to 
gain as much speed as possible.


Ivan

Le 2/25/2011 12:30, Jim Holtman a écrit :

use Rprof to find where time is being spent.  probably in 'plot' which might 
imply it is not the 'for' loop and therefore beyond your control.

Sent from my iPad

On Feb 25, 2011, at 6:19, Ivan Calandraivan.calan...@uni-hamburg.de  wrote:


Thanks Nick for your quick answer.
It does work (no missed bracket!) but unfortunately doesn't really speed up 
anything: with my real data, it takes 82.78 seconds with the double lapply() 
instead of 83.59s with the double loop (about 0.8 s).

It looks like my double loop was not that bad. Does anyone know another faster 
way to do this?

Thanks again in advance,
Ivan

Le 2/25/2011 11:41, Nick Sabbe a écrit :

Simply avoiding the for loops by using lapply (I may have missed a bracket
here or there cause I did this without opening R)...
Haven't checked the speed up, though.

lapply(seq.yvar, function(k){
plot(mydata1[[k]]~mydata1[[ind.xvar]], type=p,
xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
lapply(seq_along(mydata_list), function(j){
  foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
pos=mypos[j], name.dat=names(mydata_list)[j])
  return(NULL)
})
invisible(NULL)
})

HTH,

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove




-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Ivan Calandra
Sent: vrijdag 25 februari 2011 11:20
To: r-help
Subject: [R] speed up process

Dear users,

I have a double for loop that does exactly what I want, but is quite
slow. It is not so much with this simplified example, but IRL it is slow.
Can anyone help me improve it?

The data and code for foo_reg() are available at the end of the email; I
preferred going directly into the problematic part.
Here is the code (I tried to simplify it but I cannot do it too much or
else it wouldn't represent my problem). It might also look too complex
for what it is intended to do, but my colleagues who are also supposed
to use it don't know much about R. So I wrote it so that they don't have
to modify the critical parts to run the script for their needs.

#column indexes for function
ind.xvar- 2
seq.yvar- 3:4
#position vector for legend(), stupid positioning but it doesn't matter here
mypos- c(topleft, topright,bottomleft)

#run the function for columns 34 as y (seq.yvar) with column 2 as x
(ind.xvar) for all 3 datasets (mydata_list)
par(mfrow=c(2,1))
for (i in seq_along(seq.yvar)){
k- seq.yvar[i]
plot(mydata1[[k]]~mydata1[[ind.xvar]], type=p,
xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
for (j in seq_along(mydata_list)){
  foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
pos=mypos[j], name.dat=names(mydata_list)[j])
}
}

I tried with lapply() or mapply() but couldn't manage to pass the
arguments for names() and col= correctly, e.g. for the 2nd loop:
lapply(mydata_list, FUN=function(x){foo_reg(dat=x, xvar=ind.xvar,
yvar=k, col1=1:3, pos=mypos[1:3], name.dat=names(x)[1:3])})
mapply(FUN=function(x) {foo_reg(dat=x, name.dat=names(x)[1:3])},
mydata_list, col1=1:3, pos=mypos, MoreArgs=list(xvar=ind.xvar, yvar=k))

Thanks in advance for any hints.
Ivan




#create data (it looks horrible with these datasets but it doesn't
matter here)
mydata1- structure(list(species = structure(1:8, .Label = c(alsen,
gogor, loalb, mafas, pacyn, patro, poabe, thgel), class =
factor), fruit = c(0.52, 0.45, 0.43, 0.82, 0.35, 0.9, 0.68, 0), Asfc =
c(207.463765, 138.5533755, 70.4391735, 160.9742745, 41.455809,
119.155109, 26.241441, 148.337377), Tfv = c(47068.1437773483,
43743.8087431582, 40323.5209129239, 23420.9455581495, 29382.6947428651,
50460.2202192311, 21810.1456510625, 41747.6053810881)), .Names =
c(species, fruit, Asfc, Tfv), row.names = c(NA, 8L), class =
data.frame)

mydata2- mydata1[!(mydata1$species %in% c(thgel,alsen)),]
mydata3- mydata1[!(mydata1$species %in% c(thgel,alsen,poabe)),]
mydata_list- list(mydata1=mydata1, mydata2=mydata2, mydata3=mydata3)

#function for regression
library(WRS)
foo_reg- function(dat, xvar, yvar, mycol, pos, name.dat){
   tsts- tstsreg(dat[[xvar]], dat[[yvar]])
   tsts_inter- signif(tsts$coef[1], digits=3)
   tsts_slope- signif(tsts$coef[2], digits=3)
   abline(tsts$coef, lty=1, col=mycol)
   legend(x=pos, legend=c(paste(TSTS ,name.dat,:
Y=,tsts_inter,+,tsts_slope,X,sep=)), lty=1, col

Re: [R] speed up process

2011-02-25 Thread jim holtman
You invoke Rprof, run your code and then terminate it:


Rprof()
... code you want to profile
Rprof(NULL)  # generate output
summaryRprof()

example:


 Rprof()
 for (i in 1:1e6) sin(i) + cos(i) + sqrt(i)
 Rprof(NULL)
 summaryRprof()
$by.self
 self.time self.pct total.time total.pct
sin   0.2430.77   0.24 30.77
sqrt  0.2228.21   0.22 28.21
cos   0.1620.51   0.16 20.51
+ 0.1417.95   0.14 17.95
: 0.02 2.56   0.02  2.56

$by.total
 total.time total.pct self.time self.pct
sin0.24 30.77  0.2430.77
sqrt   0.22 28.21  0.2228.21
cos0.16 20.51  0.1620.51
+  0.14 17.95  0.1417.95
:  0.02  2.56  0.02 2.56

$sample.interval
[1] 0.02

$sampling.time
[1] 0.78


On Fri, Feb 25, 2011 at 6:57 AM, Ivan Calandra
ivan.calan...@uni-hamburg.de wrote:
 Dear Jim,

 I've tried to use Rprof() as you advised me, but I don't understand how it
 works.
 I've done this:
 Rprof(for (i in seq_along(seq.yvar)){
  all_my_commands
 })
 summaryRprof()

 But I got this error:
 Error in summaryRprof() : no lines found in ‘Rprof.out’

 I couldn't really understand from the help page what I should do.

 In any case, it's sure that the function tstsreg(), is what takes the most
 computing time. But I wanted to optimize the rest of the code to gain as
 much speed as possible.

 Ivan

 Le 2/25/2011 12:30, Jim Holtman a écrit :

 use Rprof to find where time is being spent.  probably in 'plot' which
 might imply it is not the 'for' loop and therefore beyond your control.

 Sent from my iPad

 On Feb 25, 2011, at 6:19, Ivan Calandraivan.calan...@uni-hamburg.de
  wrote:

 Thanks Nick for your quick answer.
 It does work (no missed bracket!) but unfortunately doesn't really speed
 up anything: with my real data, it takes 82.78 seconds with the double
 lapply() instead of 83.59s with the double loop (about 0.8 s).

 It looks like my double loop was not that bad. Does anyone know another
 faster way to do this?

 Thanks again in advance,
 Ivan

 Le 2/25/2011 11:41, Nick Sabbe a écrit :

 Simply avoiding the for loops by using lapply (I may have missed a
 bracket
 here or there cause I did this without opening R)...
 Haven't checked the speed up, though.

 lapply(seq.yvar, function(k){
    plot(mydata1[[k]]~mydata1[[ind.xvar]], type=p,
 xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
    lapply(seq_along(mydata_list), function(j){
      foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
 pos=mypos[j], name.dat=names(mydata_list)[j])
      return(NULL)
    })
    invisible(NULL)
 })

 HTH,

 Nick Sabbe
 --
 ping: nick.sa...@ugent.be
 link: http://biomath.ugent.be
 wink: A1.056, Coupure Links 653, 9000 Gent
 ring: 09/264.59.36

 -- Do Not Disapprove




 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On
 Behalf Of Ivan Calandra
 Sent: vrijdag 25 februari 2011 11:20
 To: r-help
 Subject: [R] speed up process

 Dear users,

 I have a double for loop that does exactly what I want, but is quite
 slow. It is not so much with this simplified example, but IRL it is
 slow.
 Can anyone help me improve it?

 The data and code for foo_reg() are available at the end of the email; I
 preferred going directly into the problematic part.
 Here is the code (I tried to simplify it but I cannot do it too much or
 else it wouldn't represent my problem). It might also look too complex
 for what it is intended to do, but my colleagues who are also supposed
 to use it don't know much about R. So I wrote it so that they don't have
 to modify the critical parts to run the script for their needs.

 #column indexes for function
 ind.xvar- 2
 seq.yvar- 3:4
 #position vector for legend(), stupid positioning but it doesn't matter
 here
 mypos- c(topleft, topright,bottomleft)

 #run the function for columns 34 as y (seq.yvar) with column 2 as x
 (ind.xvar) for all 3 datasets (mydata_list)
 par(mfrow=c(2,1))
 for (i in seq_along(seq.yvar)){
    k- seq.yvar[i]
    plot(mydata1[[k]]~mydata1[[ind.xvar]], type=p,
 xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
    for (j in seq_along(mydata_list)){
      foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
 pos=mypos[j], name.dat=names(mydata_list)[j])
    }
 }

 I tried with lapply() or mapply() but couldn't manage to pass the
 arguments for names() and col= correctly, e.g. for the 2nd loop:
 lapply(mydata_list, FUN=function(x){foo_reg(dat=x, xvar=ind.xvar,
 yvar=k, col1=1:3, pos=mypos[1:3], name.dat=names(x)[1:3])})
 mapply(FUN=function(x) {foo_reg(dat=x, name.dat=names(x)[1:3])},
 mydata_list, col1=1:3, pos=mypos, MoreArgs=list(xvar=ind.xvar, yvar=k))

 Thanks in advance for any hints.
 Ivan




 #create data (it looks horrible with these datasets but it doesn't
 matter here)
 mydata1- structure(list(species = structure(1:8, .Label = c(alsen,
 gogor, loalb

Re: [R] speed up process

2011-02-25 Thread Ivan Calandra

Ha... it was way too simple!
I thought it would be like system.time()... my bad. Thanks for the tip!

As we thought, foo_reg() takes most of the computing time, and I cannot 
improve that.

Any ideas of how to improve the rest?

Thanks again for your help
Ivan


Le 2/25/2011 14:29, jim holtman a écrit :

You invoke Rprof, run your code and then terminate it:


Rprof()
... code you want to profile
Rprof(NULL)  # generate output
summaryRprof()

example:



Rprof()
for (i in 1:1e6) sin(i) + cos(i) + sqrt(i)
Rprof(NULL)
summaryRprof()

$by.self
  self.time self.pct total.time total.pct
sin   0.2430.77   0.24 30.77
sqrt  0.2228.21   0.22 28.21
cos   0.1620.51   0.16 20.51
+ 0.1417.95   0.14 17.95
: 0.02 2.56   0.02  2.56

$by.total
  total.time total.pct self.time self.pct
sin0.24 30.77  0.2430.77
sqrt   0.22 28.21  0.2228.21
cos0.16 20.51  0.1620.51
+  0.14 17.95  0.1417.95
:  0.02  2.56  0.02 2.56

$sample.interval
[1] 0.02

$sampling.time
[1] 0.78


On Fri, Feb 25, 2011 at 6:57 AM, Ivan Calandra
ivan.calan...@uni-hamburg.de  wrote:

Dear Jim,

I've tried to use Rprof() as you advised me, but I don't understand how it
works.
I've done this:
Rprof(for (i in seq_along(seq.yvar)){
  all_my_commands
})
summaryRprof()

But I got this error:
Error in summaryRprof() : no lines found in ‘Rprof.out’

I couldn't really understand from the help page what I should do.

In any case, it's sure that the function tstsreg(), is what takes the most
computing time. But I wanted to optimize the rest of the code to gain as
much speed as possible.

Ivan

Le 2/25/2011 12:30, Jim Holtman a écrit :

use Rprof to find where time is being spent.  probably in 'plot' which
might imply it is not the 'for' loop and therefore beyond your control.

Sent from my iPad

On Feb 25, 2011, at 6:19, Ivan Calandraivan.calan...@uni-hamburg.de
  wrote:


Thanks Nick for your quick answer.
It does work (no missed bracket!) but unfortunately doesn't really speed
up anything: with my real data, it takes 82.78 seconds with the double
lapply() instead of 83.59s with the double loop (about 0.8 s).

It looks like my double loop was not that bad. Does anyone know another
faster way to do this?

Thanks again in advance,
Ivan

Le 2/25/2011 11:41, Nick Sabbe a écrit :

Simply avoiding the for loops by using lapply (I may have missed a
bracket
here or there cause I did this without opening R)...
Haven't checked the speed up, though.

lapply(seq.yvar, function(k){
plot(mydata1[[k]]~mydata1[[ind.xvar]], type=p,
xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
lapply(seq_along(mydata_list), function(j){
  foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
pos=mypos[j], name.dat=names(mydata_list)[j])
  return(NULL)
})
invisible(NULL)
})

HTH,

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove




-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On
Behalf Of Ivan Calandra
Sent: vrijdag 25 februari 2011 11:20
To: r-help
Subject: [R] speed up process

Dear users,

I have a double for loop that does exactly what I want, but is quite
slow. It is not so much with this simplified example, but IRL it is
slow.
Can anyone help me improve it?

The data and code for foo_reg() are available at the end of the email; I
preferred going directly into the problematic part.
Here is the code (I tried to simplify it but I cannot do it too much or
else it wouldn't represent my problem). It might also look too complex
for what it is intended to do, but my colleagues who are also supposed
to use it don't know much about R. So I wrote it so that they don't have
to modify the critical parts to run the script for their needs.

#column indexes for function
ind.xvar- 2
seq.yvar- 3:4
#position vector for legend(), stupid positioning but it doesn't matter
here
mypos- c(topleft, topright,bottomleft)

#run the function for columns 34 as y (seq.yvar) with column 2 as x
(ind.xvar) for all 3 datasets (mydata_list)
par(mfrow=c(2,1))
for (i in seq_along(seq.yvar)){
k- seq.yvar[i]
plot(mydata1[[k]]~mydata1[[ind.xvar]], type=p,
xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
for (j in seq_along(mydata_list)){
  foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
pos=mypos[j], name.dat=names(mydata_list)[j])
}
}

I tried with lapply() or mapply() but couldn't manage to pass the
arguments for names() and col= correctly, e.g. for the 2nd loop:
lapply(mydata_list, FUN=function(x){foo_reg(dat=x, xvar=ind.xvar,
yvar=k, col1=1:3, pos=mypos[1:3], name.dat=names(x)[1:3])})
mapply(FUN=function(x) {foo_reg(dat=x, name.dat=names(x)[1:3])},
mydata_list, col1=1:3, pos=mypos, MoreArgs=list(xvar