[R] Tricky vectorization problem

2007-10-06 Thread Mike Lawrence
Hi all,

I'm using the code below within a loop that I run thousands of times  
and even with the super-computing resources at my disposal this is  
just too slow. The snippet below takes about 10s on my machines,  
which is an order of magnitude or two slower than would be  
preferable; in the end I'd like to set the number of monte carlo  
experiments to 1e4 or even 1e5 to ensure stable results. I've tried  
my hand at vectorizing it myself but I'm finding it tricky. Any help  
would be greatly appreciated!

This code attempts to find the distribution of observed correlation  
values between A & B when each are measured imperfectly:

start = proc.time()[1] #start a timer

rho = .5 #set the true correllation

Ns = 100 #set the number of Subjects

a.No = 1:100 #for each subject, set a number of observations in A  
(1:100 chosen arbitrarily for demonstration)
b.No = 1:100 #for each subject, set a number of observations in B

v.a = 1 #set the between Ss variance in condition A
v.b = 2 #set the between Ss variance in condition B

s.a.w = 1:100 #for each subject, set a standard deviation in A (1:100  
chosen arbitrarily for demonstration)
s.b.w = 1:100 #for each subject, set a standard deviation in B

mce = 1e3 #set the number of monte carlo experiments

sim.r = vector(length=mce) #set up a collector for the simulated  
correlations

Sigma=matrix(c(v.a,sqrt(v.a*v.b)*rho,sqrt(v.a*v.b)*rho,v.b),2,2)  
#define a covariance matrix for use in generating the correlated data

a = vector(length=Ns) #set up a collector for subject means in A
b = vector(length=Ns) #set up a collector for subject means in B

for(i in 1:mce){ #start a monte carlo experiment

sub.means=mvrnorm(Ns,rep(0,2),Sigma) #generate correlated ideal  
means for for each subject in each condition

for(s in 1:Ns){ #for each subject

a[s] = mean(rnorm(a.No[s],sub.means[s,1],s.a.w[s])) #generate 
some  
data for A and grab the mean
b[s] = mean(rnorm(b.No[s],sub.means[s,2],s.b.w[s])) #generate 
some  
data for B and grab the mean

}

sim.r[i] = cor(a,b) #store the observed correlation between A and B

}

print(proc.time()[1]-start) #show the total time this took



--
Mike Lawrence
Graduate Student, Department of Psychology, Dalhousie University

Website: http://memetic.ca

Public calendar: http://icalx.com/public/informavore/Public

"The road to wisdom? Well, it's plain and simple to express:
Err and err and err again, but less and less and less."
- Piet Hein

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tricky vectorization problem

2007-10-06 Thread Mike Lawrence
Seems there were some line break issues when pasting the code, trying  
again with a different commenting style:

#start a timer
start = proc.time()[1]

#set the true correllation
rho = .5

#set the number of Subjects
Ns = 100

#for each subject, set a number of observations in A
a.No = 1:100
#for each subject, set a number of observations in B
b.No = 1:100

#set the between Ss variance in condition A
v.a = 1
#set the between Ss variance in condition B
v.b = 2

#for each subject, set a standard deviation in A
s.a.w = 1:100
#for each subject, set a standard deviation in B
s.b.w = 1:100

#set the number of monte carlo experiments
mce = 1e3

#set up a collector for the simulated correlations
sim.r = vector(length=mce)

#define a covariance matrix for use in generating the correlated data
Sigma=matrix(c(v.a,sqrt(v.a*v.b)*rho,sqrt(v.a*v.b)*rho,v.b),2,2)

#set up a collector for subject means in A
a = vector(length=Ns)
#set up a collector for subject means in B
b = vector(length=Ns)

#start a monte carlo experiment
for(i in 1:mce){

#generate correlated ideal means for for each subject in each condition
sub.means=mvrnorm(Ns,rep(0,2),Sigma)

#for each subject
for(s in 1:Ns){

#generate some data for A and grab the mean
a[s] = mean(rnorm(a.No[s],sub.means[s,1],s.a.w[s]))
#generate some data for B and grab the mean
b[s] = mean(rnorm(b.No[s],sub.means[s,2],s.b.w[s]))

}

#store the observed correlation between A and B
sim.r[i] = cor(a,b)

}

#show the total time this took
print(proc.time()[1]-start)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tricky vectorization problem

2007-10-07 Thread Dimitris Rizopoulos
a small improvement is to avoid computing the spectral decomposition  
of `Sigma' (look at code of mvrnorm()) in each iteration, e.g.,

#set up a collector for subject means in A
a <- numeric(Ns)
#set up a collector for subject means in B
b <- numeric(Ns)

eS <- eigen(Sigma, symmetric = TRUE, EISPACK = TRUE)
ev <- eS$values
fact <- eS$vectors %*% diag(sqrt(pmax(ev, 0)), 2)

#start a monte carlo experiment
for (i in 1:mce) {
 #generate correlated ideal means for each subject in each condition
 X <- rnorm(2 * Ns)
 dim(X) <- c(2, Ns)
 sub.means <- t(fact %*% X)

 #for each subject
 for (s in 1:Ns) {
 #generate some data for A and grab the mean
 a[s] <- mean(rnorm(a.No[s], sub.means[s, 1], s.a.w[s]))
 #generate some data for B and grab the mean
 b[s] <- mean(rnorm(b.No[s], sub.means[s, 2], s.b.w[s]))
 }
 #store the observed correlation between A and B
 sim.r[i] <- cor(a, b)
}


I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
  http://www.student.kuleuven.be/~m0390867/dimitris.htm


Quoting Mike Lawrence <[EMAIL PROTECTED]>:

> Seems there were some line break issues when pasting the code, trying
> again with a different commenting style:
>
> #start a timer
> start = proc.time()[1]
>
> #set the true correllation
> rho = .5
>
> #set the number of Subjects
> Ns = 100
>
> #for each subject, set a number of observations in A
> a.No = 1:100
> #for each subject, set a number of observations in B
> b.No = 1:100
>
> #set the between Ss variance in condition A
> v.a = 1
> #set the between Ss variance in condition B
> v.b = 2
>
> #for each subject, set a standard deviation in A
> s.a.w = 1:100
> #for each subject, set a standard deviation in B
> s.b.w = 1:100
>
> #set the number of monte carlo experiments
> mce = 1e3
>
> #set up a collector for the simulated correlations
> sim.r = vector(length=mce)
>
> #define a covariance matrix for use in generating the correlated data
> Sigma=matrix(c(v.a,sqrt(v.a*v.b)*rho,sqrt(v.a*v.b)*rho,v.b),2,2)
>
> #set up a collector for subject means in A
> a = vector(length=Ns)
> #set up a collector for subject means in B
> b = vector(length=Ns)
>
> #start a monte carlo experiment
> for(i in 1:mce){
>
>   #generate correlated ideal means for for each subject in each condition
>   sub.means=mvrnorm(Ns,rep(0,2),Sigma)
>
>   #for each subject
>   for(s in 1:Ns){
>
>   #generate some data for A and grab the mean
>   a[s] = mean(rnorm(a.No[s],sub.means[s,1],s.a.w[s]))
>   #generate some data for B and grab the mean
>   b[s] = mean(rnorm(b.No[s],sub.means[s,2],s.b.w[s]))
>
>   }
>
>   #store the observed correlation between A and B
>   sim.r[i] = cor(a,b)
>
> }
>
> #show the total time this took
> print(proc.time()[1]-start)
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tricky vectorization problem

2007-10-07 Thread Mike Lawrence
Had a realization this morning that I think achieves ceiling  
performance and thought I'd share with the list. I was previously  
generating sample data per participant and then calculating a mean,  
but of course this could be simplified by simply getting a single  
value from the sampling distribution of the mean for that  
participant. This speeds things up immensely (from 10.5 seconds to .5  
seconds) and should be sufficient for my needs, but I'd still welcome  
further improvement suggestions.


#start a timer
start = proc.time()[1]

#set the number of monte carlo experiments
mce = 1e3

#set the true correllation
rho = .5

#set the number of Subjects
Ns = 100

#for each subject, set a number of observations in A
a.No = 1:100
#for each subject, set a number of observations in B
b.No = 1:100

#set the between Ss variance in condition A
v.a = 1
#set the between Ss variance in condition B
v.b = 2

#for each subject, set a standard deviation in A
s.a.w = 1:100
#for each subject, set a standard deviation in B
s.b.w = 1:100

#set up a collector for the simulated correlations
sim.r = vector(length=mce)

#define a covariance matrix for use in generating the correlated data
Sigma=matrix(c(v.a,sqrt(v.a*v.b)*rho,sqrt(v.a*v.b)*rho,v.b),2,2)
eS <- eigen(Sigma, symmetric = TRUE, EISPACK = TRUE)
ev <- eS$values
fact <- eS$vectors %*% diag(sqrt(pmax(ev, 0)), 2)

#set up a collector for subject means in A
a = vector(length=Ns)
#set up a collector for subject means in B
b = vector(length=Ns)


#generate correlated ideal means for for each subject in each condition
X <- rnorm(2 * Ns * mce)
dim(X) <- c(2, Ns * mce)
sub.means <- t(fact %*% X)

#generate observed means for each Subject in A
a=sub.means[,1]+rnorm(Ns*mce,0,s.a.w/sqrt(a.No))

#generate observed means for each Subject in B
b=sub.means[,2]+rnorm(Ns*mce,0,s.b.w/sqrt(b.No))

#Get the observed correlation for each monte carlo experiment
for(i in 1:mce){
sim.r[i] = cor(
a[(i*Ns-Ns+1):(i*Ns)]
,b[(i*Ns-Ns+1):(i*Ns)]
)
}

#show the total time this took
print(proc.time()[1]-start)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tricky vectorization problem

2007-10-07 Thread Mike Lawrence
Posting this for posterity and to demonstrate that a speed-up of 2  
orders of magnitude is indeed possible! I can now run 1e5 monte carlo  
experiments in the time the old code could only run 1e3. The final  
change was to replace my use of cor() with .Internal(cor()), which  
avoids some checks that are unnecessary in this case:


#start a timer
start = proc.time()[1]

#set the number of monte carlo experiments
mce = 1e5

#set the true correllation
rho = 1

#set the number of Subjects
Ns = 100

#for each subject, set a number of observations in A
a.No = 1:100
#for each subject, set a number of observations in B
b.No = 1:100

#set the between Ss variance in condition A
v.a = 1
#set the between Ss variance in condition B
v.b = 2

#for each subject, set a standard deviation in A
s.a.w = 1:100
#for each subject, set a standard deviation in B
s.b.w = 1:100

#set up a collector for the simulated correlations
sim.r = vector(length=mce)

#define a covariance matrix for use in generating the correlated data
Sigma=matrix(c(v.a,sqrt(v.a*v.b)*rho,sqrt(v.a*v.b)*rho,v.b),2,2)
eS <- eigen(Sigma, symmetric = TRUE, EISPACK = TRUE)
ev <- eS$values
fact <- eS$vectors %*% diag(sqrt(pmax(ev, 0)), 2)

#set up a collector for subject means in A
a = vector(length=Ns)
#set up a collector for subject means in B
b = vector(length=Ns)


#generate correlated ideal means for for each subject in each condition
X <- rnorm(2 * Ns * mce)
dim(X) <- c(2, Ns * mce)
sub.means <- t(fact %*% X)

a=sub.means[,1]+rnorm(Ns*mce,0,s.a.w/sqrt(a.No))
b=sub.means[,2]+rnorm(Ns*mce,0,s.b.w/sqrt(b.No))

for(i in 1:mce){
end=i*Ns
sim.r[i] = .Internal(
cor(
a[(end-Ns+1):end]
,b[(end-Ns+1):end]
,TRUE
,FALSE
)
)
}

#show the total time this took
print(proc.time()[1]-start)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.