Re: [R] function optimization: reducing the computing time

2007-07-24 Thread Prof Brian Ripley
You need to make use of the profiling methods described in 'Writing R 
Extensions'. My machine is about 4x faster than yours: I get


Each sample represents 0.02 seconds.
Total run time: 62.08041 seconds.

Total seconds: time spent in function and callees.
Self seconds: time spent in function alone.

   %      total      %      self
 total   seconds   self   seconds   name
100.00 62.08  0.00  0.00 "system.time"
 99.94 62.04  0.00  0.00 "crawford.BSDT"
 99.94 62.04  0.00  0.00 "eval"
 99.10 61.52  1.00  0.62 "lapply"
 99.10 61.52  0.00  0.00 "sapply"
 99.00 61.46  0.00  0.00 "replicate"
 98.61 61.22  2.26  1.40 "FUN"
 98.26 61.00  3.32  2.06 "estimation"
 83.92 52.10  0.26  0.16 "riwish"
 83.67 51.94  4.25  2.64 "solve"
 55.57 34.50  7.18  4.46 "solve.default"
 51.68 32.08  3.77  2.34 "rwish"
...
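
(For reference, a profile like the one above can be obtained along these
lines; the output file name and the n.simul value here are arbitrary, and
summaryRprof() gives an equivalent tabulation from within R:

Rprof("crawford.out")                  # start the sampling profiler
system.time(crawford.BSDT(patient, controls, n.simul=10000))
Rprof(NULL)                            # stop profiling
summaryRprof("crawford.out")           # self/total time per function
## or, from the shell: R CMD Rprof crawford.out
)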

so 84% of the time is being spent in riwish.  Now given that A is fixed, 
you should be able to speed that up by precomputing the constant parts of 
the computation (and you can also precompute your 'T').
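
For instance, riwish(n, A) re-inverts and re-factors the constant A on
every call, so both steps can be hoisted out of the simulation loop using
a Bartlett-decomposition draw. A sketch only: riwish.fast and A.inv.chol
are made-up names, and this assumes MCMCpack's riwish(v, S)
parameterization, in which S is inverted internally.

## precompute once, outside the simulation loop (A is fixed)
A.inv.chol <- chol(solve(A))      # upper Cholesky factor of A^{-1}

## one inverse-Wishart(n, A) draw, intended as a drop-in for riwish(n, A)
riwish.fast <- function(n, A.inv.chol) {
    p <- nrow(A.inv.chol)
    Z <- matrix(0, p, p)
    diag(Z) <- sqrt(rchisq(p, n:(n - p + 1)))  # chi-square diagonal
    Z[upper.tri(Z)] <- rnorm(p * (p - 1) / 2)  # standard normals above it
    W <- crossprod(Z %*% A.inv.chol)           # ~ Wishart(n, A^{-1})
    chol2inv(chol(W))                          # invert: ~ inverse-Wishart(n, A)
}

Calling riwish.fast(n, A.inv.chol) inside estimation() then avoids a
solve() and a chol() of the constant matrix on every iteration.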



On Tue, 24 Jul 2007, Matthieu Dubois wrote:


[R] function optimization: reducing the computing time

2007-07-24 Thread Matthieu Dubois
Dear useRs,

I have written a function that implements a Bayesian method to  
compare a patient's score on two tasks with that of a small control  
group, as described in Crawford, J. and Garthwaite, P. (2007).  
Comparison of a single case to a control or normative sample in  
neuropsychology: Development of a Bayesian approach. Cognitive  
Neuropsychology, 24(4):343–372.

The function (see below) returns the expected results, but the time  
needed to compute them is quite long (at least for a test that may be  
used routinely). There is certainly room for improvement. It would  
really be helpful if some of you experts could have a look ...

Thanks a lot.
Regards,

Matthieu


FUNCTION
--
The function takes the performance on two tasks and estimates the  
rarity (the p-value) of the difference between the patient's two  
scores, in comparison to the difference in the control subjects. A  
standardized and an unstandardized version are provided (controlled  
by the parameter standardized: T vs. F). Also, for consistency with  
the original publication, either the raw data or summary statistics  
can be used for the control group.

##
# Bayesian (un)Standardized Difference Test
##

#from Crawford and Garthwaite (2007) Cognitive Neuropsychology
# implemented by Matthieu Dubois, Matthieu.Dubois@psp.ucl.ac.be

#PACKAGE MCMCpack REQUIRED

# patient: a vector with the two scores
# controls: matrix/data.frame with the raw scores (one column per task)
# mean.c, sd.c, r, n: alternatively, summary statistics can be entered
#   (means, standard deviations, correlation, group size)
# n.simul: number of simulations
# two.sided (Boolean): two-sided (T) vs. one-sided (F) Bayesian
#   credible interval
# standardized (Boolean): standardized (T) vs. unstandardized (F) test
# values are: $p.value (one-tailed), $confidence.interval

crawford.BSDT <- function(patient, controls, mean.c=0, sd.c=0, r=0,
                          n=0, na.rm=FALSE, n.simul=10, two.sided=TRUE,
                          standardized=TRUE)
{
    library(MCMCpack)   # provides riwish()

    #if no summaries are entered, they are computed from the raw scores
    if(missing(n))
    {
        if(!is.data.frame(controls)) controls <- as.data.frame(controls)
        n <- nrow(controls)
        mean.c <- colMeans(controls, na.rm=na.rm)
        sd.c <- sapply(controls, sd, na.rm=na.rm)

        na.method <- ifelse(na.rm, "complete.obs", "all.obs")
        r <- cor(controls[,1], controls[,2], use=na.method)
    }

    #sums-of-squares and cross-products matrix, i.e. (n-1) times the
    #covariance matrix (scale matrix of the inverse-Wishart)
    s.xx <- (sd.c[1]^2) * (n-1)
    s.yy <- (sd.c[2]^2) * (n-1)
    s.xy <- sd.c[1] * sd.c[2] * r * (n-1)

    A <- matrix(c(s.xx, s.xy, s.xy, s.yy), ncol=2)

    #estimation function
    if(standardized)
    {
        estimation <- function(patient, mean.c, n, A)
        {
            #estimation of a variance/covariance matrix (sigma):
            #a random draw from an inverse-Wishart distribution
            sigma <- riwish(n, A)

            #estimation of the means (mu)
            z <- rnorm(2)
            T <- t(chol(sigma)) #Cholesky decomposition (lower triangular)
            mu <- mean.c + T %*% z/sqrt(n)

            #standardization
            z.x <- (patient[1]-mu[1]) / sqrt(sigma[1,1])
            z.y <- (patient[2]-mu[2]) / sqrt(sigma[2,2])
            rho.xy <- sigma[1,2] / sqrt(sigma[1,1]*sigma[2,2])

            z.star <- (z.x - z.y) / sqrt(2-2*rho.xy)

            #conditional p-value
            p <- pnorm(z.star)
            p
        }
    }
    else
    {
        estimation <- function(patient, mean.c, n, A)
        {
            #estimation of a variance/covariance matrix (sigma):
            #a random draw from an inverse-Wishart distribution
            sigma <- riwish(n, A)

            #estimation of the means (mu)
            z <- rnorm(2)
            T <- t(chol(sigma)) #Cholesky decomposition (lower triangular)
            mu <- mean.c + T %*% z/sqrt(n)

            #unstandardized difference
            num <- (patient[1]-mu[1]) - (patient[2]-mu[2])
            denom <- sqrt(sigma[1,1]+sigma[2,2]-(2*sigma[1,2]))

            z.star <- num/denom

            #conditional p-value
            p <- pnorm(z.star)
            p
        }
    }

    #application
    p <- replicate(n.simul, estimation(patient, mean.c, n, A))

    #outputs
    pval <- mean(p)
    CI <- if(two.sided) 100*quantile(p, c(0.025, 0.975)) else 100*quantile(p, 0.95)

    list(p.value = pval, confidence.interval = CI)
}
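
A hypothetical call, with made-up data (10 controls, two tasks, and a
patient scoring 85 and 110), might look like:

set.seed(1)
controls <- data.frame(task1 = rnorm(10, 100, 15),
                       task2 = rnorm(10, 100, 15))
res <- crawford.BSDT(patient = c(85, 110), controls = controls,
                     n.simul = 10000)
res$p.value              # estimated one-tailed p-value
res$confidence.interval  # 95% credible interval (in %)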