Re: [R] function optimization: reducing the computing time
You need to make use of the profiling methods described in 'Writing R Extensions'. My machine is about 4x faster than yours: I get

Each sample represents 0.02 seconds.
Total run time: 62.08041 seconds.

Total seconds: time spent in function and callees.
Self seconds: time spent in function alone.

   %        total        %         self
 total     seconds      self     seconds    name
100.00      62.08       0.00       0.00     "system.time"
 99.94      62.04       0.00       0.00     "crawford.BSDT"
 99.94      62.04       0.00       0.00     "eval"
 99.10      61.52       1.00       0.62     "lapply"
 99.10      61.52       0.00       0.00     "sapply"
 99.00      61.46       0.00       0.00     "replicate"
 98.61      61.22       2.26       1.40     "FUN"
 98.26      61.00       3.32       2.06     "estimation"
 83.92      52.10       0.26       0.16     "riwish"
 83.67      51.94       4.25       2.64     "solve"
 55.57      34.50       7.18       4.46     "solve.default"
 51.68      32.08       3.77       2.34     "rwish"
...

so 84% of the time is being spent in riwish. Now given that A is fixed, you should be able to speed that up by precomputing the constant parts of the computation (and you can also precompute your 'T').
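For concreteness, profile output in that layout can be produced along these lines (a minimal sketch, not from the original reply; the file name "bsdt.out" and the call with n.simul = 10000 are illustrative, since the original timing call is not shown):

Rprof("bsdt.out", interval = 0.02)    # start sampling the call stack
system.time(crawford.BSDT(patient, controls, n.simul = 10000))
Rprof(NULL)                           # stop profiling
summaryRprof("bsdt.out")$by.total     # per-function summary, within R

Running 'R CMD Rprof bsdt.out' from a shell formats the same file in the layout shown above.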
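To make the precomputation suggestion concrete: MCMCpack's riwish(v, S) draws an inverse-Wishart variate as solve(rwish(v, solve(S))), so when the scale matrix A is fixed, solve(A) and its Cholesky factor only need to be computed once. A minimal sketch of that idea (make.sigma.sampler is a hypothetical helper, not code from this thread):

# Everything that depends only on n and A is hoisted out of the loop.
# A draw sigma ~ inverse-Wishart(n, A) is obtained as solve(Z'Z), where
# the n rows of Z are i.i.d. N(0, solve(A)); valid for integer n >= 2.
make.sigma.sampler <- function(n, A) {
    R <- chol(solve(A))                        # constant across draws
    function() {
        Z <- matrix(rnorm(2 * n), n, 2) %*% R  # rows ~ N(0, solve(A))
        U <- chol(crossprod(Z))                # Z'Z = U'U ~ Wishart(n, solve(A))
        T <- backsolve(U, diag(2))             # T %*% t(T) = solve(Z'Z) = sigma
        list(sigma = tcrossprod(T), T = T)
    }
}

Built once as sampler <- make.sigma.sampler(n, A) before replicate(), each call returns sigma together with a factor T such that T %*% t(T) equals sigma, so the per-iteration t(chol(sigma)) can be dropped too: mu <- mean.c + draw$T %*% z / sqrt(n) has the same distribution, since any square root of sigma will do there.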
On Tue, 24 Jul 2007, Matthieu Dubois wrote:
[R] function optimization: reducing the computing time
Dear useRs,

I have written a function that implements a Bayesian method to compare a patient's score on two tasks with that of a small control group, as described in Crawford, J. and Garthwaite, P. (2007). Comparison of a single case to a control or normative sample in neuropsychology: Development of a Bayesian approach. Cognitive Neuropsychology, 24(4):343–372.

The function (see below) returns the expected results, but the time needed to compute them is quite long (at least for a test that may be used routinely). There is certainly room for improvement. It would really be helpful if some of you experts could have a look ... Thanks a lot.

Regards,
Matthieu

FUNCTION -- The function takes the performance on two tasks and estimates the rarity (the p-value) of the difference between the patient's two scores, in comparison to the difference in the control subjects. A standardized and an unstandardized version are provided (controlled by the parameter standardized: T vs. F). Also, for congruency with the original publication, either the raw data or summary statistics can be used for the control group.

##
# Bayesian (un)Standardized Difference Test
##
# from Crawford and Garthwaite (2007), Cognitive Neuropsychology
# implemented by Matthieu Dubois, Matthieu.Dubois@psp.ucl.ac.be
# PACKAGE MCMCpack REQUIRED
#
# patient:  a vector with the two scores
# controls: matrix/data.frame with the raw scores (one column per task)
# mean.c, sd.c, r, n: possibility to enter summary statistics
#           (means, standard deviations, correlation, group size)
# n.simul:  number of simulations
# two.sided (Boolean):    two-sided (T) vs. one-sided (F) Bayesian credible interval
# standardized (Boolean): standardized (T) vs. unstandardized (F) test
# values are: $p.value (one-tailed), $confidence.interval

crawford.BSDT <- function(patient, controls, mean.c=0, sd.c=0, r=0, n=0,
                          na.rm=FALSE, n.simul=10, two.sided=TRUE,
                          standardized=TRUE)
{
    library(MCMCpack)

    # if no summaries are entered, they are computed
    if(missing(n)) {
        if(!is.data.frame(controls)) controls <- as.data.frame(controls)
        n <- dim(controls)[1]
        mean.c <- sapply(controls, mean, na.rm=na.rm)
        sd.c <- sapply(controls, sd, na.rm=na.rm)
        na.method <- ifelse(na.rm, "complete.obs", "all.obs")
        r <- cor(controls[,1], controls[,2], use=na.method)
    }

    # variance/covariance matrix
    s.xx <- (sd.c[1]^2) * (n-1)
    s.yy <- (sd.c[2]^2) * (n-1)
    s.xy <- sd.c[1] * sd.c[2] * r * (n-1)
    A <- matrix(c(s.xx, s.xy, s.xy, s.yy), ncol=2)

    # estimation function
    if(standardized) {
        estimation <- function(patient, mean.c, n, A) {
            # estimation of a variance/covariance matrix (sigma):
            # random obs. from an inverse-Wishart distribution
            sigma <- riwish(n, A)
            # estimation of the means (mu)
            z <- rnorm(2)
            T <- t(chol(sigma))    # Cholesky decomposition
            mu <- mean.c + T %*% z / sqrt(n)
            # standardization
            z.x <- (patient[1] - mu[1]) / sqrt(sigma[1,1])
            z.y <- (patient[2] - mu[2]) / sqrt(sigma[2,2])
            rho.xy <- sigma[1,2] / sqrt(sigma[1,1] * sigma[2,2])
            z.star <- (z.x - z.y) / sqrt(2 - 2*rho.xy)
            # conditional p-value
            p <- pnorm(z.star)
            p
        }
    } else {
        estimation <- function(patient, mean.c, n, A) {
            # estimation of a variance/covariance matrix (sigma):
            # random obs. from an inverse-Wishart distribution
            sigma <- riwish(n, A)
            # estimation of the means (mu)
            z <- rnorm(2)
            T <- t(chol(sigma))    # Cholesky decomposition
            mu <- mean.c + T %*% z / sqrt(n)
            num <- (patient[1] - mu[1]) - (patient[2] - mu[2])
            denom <- sqrt(sigma[1,1] + sigma[2,2] - 2*sigma[1,2])
            z.star <- num / denom
            # conditional p-value
            p <- pnorm(z.star)
            p
        }
    }

    # application
    p <- replicate(n.simul, estimation(patient, mean.c, n, A))

    # outputs
    pval <- mean(p)
    CI <- if(two.sided) 100*quantile(p, c(0.025, 0.975)) else
          100*quantile(p, c(0.05, 0.95))  # one-sided limits assumed:
                                          # the archived message is truncated here
    list(p.value=pval, confidence.interval=CI)
}
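Not part of the original message, but a small usage sketch for readers who want to try the function, with simulated control data (all numbers illustrative):

library(MASS)    # mvrnorm, to simulate a bivariate control sample
set.seed(1)
Sigma.c  <- matrix(c(15^2, 0.6*15*15, 0.6*15*15, 15^2), ncol = 2)
controls <- mvrnorm(20, mu = c(100, 100), Sigma = Sigma.c)

# raw-data interface: summaries are computed from 'controls'
crawford.BSDT(patient = c(78, 104), controls = controls, n.simul = 10000)

# summary-statistics interface, as in the original publication
crawford.BSDT(patient = c(78, 104), mean.c = c(100, 100), sd.c = c(15, 15),
              r = 0.6, n = 20, n.simul = 10000)

Both calls return a list with $p.value and $confidence.interval.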