[R] function optimization: reducing the computing time
Dear useRs, I have written a function that implements a Bayesian method to compare a patient's score on two tasks with that of a small control group, as described in Crawford, J. and Garthwaite, P. (2007). Comparison of a single case to a control or normative sample in neuropsychology: Development of a bayesian approach. Cognitive Neuropsychology, 24(4):343–372. The function (see below) return the expected results, but the time needed to compute is quite long (at least for a test that may be routinely used). There is certainly room for improvement. It would really be helpful if some experts of you may have a look ... Thanks a lot. Regards, Matthieu FUNCTION -- The function takes the performance on two tasks and estimate the rarity (the p-value) of the difference between the patient's two scores, in comparison to the difference i the controls subjects. A standardized and an unstandardized version are provided (controlled by the parameter standardized: T vs. F). Also, for congruency with the original publication, both the raw data and summary statistics could be used for the control group. ## # Bayesian (un)Standardized Difference Test ## #from Crawford and Garthwaite (2007) Cognitive Neuropsychology # implemented by Matthieu Dubois, Matthieu.Duboispsp.ucl.ac.be #PACKAGE MCMCpack REQUIRED # patient: a vector with the two scores; controls: matrix/data.frame with the raw scores (one column per task) # mean.c, sd.c, r, n: possibility to enter summaries statistics (mean, standard deviation, correlation, group size) # n.simul: number of simulations # two-sided (Boolean): two-sided (T) vs. one-sided (F) Bayesian Credible interval # standardized (Boolean): standardized (T) vs. unstandardized (F) test # values are: $p.value (one_tailed), $confidence.interval crawford.BSDT <- function(patient, controls, mean.c=0, sd.c=0 , r=0, n=0, na.rm=F, n.simul=10, two.sided=T, standardized=T) { library(MCMCpack) #if no summaries are entered, they are computed if(missing(n)) { if(!is.data.frame(controls)) controls <- as.data.frame(controls) n <- dim(controls)[1] mean.c <- mean(controls, na.rm=na.rm) sd.c <- sd(controls, na.rm=na.rm) na.method <- ifelse(na.rm,"complete.obs","all.obs") r <- cor(controls[,1], controls[,2], na.method) } #variance/covariance matrix s.xx <- (sd.c[1]^2) * (n-1) s.yy <- (sd.c[2]^2) * (n-1) s.xy <- sd.c[1] * sd.c[2] * r * (n-1) A <- matrix(c(s.xx, s.xy, s.xy, s.yy), ncol=2) #estimation function if(standardized) { estimation <- function(patient, mean.c, n, A) { #estimation of a variance/covariance matrix (sigma) sigma = riwish(n,A) #random obs. from an inverse-Wishart distribution #estimation of the means (mu) z <- rnorm(2) T <- t(chol(sigma)) #Cholesky decomposition mu <- mean.c + T %*% z/sqrt(n) #standardization z.x <- (patient[1]-mu[1]) / sqrt(sigma[1,1]) z.y <- (patient[2]-mu[2]) / sqrt(sigma[2,2]) rho.xy <- sigma[2.2] / sqrt(sigma[1,1]*sigma[2,2]) z.star <- (z.x - z.y) / sqrt(2-2*rho.xy) #conditional p-value p <- pnorm(z.star) p } } else { estimation <- function(patient, mean.c, n, A) { #estimation of a variance/covariance matrix (sigma) sigma = riwish(n,A) #random obs. from an inverse-Wishart distribution #estimation of the means (mu) z <- rnorm(2) T <- t(chol(sigma)) #Cholesky decomposition mu <- mean.c + T %*% z/sqrt(n) num <- (patient[1]-mu[1]) - (patient[2] - mu[2]) denom <- sqrt(sigma[1,1]+sigma[2,2]-(2*sigma[1,2])) z.star <- num/denom #conditional p-value p <- pnorm(z.star) p } } #application p <- replicate(n.simul, estimation(patient, mean.c, n, A))
[R] error in which(): recursive default argument reference
Dear useRs, I have written a very simple function to compute some probabilities on words (function is below). The function includes a which() statement applied to a vector of characters (word.split): sapply (word.split, function(x) which(letters==x)). This statement worked as expected when used outside the global function : > word <- "hello" > (word.split <- unlist(strsplit(word, split=c( [1] "h" "e" "l" "l" "o" > (used.letters <- sapply(word.split, function(x) which(letters==x))) h e l l o 8 5 12 12 15 However, when embedded in the fonction, the following error occurred : > compute.word("hello") Error in which(letters == x) : recursive default argument reference Any insights would be helpful ... Cheers, Matthieu Function : #computes some probabilities based on (1) word frequency (word.freq), letters visibility(letters.vis), letters frequency(letters.freq) and viewing eccentricity (pos) compute.word <- function(word, pos=X, word.freq=word.freq, letters=letters, letters.vis=letters.vis, letters.freq=letters.freq) { word.split <- unlist(strsplit(word, split=c())) #indexing vectors used.letters <- sapply(word.split, function(x) which(letters==x)) pos <- c(1,rep(2,(length(word.split)-2)),3) letters.freq.word <- unlist(letters.freq[cbind(used.letters, pos)]) letters.vis.word <- letters.vis[paste("X", pos, sep=""),] #computations Mean.Lvis <- mean(letters.vis.word) Prod.Lvis <- prod(letters.vis.word) PosRel <- sum((word.freq/letters.freq.word)*(letters.vis.word)) output <- list(Mean.Lvis, Prod.Lvis, PosRel) names(output) <- c("Mean.Lvis", "Prod.Lvis", "PosRel") return(output) } Matthieu Dubois, PH.D. Student Cognitive Neuroscience Unit Université catholique de Louvain 10, Place cardinal Mercier - 1348 Louvain-la-Neuve - BELGIUM [EMAIL PROTECTED] [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] warning message in hand-made function
Dear Rusers, I tried to implement a function comparing mean scores between one subject (the patient) and a group a control subjects. The function returns attended results, but I also obtained the following warning : Warning message: the condition has length > 1 and only the first element will be used in: if (substr(fp, 1, 1) == "<") fp else paste("=", fp) Maybe the cause of the message is obvious, but I don't understand. I am newbie in R and certainly I missed something. Any help would be greatly appreciated. The aim of the function was to : 1. compute a modified t-test with either raw data (controls and patient) in vectors or only summaries for the control group (mean, standard deviation, size of the group : mean.c, sd.c, n) as inputs ; 2. estimate the rarity of the difference observed between patient and controls and computing confidence intervals The function was the following: crawford.t.test <- function(patient, controls, mean.c=0, sd.c=0, n=0, na.rm=F) { #from Crawford et al. (1998, Clinical Neuropsychologist ; 2002, Neuropsychologia) na<-na.rm #if no summaries are entered, they are computed if(missing(n)) { n <- length(controls) mean.c <- mean(controls, na.rm=na) sd.c <- sd(controls, na.rm=na) } dl <- n-1 #degrees of freedom of the test #t.test computation t.obs <- (patient-mean.c) / (sd.c*(((n+1)/n)^0.5)) proba.onetailed <- 1-pt(abs(t.obs), df=dl) rar <- pt(t.obs, df=dl) #point estimate of the rarity #confidence intervals computation on the rarity (Crawford & Garthwaite, 2002, Neuropsychologia) c <- (patient-mean.c)/sd.c #finding the non central parameter of t distributions f <- function(delta, pr, x, df) pt(x, df = df, ncp = delta) - pr deltaL <- uniroot(f, lower=-37.62, upper=37.62, pr = 0.025, x = c* (n^0.5), df = dl) deltaU <- uniroot(f, lower=-37.62, upper=37.62, pr = 0.975, x = c* (n^0.5), df = dl) CI.U <- pnorm(deltaL$root/(n^0.5)) * 100 #upper bound of the confidence interval CI.L <- pnorm(deltaU$root/(n^0.5)) * 100 #lower bound of the confidence interval #output output <- list(statistic=t.obs, p.value=c (one.tailed=proba.onetailed, twotailed=2*proba.onetailed), rarity=c (rarity=rar, lower.boud=CI.L, upper.bound=CI.U), df=dl, method=paste ("Crawford modified t test with", dl, "degrees of freedom", sep=" ")) class(output)<-"htest" return(output) } Matthieu Dubois, PH.D. Student Cognitive Neuroscience Unit Université catholique de Louvain 10, Place cardinal Mercier - 1348 Louvain-la-Neuve - BELGIUM [EMAIL PROTECTED] [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] finding ncp for t distribution
Dear R-users, I am wondering whether R implements a function returning the non central parameter of a t distribution (equivalent of the TnoncT function from SAS), given /x/ a value from a t distribution, /df /the degrees of freedom and /p/ the probability of x under this distribution. Thanks a lot, Matthieu -- Matthieu Dubois, /Ph.D/. /Student/ Cognitive Neuroscience Unit, UCL, Belgium [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html