[R] function optimization: reducing the computing time

2007-07-24 Thread Matthieu Dubois
Dear useRs,

I have written a function that implements a Bayesian method to  
compare a patient's score on two tasks with that of a small control  
group, as described in Crawford, J. and Garthwaite, P. (2007).  
Comparison of a single case to a control or normative sample in  
neuropsychology: Development of a bayesian approach. Cognitive  
Neuropsychology, 24(4):343–372.

The function (see below) return the expected results, but the time  
needed to compute is quite long (at least for a test that may be  
routinely used). There is certainly room for  improvement. It would  
really be helpful if some experts of you may have  a  look ...

Thanks a lot.
Regards,

Matthieu


FUNCTION
--
The function takes the performance on two tasks  and estimate the  
rarity (the p-value) of the difference between the patient's two  
scores, in comparison to the difference i the  controls subjects. A  
standardized and an unstandardized version are provided (controlled  
by the parameter standardized: T vs. F). Also, for congruency with  
the original publication, both the raw data  and  summary statistics  
could be used for the control group.

##
# Bayesian (un)Standardized Difference Test
##

#from Crawford and Garthwaite (2007) Cognitive Neuropsychology
# implemented by Matthieu Dubois, Matthieu.Duboispsp.ucl.ac.be

#PACKAGE MCMCpack REQUIRED

# patient: a vector with the two scores; controls: matrix/data.frame  
with the raw scores (one column per  task)
# mean.c, sd.c, r, n: possibility to enter summaries statistics  
(mean, standard deviation, correlation, group size)
# n.simul: number of simulations
# two-sided (Boolean): two-sided (T) vs. one-sided (F) Bayesian  
Credible interval
# standardized (Boolean): standardized (T) vs. unstandardized (F) test
# values are: $p.value (one_tailed), $confidence.interval

crawford.BSDT <- function(patient, controls, mean.c=0, sd.c=0 , r=0,  
n=0, na.rm=F, n.simul=10, two.sided=T, standardized=T)
{
library(MCMCpack)

#if no summaries are entered, they are computed
if(missing(n))
{   
if(!is.data.frame(controls)) controls <- as.data.frame(controls)
n <- dim(controls)[1]
mean.c <- mean(controls, na.rm=na.rm)
sd.c <- sd(controls, na.rm=na.rm)

na.method <- ifelse(na.rm,"complete.obs","all.obs")

r <- cor(controls[,1], controls[,2], na.method)
}

#variance/covariance matrix
s.xx <- (sd.c[1]^2) * (n-1)
s.yy <- (sd.c[2]^2) * (n-1)
s.xy <- sd.c[1] * sd.c[2] * r * (n-1)

A <- matrix(c(s.xx, s.xy, s.xy, s.yy), ncol=2)

#estimation function
if(standardized)
{
estimation <- function(patient, mean.c, n, A)
{
#estimation of a variance/covariance matrix (sigma)
sigma = riwish(n,A) #random obs. from an 
inverse-Wishart distribution

#estimation of the means (mu)
z <- rnorm(2)
T <- t(chol(sigma)) #Cholesky decomposition
mu <- mean.c + T %*% z/sqrt(n)

#standardization
z.x <- (patient[1]-mu[1]) / sqrt(sigma[1,1])
z.y <- (patient[2]-mu[2]) / sqrt(sigma[2,2])
rho.xy <- sigma[2.2] / sqrt(sigma[1,1]*sigma[2,2])

z.star <- (z.x - z.y) / sqrt(2-2*rho.xy)

#conditional p-value
p <- pnorm(z.star)
p
}
}
else
{
estimation <- function(patient, mean.c, n, A)
{
#estimation of a variance/covariance matrix (sigma)
sigma = riwish(n,A) #random obs. from an 
inverse-Wishart distribution

#estimation of the means (mu)
z <- rnorm(2)
T <- t(chol(sigma)) #Cholesky decomposition
mu <- mean.c + T %*% z/sqrt(n)

num <- (patient[1]-mu[1]) - (patient[2] - mu[2])
denom <- sqrt(sigma[1,1]+sigma[2,2]-(2*sigma[1,2]))

z.star <- num/denom

#conditional p-value
p <- pnorm(z.star)
p
}
}

#application
p <- replicate(n.simul, estimation(patient, mean.c, n, A))

[R] error in which(): recursive default argument reference

2006-04-11 Thread Matthieu Dubois
Dear useRs,

I have written a very simple function to compute some probabilities  
on words (function is below). The function includes a which()  
statement applied to a vector of characters (word.split): sapply 
(word.split, function(x) which(letters==x)). This statement  worked  
as expected when used outside the global function :
 > word <- "hello"
 > (word.split <- unlist(strsplit(word, split=c(
[1] "h" "e" "l" "l" "o"
 > (used.letters <- sapply(word.split, function(x) which(letters==x)))
h  e  l  l  o
8  5 12 12 15

However, when embedded in the fonction, the following error occurred :
 >  compute.word("hello")
Error in which(letters == x) : recursive default argument reference


Any insights would be helpful ...

Cheers,

Matthieu

Function :
#computes some probabilities based on (1) word frequency (word.freq),  
letters visibility(letters.vis), letters frequency(letters.freq) and  
viewing eccentricity (pos)
compute.word <- function(word, pos=X, word.freq=word.freq,  
letters=letters, letters.vis=letters.vis, letters.freq=letters.freq)
{
word.split <- unlist(strsplit(word, split=c()))

#indexing vectors
used.letters <- sapply(word.split, function(x) 
which(letters==x))
pos <- c(1,rep(2,(length(word.split)-2)),3)

letters.freq.word <- unlist(letters.freq[cbind(used.letters, 
pos)])
letters.vis.word <- letters.vis[paste("X", pos, sep=""),]

#computations
Mean.Lvis <- mean(letters.vis.word)
Prod.Lvis <- prod(letters.vis.word)
PosRel <- sum((word.freq/letters.freq.word)*(letters.vis.word)) 
output <- list(Mean.Lvis, Prod.Lvis, PosRel)
names(output) <- c("Mean.Lvis", "Prod.Lvis", "PosRel")
return(output)
}   



Matthieu Dubois, PH.D. Student
Cognitive Neuroscience Unit
Université catholique de Louvain
10, Place cardinal Mercier - 1348 Louvain-la-Neuve - BELGIUM

[EMAIL PROTECTED]




[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] warning message in hand-made function

2006-03-30 Thread Matthieu Dubois
Dear Rusers,

I  tried to implement a function comparing mean scores between one  
subject (the patient) and a group a control subjects. The function  
returns attended results, but  I also obtained the following warning :

Warning message:
the condition has length > 1 and only the first element will be used  
in: if (substr(fp, 1, 1) == "<") fp else paste("=", fp)

Maybe the cause of the message is obvious, but I don't understand. I  
am newbie in R  and certainly I missed something. Any help would be  
greatly appreciated.

The aim of the function was to :
1. compute a modified t-test with either raw data (controls and  
patient) in vectors or only summaries for the control group (mean,  
standard deviation, size of the group :  mean.c, sd.c, n) as inputs ;
2. estimate the rarity of the  difference observed between patient  
and controls and computing confidence intervals

The function was the following:

crawford.t.test <- function(patient, controls, mean.c=0, sd.c=0, n=0,  
na.rm=F) {
#from Crawford et al. (1998, Clinical Neuropsychologist ; 2002,  
Neuropsychologia)

na<-na.rm

#if no summaries are entered, they are computed
if(missing(n)) {
n <- length(controls)
mean.c <- mean(controls, na.rm=na)
sd.c <- sd(controls, na.rm=na)
}
dl <- n-1   #degrees of freedom of the  test

#t.test computation
t.obs <- (patient-mean.c) / (sd.c*(((n+1)/n)^0.5))
proba.onetailed <- 1-pt(abs(t.obs), df=dl)
rar <- pt(t.obs, df=dl) #point estimate of the rarity

#confidence intervals computation on the rarity (Crawford &  
Garthwaite, 2002, Neuropsychologia)
c <- (patient-mean.c)/sd.c
#finding the non central parameter of t distributions
f <- function(delta, pr, x, df) pt(x, df = df, ncp = delta) - pr
deltaL <- uniroot(f, lower=-37.62, upper=37.62, pr = 0.025, x = c* 
(n^0.5), df = dl)
deltaU <- uniroot(f, lower=-37.62, upper=37.62, pr = 0.975, x = c* 
(n^0.5), df = dl)
CI.U <- pnorm(deltaL$root/(n^0.5)) * 100 #upper bound of the  
confidence interval
CI.L <- pnorm(deltaU$root/(n^0.5)) * 100 #lower bound of the  
confidence interval

#output
output <- list(statistic=t.obs, p.value=c 
(one.tailed=proba.onetailed, twotailed=2*proba.onetailed), rarity=c 
(rarity=rar, lower.boud=CI.L, upper.bound=CI.U), df=dl, method=paste 
("Crawford modified t test with", dl, "degrees of freedom", sep=" "))
class(output)<-"htest"
return(output)
}



Matthieu Dubois, PH.D. Student
Cognitive Neuroscience Unit
Université catholique de Louvain
10, Place cardinal Mercier - 1348 Louvain-la-Neuve - BELGIUM

[EMAIL PROTECTED]




[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] finding ncp for t distribution

2006-03-02 Thread Matthieu Dubois
Dear R-users,


I am wondering whether R implements a function returning the non central 
parameter of a t distribution (equivalent of the TnoncT function from 
SAS), given /x/ a value from a t distribution, /df /the  degrees of 
freedom and /p/ the probability of x under this  distribution.


Thanks  a lot,


Matthieu


--

Matthieu Dubois, /Ph.D/. /Student/

Cognitive Neuroscience Unit, UCL, Belgium



[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html