Hi Matthew,

First - I fully support Greg Snow proposition.  Sampling is the way to go
here.

But besides that:
1) Try to avoid using data.frames as much as possible (use vectors and
matrixes instead - they are usually faster)
2) Since you are running on a loop, you can try running it in parallel (if
you have more then one core on your computer). I recently wrote how to do
this on Windows (here:
http://www.r-statistics.com/2010/04/parallel-multicore-processing-with-r-on-windows/)
, and there are other ways for doing it on other OS.

But again - sampling is probably going to be the only real solution...

Good luck,
Tal



----------------Contact
Details:-------------------------------------------------------
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
----------------------------------------------------------------------------------------------




On Tue, Apr 27, 2010 at 1:09 AM, Matthew Finkbeiner <
matthew.finkbei...@maccs.mq.edu.au> wrote:

> Yes, I suspect that I will end up using a sampling approach, but I'd like
> to use an exact test if it's at all feasible.
>
> Here are two samples of data from 3 subjects:
> Sample  Subj    C1      C2
> 44      1       0.0093  0.0077
> 44      2       0.0089  0.0069
> 44      3       0.051   0.0432
> 44      4       0.014   0.0147
> 44      5       0.0161  0.0117
> 45      1       0.0103  0.0086
> 45      2       0.0099  0.0078
> 45      3       0.0542  0.0458
> 45      4       0.0154  0.0163
> 45      5       0.0175  0.0129
>
>
> and then here is the script I've pieced together from things I've found on
> the web (sorry for not citing the snippets!).  any pointers on how to speed
> it up would be greatly appreciated.
>
> #----------------------------------
> # Utility function
> # that returns binary representation of 1:(2^n) X SubjN
> binary.v <-
> function(n)
> {
>  x <- 1:(2^n)
>  mx <- max(x)
>  digits <- floor(log2(mx))
>  ans <- 0:(digits-1); lx <- length(x)
>  x <- matrix(rep(x,rep(digits, lx)),ncol=lx)
>  x <- (x %/% 2^ans) %% 2
> }
>
> library(plyr)
>
>
> #first some global variables
> TotalSubjects <- 5
> TotalSamples <- 2
> StartSample <- 44
> EndSample <- ((StartSample + TotalSamples)-1)
> maxTs <- NULL
> obsTs <- NULL
>
>
>
>
> #create index array that drives the permuations for all samples
> ind <- binary.v(TotalSubjects)
>
> #transpose ind so that the first 2^N items correspond to S1,
> #the second 2^N correspond to S2 and so on...
> transind <- t(ind)
>
> #get data file that is organized first by sample then by subj (e.g. sample1
> subject1
> # sample1 subject 2 ... sample 1 subject N)
> #sampledatafile <- file.choose()
>
> samples <- read.table(sampledatafile, header=T)
>
> #this is the progress bar
> pb <- txtProgressBar(min = StartSample, max = EndSample, style = 3)
> setTxtProgressBar(pb, 1)
>
> start.t <- proc.time()
>
> #begin loop that analyzes data sample by sample
> for (s in StartSample:EndSample) {
>
>    S <- samples[samples$Sample==s,] #pick up data for current sample
>
>    #reproduce data frame rows once for each permutation to be done
>    expanddata <- S[rep(1:nrow(S), each = 2^TotalSubjects),]
>
>
>    #create new array to hold the flipped (permuted) data
>    permdata = expanddata
>
>    #permute the data
>    permdata[transind==1,3] <- expanddata[transind==1,4] #Cnd1 <- Cnd2
>    permdata[transind==1,4] <- expanddata[transind==1,3] #Cnd2 <- Cnd1
>
>    #create permutation # as a factor in dataframe
>    PermN <- rep(rep(1:2^TotalSubjects, TotalSubjects),2)
>
>    #create Sample# as a factor
>    Sample <- rep(permdata[,1],2) #Sample# is in the 1st Column
>
>    #create subject IDs as a factor
>    Subj <- rep(permdata[,2],2) #Subject ID is in the 2nd Column
>
>    #stack the permutated data
>    StackedPermData <- stack(permdata[,3:4])
>
>    #bind all the factors together
>    StackedPermData <- as.data.frame(cbind(Sample, Subj, PermN,
> StackedPermData))
>
>
>    #sort by perm
>    sortedstack <-
> as.data.frame(StackedPermData[order(StackedPermData$PermN,
>                                StackedPermData$Sample),])
>
>
>    #clear up some memory
>    rm(expanddata, permdata, StackedPermData)
>
>    #pull out data 1 perm at a time
>    res<-ddply(sortedstack, c("Sample", "PermN"), function(.data){
>
>        # Type combinations by Class
>        combs<-t(combn(sort(unique(.data[,5])),2))
>
>        # Applying the t-test for them
>        aaply(combs,1, function(.r){
>        x1<-.data[.data[,5]==.r[1],4] # select first column
>        x2<-.data[.data[,5]==.r[2],4] # select first column
>
>        tvalue <- t.test(x1,x2, paired = T)
>
>        res <- c(tvalue$statistic,tvalue$parameter,tvalue$p.value)
>        names(res) <- c('stat','df','pvalue')
>        res
>        }
>        )
>    }
>    )
>
>     # update progress bar
>     setTxtProgressBar(pb, s)
>
>     #get max T vals
>     maxTs <- c(maxTs, tapply (res$stat, list (res$Sample), max))
>
>     #get observed T vals
>     obsTs <- c(obsTs, res$stat[length(res$stat)])
>
>     #here we need to save res to a binary file
>
>
> }
> #close out the progress bar
> close(pb)
>
> end.t <- proc.time() - start.t
> print(end.t)
>
> #get cutoffs
> #these are the 2-tailed t-vals that maintain experimentwise error at the
> 0.05 level
> lowerT <- quantile(maxTs, .025)
> upperT <- quantile(maxTs, .975)
>
>
>
>
>
>
>
>
>
>
>
>
> On 4/27/2010 6:53 AM, Greg Snow wrote:
>
>> The usual way to speed up permutation testing is to sample from the
>> set of possible permutations rather than looking at all possible
>> ones.
>>
>> If you show some code then we may be able to find some inefficiencies
>> for you, but there is not general solution, poorly written uses of
>> apply will be slower than well written for loops.  In some cases
>> rewriting critical pieces in C or fortran will help quite a bit, but
>> we need to see what you are already doing to know if that will help
>> or not.
>>
>>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to