Dear list members,

Any help on this efficiency issue would be greatly appreciated.

I would like to find the most efficient way to run a non-vectorized function 
(here: fisher exact test p-value) iteratively using 4 matrices with identical 
dimensions. And as a result I aim for an array with identical dimensions 
containing the corresponding p-values. Please consider some code using a 
trivial example with 3x4 arrays below. Eventually I would like to run code on 
2e3 x 7e6 arrays, for which someone suggested Amazon EC2 already...

Q1: would you agree that fisher.test() is not vectorizable? e.g. fisher.test( 
matrix(c(Ax,Ay,Bx,By),ncol=2) ) does not work
Q2: direct use of Ax, Ay, Bx, By as input instead of a (list) transform for the 
input would seem beneficial for speed
Q3: parallelization of the iterative process seems to make sense.
Q4: a progress bar seems to save peace of mind having no clue of the runtime.
Q5: avoidance of an output transform to get array from vector  
Q6: for Q2/3/4/5 plyr seems to be ideal (e.g. maply) 

Please also find some solutions below.  
solution 1: using mapply
solution 2: using lapply
solution 3: using mclapply
attempt 4: stuck on plyr implementation

--Philip

### CODE START ###

Ax <- matrix(c(2,3,5,6,
               3,7,8,9,
               8,2,1,3), ncol = 4)
Ay <- matrix(c(9,8,5,7,
               4,9,9,9,
               8,7,5,4), ncol = 4)
Bx <- matrix(c(1,5,9,8,
               4,7,8,9,
               2,3,2,1), ncol = 4)
By <- matrix(c(5,5,9,9,
               9,8,8,9,
               5,5,3,2), ncol = 4)

### solution 1 using mapply
# proper answer, no input transform, output transform, no parallelization, no 
progress update
sol1 <- function() {
 res1 <- mapply(
   function(i,j,k,l) { fisher.test( matrix(c(i,j,k,l), ncol=2), 
conf.int=F)$p.value },
   i=Ax, j=Ay, k=Bx, l=By, 
   SIMPLIFY=TRUE)
 ans1 <- matrix(res1,ncol=4)
 return(ans1)
}
s1 <- sol1()

### solution 2 using lapply
# proper answer, input transform, output transform, no parallelization, no 
progress update
sol2 <- function() {
 tmp.list <- as.data.frame(rbind(as.numeric(Ax), as.numeric(Ay), 
as.numeric(Bx), as.numeric(By)))
 # determine fisher.test p-values as list
 res2 <- lapply(tmp.list, 
    function(x) { fisher.test( matrix(unlist(x), ncol=2), conf.int=F)$p.value 
})  
 ans2 <- matrix(unlist(res2),ncol=4)
 return(ans2)
}
s2 <- sol2()

### solution 3 using mclapply
# proper answer using input transform, output transform, parallelization, no 
progress update
library(multicore)
sol3 <- function() {
  tmp.list <- as.data.frame(rbind(as.numeric(Ax), as.numeric(Ay), 
as.numeric(Bx), as.numeric(By)))
  # determine fisher.test p-values as list
  res3 <- mclapply(tmp.list, 
    function(x) { fisher.test( matrix(unlist(x), ncol=2), conf.int=F)$p.value },
    mc.cores=4)  
  ans3 <- matrix(unlist(res3),ncol=4)
}
s3 <- sol3()

### solution 4 using plyr::maply
# difficulty finding equivalent code
# benefit could be: no input transform, no output transform, parallelization, 
and progress update
 library(plyr)
 library(abind)
 library(doMC)
 registerDoMC(cores=4)
 sol4 <- function() {
  ans4 <- maply(
   #.data = abind(i=Ax,j=Ay,k=Bx,l=By,along=0),
   #.data = abind(Ax,Ay,Bx,By,along=3),
   #.data = data.frame(i=Ax, j=Ay, k=Bx, l=By),
   #.data = cbind(i=as.vector(Ax), j=as.vector(Ay), k=as.vector(Bx), 
l=as.vector(By)),
   #.data = list(i=Ax, j=Ay, k=Bx, l=By),
   .data = abind(i=Ax, j=Ay, k=Bx, l=By, along=3),
   .fun = function(i,j,k,l) { fisher.test( matrix(c(i,j,k,l), ncol=2), 
conf.int=F)$p.value },
   .progress = "text",
   .parralel = TRUE
  )
  return(ans4)
}

all.equal(s1,s2) # TRUE
all.equal(s1,s3) # TRUE

library(microbenchmark)
microbenchmark(sol1, times=1000)
microbenchmark(sol2, times=1000)
microbenchmark(sol3, times=1000)
microbenchmark(sol4, times=1000)


        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to