Apologies for what must be a very basic question, but I have not found
any clear examples of how to design the following.
I would like to run an iterative analysis across several processors. A
toy example of the analysis is included below: a resampling function
run 1,000 times for each combination of two sets of conditioning
variables i, j on some data vec.
What is the usual way to attack such a problem using snow? My
understanding up to this point is that one should:
(1) set the random seed to uncorrelate the processors' actions
(2) make a function myfunc(vec,i,j) which returns the item of interest
(3) set up a wrapper which iterates through i,j and makes the call to
the cluster
(4) call the cluster using clusterApply(cl,vec,myfunc)
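
The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the canonical snow recipe: it uses the parallel package bundled with current R (its cluster functions mirror snow's; in snow the seeding step would be clusterSetupRNG), and the worker count, the seed, and the name grid are made up for the example. Instead of a nested loop, it enumerates the (i, j) pairs with expand.grid and sends one task per pair, passing vec whole to each task.

```r
# Sketch only: 'parallel' stands in for snow; worker count, seed and
# the data frame name 'grid' are illustrative assumptions.
library(parallel)

cl <- makeCluster(2)                    # two local worker processes
clusterSetRNGStream(cl, iseed = 42)     # (1) independent random streams per worker

vec <- runif(1000, 1, 100)

# (2) the function of interest: receives the whole data vector and one (i, j) pair
myfunc <- function(vec, i, j) {
  b <- replicate(1000, mean(sample(vec, length(vec), replace = TRUE)))
  sum(b) * j / i
}

# (3) enumerate every (i, j) combination up front instead of nesting loops
grid <- expand.grid(i = c(2, 4), j = 5:6)

# (4) one task per (i, j) pair; vec is supplied unchanged via MoreArgs
res <- clusterMap(cl, myfunc, i = grid$i, j = grid$j,
                  MoreArgs = list(vec = vec))
grid$d <- unlist(res)

stopCluster(cl)
print(grid)
```

With only four tasks, this can use at most four workers; whether it beats the serial loop depends on how expensive each task is relative to the cost of shipping vec to the workers.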
Judging by the results below, I must be terribly confused. Any advice
would be appreciated.
Many thanks,
Eric Rupley
University of Michigan, Museum of Anthropology
1109 Geddes Ave, Rm. 4013
Ann Arbor, MI 48109-1079
# set up
# cl <- makeCluster(7)
# 8 slaves are spawned successfully. 0 failed.
#[1] "RNGstream"
vec <- runif(1000,1,100)
d <- NULL; c.j <- NULL;c.i <- NULL
# the toy function
analysis.func <- function (vec,i,j) {
  b <- NULL
  for (k in c(1:1000)) {
    a <- sample(vec,1000,replace=TRUE)  # requires randoms...
    b <- append(b, mean(a))
  }
  c <- (sum(b)*j)/i
  c  # return the scaled sum of the resampled means
}
# the "analysis"
system.time(for (i in c(2,4)) {  # a series of nested iterations...
  for (j in c(5:6)) {
    # this is ugly and contorted; there has to be a better way?
    d <- append( mean( as.numeric( clusterApply(cl,vec,analysis.func,i,j) ) ), d)
    c.j <- append(j, c.j)
    c.i <- append(i, c.i)
  }
})
#   user  system elapsed
#  9.758   0.291  48.771
# but the old way is faster...
d <- NULL; c.j <- NULL; c.i <- NULL  # set up again
system.time(for (i in c(2,4)) {  # a series of nested iterations...
  for (j in c(5:6)) {
    # keeping it ugly for timing comparison...
    d <- append( mean( as.numeric( analysis.func(vec,i,j) ) ), d)
    c.j <- append(j, c.j)
    c.i <- append(i, c.i)
  }
})
#   user  system elapsed
#  0.299   0.002   0.299
#> # arrgrgrgrgrg!!!
#[1] 1
#R version 2.7.1 (2008-06-23)
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base
#other attached packages:
#[1] rlecuyer_0.1 boot_1.2-33 snow_0.3-3 Rmpi_0.5-5
#loaded via a namespace (and not attached):
#[1] tools_2.7.1
#[1] "Sat Aug 23 04:25:50 2008"
#Too late for a drink. Pity.
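
One possible reading of the timings above, offered as a sketch rather than a diagnosis of this exact run: clusterApply(cl, x, fun, ...) calls fun once per element of x, so passing vec as x would dispatch 1000 single-number tasks for every (i, j) pair instead of one task holding the whole vector. The serial analogue lapply has the same per-element calling convention and shows this without a cluster. The helper probe is hypothetical and only reports how much data each call receives.

```r
# Serial analogue: lapply(x, fun, ...) hands fun one element of x at a
# time, the same convention as clusterApply(cl, x, fun, ...).
# probe() is a hypothetical helper, not part of snow.
set.seed(1)
vec <- runif(1000, 1, 100)

probe <- function(v, i, j) length(v)   # how many data values did this call get?

per.element <- lapply(vec, probe, i = 2, j = 5)
length(per.element)                    # 1000 separate calls
unique(unlist(per.element))            # every call saw exactly 1 value

whole <- probe(vec, i = 2, j = 5)      # a single call that sees all 1000 values

# sample() treats a single number x >= 1 as "sample from 1:x", so a
# per-element call resamples integers rather than the data
draws <- sample(7.3, 1000, replace = TRUE)
range(draws)
```

If that reading applies, each of the 1000 tasks also runs its own 1000-iteration resampling loop and pays a round trip to a worker, which would be consistent with the parallel version taking far longer than the serial one.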
