Hi Eric --

Eric Rupley <[EMAIL PROTECTED]> writes:
> Apologies for what must be a very basic question, but I have not found
> any clear examples on how to design the following....
>
> I would like to run iterative analysis over several processors. A toy
> example of the analysis is attached: a resampling function run 1k
> times, with two different sets of conditioning variables i,j on some
> data vec...
>
> What is the usual way to attack such a problem using snow? My
> understanding up to this point is that one should:
>
> (1) set the random seed to uncorrelate the processors' actions in
> sample()
>
> (2) make a function myfunc(vec,i,j) which returns the item of interest
>
> (3) set up a wrapper which iterates through i,j, and makes the call to
> the cluster
>
> (4) call the cluster using clusterApply(cl, vec, myfunc)....

I think you're on the right track. You say:

    for (i in c(2,4)) { # a series of nested iterations...
        for (j in c(5:6)) {
            clusterApply(cl, vec, analysis.func, i, j)

The clusterApply says, for each element of vec, invoke analysis.func.
vec is of length 1000, so you invoke analysis.func 1000 times, and with
the outer loops you're calling analysis.func 2 * 2 * 1000 times. In
your single-processor code you have

    for (i in c(2,4)) { # a series of nested iterations...
        for (j in c(5:6)) {
            res <- analysis.func(vec,i,j)

which invokes analysis.func only 2 * 2 times. A strategy is to convert
your 'for' loops into an appropriate *apply function, which I might do
as (approximately)

    > its <- expand.grid(i=c(2, 4), j=c(5, 6))
    > mapply(analysis.func, its$i, its$j,
    +        MoreArgs=list(vec=vec))
    [1] 120719.09  60403.20 144993.44  72468.66

(maybe you mean i=2:4, j=5:6 ?) and then to use the appropriate
cluster* function, e.g.,

    > clusterMap(cl, analysis.func, its$i, its$j,
    +            MoreArgs=list(vec=vec))

Maybe it is now early enough (though not too early?) for that drink?

Martin

> I must be terribly confused based on the results attached below....any
> advice will be appreciated...
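A minimal sketch of steps (1) to (4) along those lines, under stated assumptions: it uses the parallel package bundled with R 2.14 and later, which took over snow's makeCluster()/clusterMap() interface, rather than snow 0.3-3 from the post, and clusterSetRNGStream() stands in for snow's clusterSetupRNG(). The cluster size of 2 and both seeds are arbitrary choices. What gets split across the workers is the grid of (i, j) values, not vec, so there are four tasks in total instead of one per element of vec.

```r
library(parallel)

## Toy analysis as in the post: 1000 bootstrap means of vec, then sum(b) * j / i.
## replicate() replaces the original append() loop; the computation is the same.
analysis.func <- function(vec, i, j) {
    b <- replicate(1000, mean(sample(vec, 1000, replace = TRUE)))
    sum(b) * j / i
}

set.seed(1)
vec <- runif(1000, 1, 100)
its <- expand.grid(i = c(2, 4), j = c(5, 6))

cl <- makeCluster(2)                  # two local workers (arbitrary choice)
clusterSetRNGStream(cl, iseed = 42)   # independent L'Ecuyer-CMRG stream per worker

## Four tasks in total, one per (i, j) row of its; vec is sent via MoreArgs.
res <- clusterMap(cl, analysis.func, its$i, its$j,
                  MoreArgs = list(vec = vec))
stopCluster(cl)

its$res <- unlist(res)   # clusterMap() returns a list by default
print(its)
```

With only four tasks, no more than four workers can be kept busy; a larger (i, j) grid spreads across more workers with the same call.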
>
>
> Many thanks,
> Best,
> Eric
>
> --
> Eric Rupley
> University of Michigan, Museum of Anthropology
> 1109 Geddes Ave, Rm. 4013
> Ann Arbor, MI 48109-1079
>
> [EMAIL PROTECTED]
> +1.734.276.8572
>
>
> # set up
> #
> # cl <- makeCluster(7)
> # 8 slaves are spawned successfully. 0 failed.
> #clusterSetupRNG(cl)
> #[1] "RNGstream"
>
> vec <- runif(1000,1,100)
> d <- NULL; c.j <- NULL; c.i <- NULL
>
> # the toy function
>
> analysis.func <- function (vec,i,j) {
>   b <- NULL
>   for (k in c(1:1000)) {
>     a <- sample(vec,1000,replace=T) #requires randoms...
>     b <- append(b, mean(a))
>   }
>   c <- (sum(b)*j)/i
>   return(c)
> }
>
> # the "analysis"
>
> system.time(for (i in c(2,4)) { # a series of nested iterations...
>
>   for (j in c(5:6)) {
>
>     d <-
>       append( mean( as.numeric( clusterApply(cl,vec,analysis.func,i,j) ) ) ,
>               d)
>     # this is ugly and contorted; there has to be a better way?
>     c.j <- append(j, c.j)
>     c.i <- append(i, c.i)
>   }
> })
>
> #    user  system elapsed
> #   9.758   0.291  48.771
> #>
>
> # but the old way is faster...
>
> d <- NULL; c.j <- NULL; c.i <- NULL # set up again
>
> system.time(for (i in c(2,4)) { # a series of nested iterations...
>
>   for (j in c(5:6)) {
>
>     d <- append( mean( as.numeric( analysis.func(vec,i,j) )) ,d)
>     # keeping it ugly for timing comparison...
>     c.j <- append(j, c.j)
>     c.i <- append(i, c.i)
>   }
> })
>
> #    user  system elapsed
> #   0.299   0.002   0.299
> #> # arrgrgrgrgrg!!!
>
> stopCluster(cl)
> #[1] 1
> sessionInfo()
> #R version 2.7.1 (2008-06-23)
> #i386-apple-darwin8.10.1
> #
> #locale:
> #en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> #
> #attached base packages:
> #[1] stats     graphics  grDevices utils     datasets  methods   base
> #
> #other attached packages:
> #[1] rlecuyer_0.1 boot_1.2-33  snow_0.3-3   Rmpi_0.5-5
> #
> #loaded via a namespace (and not attached):
> #[1] tools_2.7.1
> date()
> #[1] "Sat Aug 23 04:25:50 2008"
> #>
> #Too late for a drink. Pity.
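On the "ugly and contorted" bookkeeping: a minimal serial sketch, assuming the same toy analysis, in which the expand.grid() table itself records i and j, so the separate c.i and c.j vectors are no longer needed. The replicate() version of analysis.func is an assumed rewrite of the loop in the post, not Eric's original code, and the seed is arbitrary.

```r
## Assumed rewrite of the toy analysis: 1000 bootstrap means of vec,
## combined as sum(b) * j / i, the same computation as the original loop.
analysis.func <- function(vec, i, j) {
    b <- replicate(1000, mean(sample(vec, 1000, replace = TRUE)))
    sum(b) * j / i
}

set.seed(123)
vec <- runif(1000, 1, 100)

## One row per (i, j) combination replaces the nested for loops.
its <- expand.grid(i = c(2, 4), j = c(5, 6))

## mapply() walks down the i and j columns together; vec is passed
## unchanged to every call through MoreArgs.
its$res <- mapply(analysis.func, its$i, its$j,
                  MoreArgs = list(vec = vec))

## i, j and the result stay aligned row by row in one data frame.
print(its)
```

The same its table can be handed to clusterMap() unchanged when moving the four calls onto the cluster.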
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793