On Mar 18, 2009, at 9:45, Rune Schjellerup Philosof wrote:

Simon Urbanek wrote:
On Mar 18, 2009, at 8:59, Rune Schjellerup Philosof wrote:
A simple example of use:
data1 <- data2 <- matrix(0, r, c)
dataFiller <- function(i) {
 tmp <- someCalculation(i)
 data1[, i] <<- tmp$result1
 data2[, i] <<- tmp$result2
}
runParallelInThreads(1:c, dataFiller)

If this can be done almost as fast and as simply with processes, for
instance using the multicore package, then I think it needs to be better
documented.

Can you elaborate on the last sentence, please? Things cannot happen
if you don't ask ...


What I meant by that sentence was:
- How would you do the example above using the multicore package?

There are many ways, but for example with automated dispatch to the cores:

l=mclapply(1:c, someCalculation)
for (i in 1:c) { data1[,i] = l[[i]]$result1; data2[,i] = l[[i]]$result2 }
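For anyone who wants to try the two lines above directly, here is a self-contained sketch. It assumes the `parallel` package that ships with R (since 2.14.0 it contains `mclapply()`, taken over from multicore); `someCalculation` is a hypothetical toy stand-in for the real computation, and the thread's `r` and `c` are renamed `nr` and `nc` so that `c()` is not shadowed. Forking is only available on Unix-alikes, so the sketch falls back to one core on Windows.

```r
library(parallel)  # mclapply() lives here since R 2.14.0 (formerly in multicore)

# Hypothetical stand-in for the thread's someCalculation(): returns one column for each matrix
someCalculation <- function(i, nr = 5) {
  list(result1 = rep(i, nr), result2 = rep(i^2, nr))
}

nr <- 5; nc <- 8   # the thread's r and c
data1 <- data2 <- matrix(0, nr, nc)

# Each call runs in a forked child process; results are sent back to the parent as a list.
# mc.cores > 1 is not supported on Windows, so use a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
l <- mclapply(seq_len(nc), someCalculation, mc.cores = cores)

# Fill the matrices in the parent process
for (i in seq_len(nc)) {
  data1[, i] <- l[[i]]$result1
  data2[, i] <- l[[i]]$result2
}
```

The results are returned and copied in the parent because each forked child works on its own copy of memory: the `<<-` assignments in the thread-based `dataFiller` would only change the child's `data1`/`data2`, never the parent's. As an alternative to the loop, `data1 <- sapply(l, "[[", "result1")` builds the same matrix in one step.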


- How much slower is the multicore solution than the hypothetical solution using threads?

That really depends on how long the computation takes, since the difference is essentially just the setup cost (there is also some cost associated with transferring the results*). Since you simply cannot use threads in R, there is no realistic way to compare the two ;).

The closest I can get to answering the question is to simply measure the overhead. On my machine (8-core Xeon 3.3GHz) I get something like this:
forking cost: 1.3ms
data management cost: 0.6ms
(Measured by sequentially spawning and collecting 1000 parallel jobs of the form function(...) NULL.) This cost is negligible unless you are running thousands of parallel processes, which would be entirely pointless since very few machines have 1000 cores ;). Note that in practice you spawn only as many jobs as you have cores (or slightly more), so the overhead is not really an issue at all.
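A rough way to repeat this kind of measurement is sketched below. It assumes the `mcparallel()`/`mccollect()` functions from R's `parallel` package (the successors of multicore's fork-based job functions, Unix-alikes only) and spawns trivial jobs that just evaluate `NULL`. Unlike the figures above, this loop reports the combined fork and collection cost per job rather than splitting it into two parts, and the numbers will of course vary by machine.

```r
library(parallel)  # mcparallel()/mccollect(): fork-based, Unix-alikes only

n_jobs <- 1000

# Spawn one trivial job, wait for it, repeat: times fork + result collection per job
elapsed <- system.time(
  for (i in seq_len(n_jobs)) {
    job <- mcparallel(NULL)  # fork a child process that evaluates NULL
    mccollect(job)           # wait for the child and read its (NULL) result back
  }
)[["elapsed"]]

cat(sprintf("approx. overhead per job: %.2f ms\n", 1000 * elapsed / n_jobs))
```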

Cheers,
Simon

(*) - this is usually not an issue, but with really large result sets it could be worth changing the implementation a bit. Currently the results are passed back through a pipe, but it would be possible (albeit possibly less portable) to use shared memory instead.
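To get a feel for when the pipe transfer starts to matter, one can compare a round trip returning nothing with one returning a large vector. This is only a sketch under the same assumptions as before (`mcparallel()`/`mccollect()` from the `parallel` package, Unix-alikes); the helper `round_trip` and the 1e7-element result size are hypothetical choices for illustration, and the large case also includes the time the child spends allocating the vector.

```r
library(parallel)  # fork-based helpers; Unix-alikes only

# Hypothetical helper: time one fork + collect round trip whose child returns make_result()
round_trip <- function(make_result) {
  system.time(mccollect(mcparallel(make_result())))[["elapsed"]]
}

t_small <- round_trip(function() NULL)
t_large <- round_trip(function() numeric(1e7))  # ~80 MB of doubles serialized through the pipe

cat(sprintf("NULL result: %.1f ms, ~80 MB result: %.1f ms\n",
            1000 * t_small, 1000 * t_large))
```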

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
