Hi,

I have been trying to use the new .parallel argument in the most recent 
version of plyr to speed up some tasks. I can run the example in the NEWS 
file [1], and it seems to work correctly. However, R uses only a single 
core when I apply the same approach with ddply(). 

1. http://cran.r-project.org/web/packages/plyr/NEWS

Watching my CPUs while running the example below, I see that only a single 
core is used with both .parallel=FALSE and .parallel=TRUE, and the two runs 
take about the same amount of time. Is there a limitation in how ddply() 
dispatches parallel jobs, or is this task not suitable for parallel 
computing?

Cheers,
Dylan


Here is an example:

library(plyr)
library(doMC)
registerDoMC(cores=2)

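# example data: 4 groups of 500 observations each
# (y needs 2000 values to match id; with 1000 it is silently recycled)
d <- data.frame(y=rnorm(2000), id=rep(letters[1:4], each=500))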

# function that wastes some time
f <- function(x) {
        m <- vector(length=10000)
        for(i in 1:10000) {
                m[i] <- mean(sample(x$y, 100))
        }
        mean(m)
}

system.time(ddply(d, .(id), .fun=f, .parallel=FALSE))
#  user  system elapsed 
#  2.740   0.016   2.766 

system.time(ddply(d, .(id), .fun=f, .parallel=TRUE))
#  user  system elapsed 
#  2.720   0.000   2.726 
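
One further check, offered as a minimal sketch rather than a diagnosis: the
code below asks foreach which backend it sees, and then calls f() on each
group through %dopar% directly, without going through plyr. It assumes that
the .parallel option hands its work to whatever foreach backend is
registered, and that doMC is usable on the machine (it relies on forking, so
not on Windows). The names pieces and res are placeholders for this
illustration only.

```r
library(foreach)
library(doMC)
registerDoMC(cores = 2)

# what does foreach think is registered?
getDoParRegistered()
getDoParName()
getDoParWorkers()

# four groups of 500 values and the same time-wasting function
d <- data.frame(y = rnorm(2000), id = rep(letters[1:4], each = 500))
f <- function(x) {
  m <- numeric(10000)
  for (i in 1:10000) {
    m[i] <- mean(sample(x$y, 100))
  }
  mean(m)
}

# one data frame per id, then one f() call per piece via %dopar%
pieces <- split(d, d$id)
system.time(res <- foreach(p = pieces, .combine = c) %dopar% f(p))
res
```

If this keeps both cores busy while ddply(..., .parallel=TRUE) does not, that
would suggest the backend itself is fine and the difference lies in how the
work reaches it.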





-- 
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341
