Hello, I have encountered a bug(?) in the parallel package. When called from within a function, parLapply appears to copy the entire parent environment (the environment inside the calling function) to every child node in the cluster, one node at a time, which is very slow. The copied contents are not even accessible on the child nodes, although they show up in the nodes' memory footprint. I may be misusing the terms "parent" and "node" here...

The code below demonstrates the issue. The same parLapply call is made twice within the function, once before creating a large object and once afterwards. Both calls should take nearly the same amount of time. The first call takes less than 1/100th of a second, but the second takes hundreds of times longer...

Example Code:

     library(parallel)

     #create a cluster of nodes
     if(!"clus1" %in% ls()) clus1=makeCluster(10)

     #function used to demonstrate bug
     rows_fn1=function(x,clus){

         #first set of parallel code
         print(system.time(parLapply(clus,1:5,function(z){y=rnorm(5000);return(mean(y))})))

         #create large vector
         x=rnorm(10^7)

         #second set
         print(system.time(parLapply(clus,1:5,function(z){y=rnorm(5000);return(mean(y))})))

     }

     #demonstrate bug - watch task manager and see windows slowly copy the vector to each node in the cluster
     rows_fn1(1:5000,clus1)

Although the child nodes bloat proportionally to the size of x in the parent environment, x is not available in the child nodes. The code above can be tweaked to add more variables (x1,x2,x3 ...) and the child nodes will bloat to the same degree.
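
One way to narrow this down might be to check how large the anonymous function becomes when it is serialized. The sketch below assumes that parLapply serializes the function it is given in order to send it to each node; I have not confirmed that this is what causes the slowdown. The helper make_worker and the size_* variables are hypothetical names used only for this check and are not part of the example above. It uses a smaller vector (10^6) to keep the check quick.

```r
# Hypothetical helper: builds the worker function inside another function,
# optionally next to a large local vector (mirrors rows_fn1 above).
make_worker <- function(big) {
  if (big) x <- rnorm(10^6)          # large object in the enclosing environment
  function(z) mean(rnorm(5000))      # same work as the parLapply example
}

# Size in bytes of each closure once serialized
size_small <- length(serialize(make_worker(FALSE), NULL))
size_big   <- length(serialize(make_worker(TRUE),  NULL))

# Same closure, but with its environment reset to the global environment
f <- make_worker(TRUE)
environment(f) <- globalenv()
size_reset <- length(serialize(f, NULL))

print(c(small = size_small, big = size_big, reset = size_reset))
```

If size_big comes out far larger than size_small and size_reset, that would suggest the large vector travels with the function even though the function never refers to it.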

I am working on Windows Server 2012 with 64-bit R version 3.2.1. I upgraded to 3.2.4 Revised and observed the same bug.

I have googled for this issue and have not found anyone else reporting a similar problem.

I have attempted to reboot my machine without effect (aside from the obvious).

Any suggestions would be greatly appreciated!

With regards,

Jacob L Strunk
Forest Biometrician (PhD), Statistician (MSc)
and Data Munger

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
