Very informative! Thank you.

Quoting Martin Morgan <martin.mor...@roswellpark.org>:

On 03/22/2016 01:46 PM, ja...@forestlidar.org wrote:

Hello, I have encountered a bug(?) with the parallel package. When run
from within a function, the parLapply function appears to copy the
entire parent environment (the environment of the function's interior)
into all child nodes in the cluster, one node at a time, which is very
slow. The copied contents are not even accessible within the child
nodes, even though they are apparent in the memory footprint. I may be
misusing the terms "parent" and "node" here...

The below code demonstrates the issue. The same parallel command is used
twice within the function, once before creating a large object, and once
afterwards. Both commands should take a nearly identical amount of time.
Initially the parallel code takes less than 1/100th of a second, but in
the second iteration requires hundreds of times longer...

Example Code:

     library(parallel)

     #create a cluster of nodes
     if(!"clus1" %in% ls()) clus1=makeCluster(10)

     #function used to demonstrate bug
     rows_fn1=function(x,clus){

         #first set of parallel code
         print(system.time(parLapply(clus,1:5,function(z){y=rnorm(5000);return(mean(y))})))

         #create large vector
         x=rnorm(10^7)

         #second set
         print(system.time(parLapply(clus,1:5,function(z){y=rnorm(5000);return(mean(y))})))

     }

     #demonstrate bug - watch task manager and see Windows slowly copy
     #the vector to each node in the cluster
     rows_fn1(1:5000,clus1)

Although the child nodes bloat proportionally to the size of x in the
parent environment, x is not available in the child nodes. The code

With this

    library(parallel)
    cl <- makeCluster(2)
    f <- function() {
        x <- 10
        parSapply(cl, 1:5, function(i) x * i)
    }

we see both that x is available and why the variable is copied: symbols available in the environment in which FUN is defined have to be available on the workers, just as they are in serial evaluation

f()
[1] 10 20 30 40 50
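
One way to see the cost of this without watching the task manager is to measure how many bytes a function serializes to, since functions are sent to the workers in serialized form. This is only a minimal sketch; make_worker, local_fun and global_fun are illustrative names that do not appear in the thread.

```r
# A function created inside another function keeps a reference to the
# environment it was created in, including objects it never uses.
make_worker <- function() {
    big <- rnorm(1e6)          # about 8 MB of doubles, unused by the worker
    function(i) i * 2
}

local_fun  <- make_worker()        # enclosing environment holds 'big'
global_fun <- function(i) i * 2    # enclosing environment is the global one

size_local  <- length(serialize(local_fun, NULL))
size_global <- length(serialize(global_fun, NULL))

size_local    # large: 'big' is serialized along with the function
size_global   # small: the global environment is stored only as a reference
```

The two functions have identical bodies; the difference in size comes entirely from the enclosing environment.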

Defining the function in the global environment, rather than in the body of a function, avoids copying implicit state,

    cl <- makeCluster(2)
    FUN <- function(i) x * i
    f <- function() {
        x <- 10
        parSapply(cl, 1:5, FUN)
    }

but requires that all arguments are defined / passed

f()
Error in checkForRemoteErrors(val) (from #3) :
  2 nodes produced errors; first error: object 'x' not found

Updating the function definition and its use so that x is passed explicitly:

    FUN <- function(i, x) x * i
    f <- function() {
        x <- 10
        parSapply(cl, 1:5, FUN, x)
    }

f()
[1] 10 20 30 40 50
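
Applied to the function from the original report, the same idea might look roughly like the sketch below. The names mean_of_draws, rows_fn2 and clus2 are made up for illustration, and the cluster size of 2 is an arbitrary choice. The worker function is defined at top level, so the large vector created inside rows_fn2 should not be part of what is sent to the workers; the timings are returned rather than asserted, since they will vary by machine.

```r
library(parallel)

# Worker defined at top level, so it carries no function-local environment
mean_of_draws <- function(z) mean(rnorm(5000))

rows_fn2 <- function(x, clus) {
    # timing before the large object exists
    t_before <- system.time(parLapply(clus, 1:5, mean_of_draws))[["elapsed"]]

    x <- rnorm(10^7)   # large vector, local to rows_fn2

    # timing after the large object exists
    t_after <- system.time(parLapply(clus, 1:5, mean_of_draws))[["elapsed"]]

    c(before = t_before, after = t_after)
}

clus2 <- makeCluster(2)
timings <- rows_fn2(1:5000, clus2)
stopCluster(clus2)
timings
```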

The foreach package tries to be smart and export only symbols used (but can be tricked)

    library(foreach)
    library(doSNOW)
    registerDoSNOW(cl)
    g <- function() {
        x <- 10
        foreach(i=1:2) %dopar% { get("x") }
    }

g()  # fails because 'x' is not referenced directly so not exported
Error in { (from #3) : task 1 failed - "object 'x' not found"

versus

    g <- function() {
        x <- 10
        foreach(i=1:2) %dopar% { get("x"); x }
    }

and

g()  # works because 'x' referenced and exported
[[1]]
[1] 10

[[2]]
[1] 10
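
If a variable is only reached indirectly, as with get("x") above, foreach's .export argument can name it explicitly instead of relying on a dummy reference. A small sketch, assuming the foreach and doSNOW packages are installed; g2 and cl2 are illustrative names, and the cluster is created with snow::makeCluster so the block does not depend on anything defined earlier.

```r
library(foreach)
library(doSNOW)

cl2 <- snow::makeCluster(2, type = "SOCK")
registerDoSNOW(cl2)

g2 <- function() {
    x <- 10
    # 'x' is not referenced directly in the body, so it is exported by name
    foreach(i = 1:2, .export = "x") %dopar% { get("x") }
}

res <- g2()
snow::stopCluster(cl2)
res
```

The companion argument .noexport works the other way, keeping named variables from being sent when the automatic detection would otherwise pick them up.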


Martin

above can be tweaked to add more variables (x1,x2,x3 ...) and the child
nodes will bloat to the same degree.

I am working on Windows Server 2012, I am using 64bit R version 3.2.1. I
upgraded to 3.2.4revised and observed the same bug.

I have googled for this issue and have not encountered any other
individuals having a similar problem.

I have attempted to reboot my machine without effect (aside from the
obvious).

Any suggestions would be greatly appreciated!

With regards,

Jacob L Strunk
Forest Biometrician (PhD), Statistician (MSc)
and Data Munger

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

