Platform: Windows 7
Package: parallel
Function: parLapply

I am running a lengthy program as 8 parallel processes, all in main
memory.
Each process saves data with the 'save' function to its own distinct file,
so no two processes should ever write to the same file.
I have been getting errors like the one shown below intermittently:
sometimes at one point in the execution, sometimes at another, and
sometimes not at all.
Note that the directory named in the error message
( 'D:\_pgf\quantile_analysis2_f13\_save\dbz084_nump48\bins') contains, as I
write, 124 files that the program saved to it without any error, which
underscores that most of the time the saves succeed.

Error in checkForRemoteErrors(val) :
  one node produced an error: (converted from warning)
'D:\_pgf\quantile_analysis2_f13\_save\dbz084_nump48\bins' already exists

Enter a frame number, or 0 to exit

 1: main_top(9)
 2: main_top.r#26: eval(call_me)
 3: eval(expr, envir, enclos)
 4: quantile_analysis(2)
 5: quantile_analysis.r#69: run_all(layr, prjp, np, rules_tb, pctiles_tb, parx, logdir, logg)
 6: run_all.r#73: parLapply(cl, ctrl_all$vn, qa1, prjp, dfr1, "iu__bool", parx, logdir, tstamp)
 7: do.call(c, clusterApply(cl, x = splitList(X, length(cl)), fun = lapply, fun, ...), quote = TRUE)
 8: clusterApply(cl, x = splitList(X, length(cl)), fun = lapply, fun, ...)
 9: staticClusterApply(cl, fun, length(x), argfun)
10: checkForRemoteErrors(val)
11: stop("one node produced an error: ", firstmsg, domain = NA)
12: (function ()
{
    error()
    utils::recover()
})()
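
For what it is worth, the text "'...' already exists" matches the warning that dir.create() issues when its target directory is already there, and "(converted from warning)" is the prefix R adds when options(warn = 2) is in effect. The traceback above only covers the master process, so whether the worker function really calls dir.create() is an assumption. A minimal single-process sketch of how such a message can arise, using a hypothetical temporary directory instead of the real path:

```r
# Hypothetical reproduction; 'bins_demo' under tempdir() stands in for the
# real output directory, and calling dir.create() at all is an assumption.
out_dir <- file.path(tempdir(), "bins_demo")
unlink(out_dir, recursive = TRUE)
dir.create(out_dir)                      # first creation succeeds silently

old <- options(warn = 2)                 # turn warnings into errors
res <- tryCatch(
  dir.create(out_dir),                   # second creation warns -> error
  error = function(e) conditionMessage(e)
)
options(old)                             # restore the previous setting
print(res)
```

If two workers both see the directory as missing and both try to create it, the slower one would hit this warning, which would fit the intermittent pattern.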

Following the latest error I checked the system's connections as follows:

Browse[1]> showConnections()
   description             class      mode  text     isopen   can read can write
3  "<-LAPTOP_32G_01:11741" "sockconn" "a+b" "binary" "opened" "yes"    "yes"
4  "<-LAPTOP_32G_01:11741" "sockconn" "a+b" "binary" "opened" "yes"    "yes"
5  "<-LAPTOP_32G_01:11741" "sockconn" "a+b" "binary" "opened" "yes"    "yes"
6  "<-LAPTOP_32G_01:11741" "sockconn" "a+b" "binary" "opened" "yes"    "yes"
7  "<-LAPTOP_32G_01:11741" "sockconn" "a+b" "binary" "opened" "yes"    "yes"
8  "<-LAPTOP_32G_01:11741" "sockconn" "a+b" "binary" "opened" "yes"    "yes"
9  "<-LAPTOP_32G_01:11741" "sockconn" "a+b" "binary" "opened" "yes"    "yes"
10 "<-LAPTOP_32G_01:11741" "sockconn" "a+b" "binary" "opened" "yes"    "yes"
Browse[1]>

All eight connections carry the same description, so it seems that the
parallel processes might be sharing one connection. Or are these distinct
connections that merely have the same name because the workers run in
parallel?
If the connections are the problem, how can I force each parallel process
to use a different connection?
If the connections are not the problem, can someone suggest a diagnostic I
could apply to find out what is going wrong, or a program setting that I
may have neglected to consider?
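
One diagnostic I could try is sketched below, on the assumption that the workers create the shared output directory themselves. All names here (ensure_dir, with_diagnostics, worker, the demo directory) are hypothetical and not taken from my program. The helper creates the directory without warning if another worker got there first, and the wrapper returns the error message and the offending call from the worker instead of aborting the whole parLapply.

```r
library(parallel)

# Hypothetical helper: tolerate another worker creating the directory first.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE, showWarnings = FALSE)
  }
  if (!dir.exists(path)) stop("could not create directory: ", path)
  invisible(path)
}

# Hypothetical wrapper: report a worker's error as data rather than failing.
with_diagnostics <- function(fun) {
  force(fun)
  function(x, ...) {
    tryCatch(fun(x, ...), error = function(e) {
      list(input = x,
           message = conditionMessage(e),
           call = paste(deparse(conditionCall(e)), collapse = " "))
    })
  }
}

# Stand-in for the real worker function; each task saves to its own file.
worker <- function(i, out_dir) {
  options(warn = 2)                      # assumed setting: warnings are errors
  ensure_dir(out_dir)
  f <- file.path(out_dir, sprintf("part_%03d.RData", i))
  x <- i^2
  save(x, file = f)
  f
}

out_dir <- file.path(tempdir(), "bins_demo2")
cl <- makeCluster(2)
clusterExport(cl, "ensure_dir")          # make the helper visible on workers
files <- parLapply(cl, 1:8, with_diagnostics(worker), out_dir)
stopCluster(cl)
```

Any element of files that is a list rather than a file name would identify the input, message and call that failed on that worker.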

Thanks in advance for your help.
