On Sat, Jan 11, 2020 at 4:14 AM Simon Urbanek
<simon.urba...@r-project.org> wrote:
[...]
> > In general, any code that calls the macOS system libraries might
> > crash. (Except for CoreFoundation, which seems to be fine, but AFAIR
> > there is no guarantee for that, either.)
> >
>
> That is not true, either. macOS itself is fork-safe (it is POSIX-certified 
> after all), but libraries may or may not.

Right, so CoreFoundation should be fine. Well, as much as it can be.

[...]
>  As I said, using mc* functions explicitly says that you are ok with forking, 
> so you if you run code that is not fork-safe it is clearly a user error.

Just to clarify, this basically means that you cannot use mc* in
portable code. Even base R functions might fail, see e.g. the previous
crashes with libcurl.

> That's exactly why we have the long "Warning" section in the documentation. 
> If you have suggestions for its improvements, please feel free to supply 
> patches.

One suggestion would be to export the parallel::isChild(), so that we
can guard against crashes.

Best,
Gabor

> Cheers,
> Simon
>
>
> > You get crashes in the terminal as well, without multithreading. E.g.
> > the keyring package links for the Security library on macOS, so you
> > get:
> >
> > ❯ R --vanilla -q
> >> .libPaths("~/R/3.6")
> >> keyring::key_list()[1:2,]
> >        service                                                  username
> > 1    CommCenter                             kEntitlementsUniqueIDCacheKey
> > 2           ids                                   identity-rsa-public-key
> >> parallel::mclapply(1:10, function(i) keyring::key_list()[1:2,])
> >
> > *** caught segfault ***
> > address 0x110, cause 'memory not mapped'
> >
> > *** caught segfault ***
> > address 0x110, cause 'memory not mapped'
> >
> > AFAICT only Apple can do anything about this, and they won't.
> >
> > Gabor
> >
> >> That's exactly why I was proposing a more general solution where you can 
> >> simply define a function in user-space that will issue a warning or stop 
> >> on fork, it doesn't have to be part of core R, there are other packages 
> >> that use fork() as well, so what I proposed is much safer than hacking the 
> >> parallel package.
> >>
> >> Cheers,
> >> Simon
> >>
> >>
> >>
> >>> On Jan 10, 2020, at 10:58 AM, Henrik Bengtsson 
> >>> <henrik.bengts...@gmail.com> wrote:
> >>>
> >>> The RStudio GUI was just one example.  AFAIK, and please correct me if
> >>> I'm wrong, another example is where multi-threaded code is used in
> >>> forked processing and that's sometimes unstable.  Yes another, which
> >>> might be multi-thread related or not, is
> >>> https://stat.ethz.ch/pipermail/r-devel/2018-September/076845.html:
> >>>
> >>> res <- parallel::mclapply(urls, function(url) {
> >>> download.file(url, basename(url))
> >>> })
> >>>
> >>> That was reported to fail on macOS with the default method="libcurl"
> >>> but not for method="curl" or method="wget".
> >>>
> >>> Further documentation is needed and would help but I don't believe
> >>> it's sufficient to solve everyday problems.  The argument for
> >>> introducing an option/env var to disable forking is to give the end
> >>> user a quick workaround for newly introduced bugs.  Neither the
> >>> develop nor the end user have full control of the R package stack,
> >>> which is always in flux.  For instance, above mclapply() code might
> >>> have been in a package on CRAN and then all of a sudden
> >>> method="libcurl" became the new default in base R.  The above
> >>> mclapply() code is now buggy on macOS, and not necessarily caught by
> >>> CRAN checks.  The package developer might not notice this because they
> >>> are on Linux or Windows.  It can take a very long time before this
> >>> problem is even noticed and even further before it is tracked down and
> >>> fixed.   Similarly, as more and more code turn to native code and it
> >>> becomes easier and easier to implement multi-threading, more and more
> >>> of these bugs across package dependencies risk sneaking in the
> >>> backdoor wherever forked processing is in place.
> >>>
> >>> For the end user, but also higher-up upstream package developers, the
> >>> quickest workaround would be disable forking.  If you're conservative,
> >>> you could even disable it all of your R processing.  Being able to
> >>> quickly disable forking will also provide a mechanism for quickly
> >>> testing the hypothesis that forking is the underlying problem, i.e.
> >>> "Please retry with options(fork.allowed = FALSE)" will become handy
> >>> for troubleshooting.
> >>>
> >>> /Henrik
> >>>
> >>> On Fri, Jan 10, 2020 at 5:31 AM Simon Urbanek
> >>> <simon.urba...@r-project.org> wrote:
> >>>>
> >>>> If I understand the thread correctly this is an RStudio issue and I 
> >>>> would suggest that the developers consider using pthread_atfork() so 
> >>>> RStudio can handle forking as they deem fit (bail out with an error or 
> >>>> make RStudio work).  Note that in principle the functionality requested 
> >>>> here can be easily implemented in a package so R doesn’t need to be 
> >>>> modified.
> >>>>
> >>>> Cheers,
> >>>> Simon
> >>>>
> >>>> Sent from my iPhone
> >>>>
> >>>>>> On Jan 10, 2020, at 04:34, Tomas Kalibera <tomas.kalib...@gmail.com> 
> >>>>>> wrote:
> >>>>>>
> >>>>>> On 1/10/20 7:33 AM, Henrik Bengtsson wrote:
> >>>>>> I'd like to pick up this thread started on 2019-04-11
> >>>>>> (https://hypatia.math.ethz.ch/pipermail/r-devel/2019-April/077632.html).
> >>>>>> Modulo all the other suggestions in this thread, would my proposal of
> >>>>>> being able to disable forked processing via an option or an
> >>>>>> environment variable make sense?
> >>>>>
> >>>>> I don't think R should be doing that. There are caveats with using 
> >>>>> fork, and they are mentioned in the documentation of the parallel 
> >>>>> package, so people can easily avoid functions that use it, and this all 
> >>>>> has been discussed here recently.
> >>>>>
> >>>>> If it is the case, we can expand the documentation in parallel package, 
> >>>>> add a warning against the use of forking with RStudio, but for that I 
> >>>>> it would be good to know at least why it is not working. From the 
> >>>>> github issue I have the impression that it is not really known why, 
> >>>>> whether it could be fixed, and if so, where. The same github issue 
> >>>>> reflects also that some people want to use forking for performance 
> >>>>> reasons, and even with RStudio, at least on Linux. Perhaps it could be 
> >>>>> fixed? Perhaps it is just some race condition somewhere?
> >>>>>
> >>>>> Tomas
> >>>>>
> >>>>>> I've prototyped a working patch that
> >>>>>> works like:
> >>>>>>> options(fork.allowed = FALSE)
> >>>>>>> unlist(parallel::mclapply(1:2, FUN = function(x) Sys.getpid()))
> >>>>>> [1] 14058 14058
> >>>>>>> parallel::mcmapply(1:2, FUN = function(x) Sys.getpid())
> >>>>>> [1] 14058 14058
> >>>>>>> parallel::pvec(1:2, FUN = function(x) Sys.getpid() + x/10)
> >>>>>> [1] 14058.1 14058.2
> >>>>>>> f <- parallel::mcparallel(Sys.getpid())
> >>>>>> Error in allowFork(assert = TRUE) :
> >>>>>> Forked processing is not allowed per option ‘fork.allowed’ or
> >>>>>> environment variable ‘R_FORK_ALLOWED’
> >>>>>>> cl <- parallel::makeForkCluster(1L)
> >>>>>> Error in allowFork(assert = TRUE) :
> >>>>>> Forked processing is not allowed per option ‘fork.allowed’ or
> >>>>>> environment variable ‘R_FORK_ALLOWED’
> >>>>>> The patch is:
> >>>>>> Index: src/library/parallel/R/unix/forkCluster.R
> >>>>>> ===================================================================
> >>>>>> --- src/library/parallel/R/unix/forkCluster.R (revision 77648)
> >>>>>> +++ src/library/parallel/R/unix/forkCluster.R (working copy)
> >>>>>> @@ -30,6 +30,7 @@
> >>>>>> newForkNode <- function(..., options = defaultClusterOptions, rank)
> >>>>>> {
> >>>>>> +    allowFork(assert = TRUE)
> >>>>>>   options <- addClusterOptions(options, list(...))
> >>>>>>   outfile <- getClusterOption("outfile", options)
> >>>>>>   port <- getClusterOption("port", options)
> >>>>>> Index: src/library/parallel/R/unix/mclapply.R
> >>>>>> ===================================================================
> >>>>>> --- src/library/parallel/R/unix/mclapply.R (revision 77648)
> >>>>>> +++ src/library/parallel/R/unix/mclapply.R (working copy)
> >>>>>> @@ -28,7 +28,7 @@
> >>>>>>       stop("'mc.cores' must be >= 1")
> >>>>>>   .check_ncores(cores)
> >>>>>> -    if (isChild() && !isTRUE(mc.allow.recursive))
> >>>>>> +    if (!allowFork() || (isChild() && !isTRUE(mc.allow.recursive)))
> >>>>>>       return(lapply(X = X, FUN = FUN, ...))
> >>>>>>   ## Follow lapply
> >>>>>> Index: src/library/parallel/R/unix/mcparallel.R
> >>>>>> ===================================================================
> >>>>>> --- src/library/parallel/R/unix/mcparallel.R (revision 77648)
> >>>>>> +++ src/library/parallel/R/unix/mcparallel.R (working copy)
> >>>>>> @@ -20,6 +20,7 @@
> >>>>>> mcparallel <- function(expr, name, mc.set.seed = TRUE, silent =
> >>>>>> FALSE, mc.affinity = NULL, mc.interactive = FALSE, detached = FALSE)
> >>>>>> {
> >>>>>> +    allowFork(assert = TRUE)
> >>>>>>   f <- mcfork(detached)
> >>>>>>   env <- parent.frame()
> >>>>>>   if (isTRUE(mc.set.seed)) mc.advance.stream()
> >>>>>> Index: src/library/parallel/R/unix/pvec.R
> >>>>>> ===================================================================
> >>>>>> --- src/library/parallel/R/unix/pvec.R (revision 77648)
> >>>>>> +++ src/library/parallel/R/unix/pvec.R (working copy)
> >>>>>> @@ -25,7 +25,7 @@
> >>>>>>   cores <- as.integer(mc.cores)
> >>>>>>   if(cores < 1L) stop("'mc.cores' must be >= 1")
> >>>>>> -    if(cores == 1L) return(FUN(v, ...))
> >>>>>> +    if(cores == 1L || !allowFork()) return(FUN(v, ...))
> >>>>>>   .check_ncores(cores)
> >>>>>>   if(mc.set.seed) mc.reset.stream()
> >>>>>> with a new file src/library/parallel/R/unix/allowFork.R:
> >>>>>> allowFork <- function(assert = FALSE) {
> >>>>>>  value <- Sys.getenv("R_FORK_ALLOWED")
> >>>>>>  if (nzchar(value)) {
> >>>>>>      value <- switch(value,
> >>>>>>         "1"=, "TRUE"=, "true"=, "True"=, "yes"=, "Yes"= TRUE,
> >>>>>>         "0"=, "FALSE"=,"false"=,"False"=, "no"=, "No" = FALSE,
> >>>>>>          stop(gettextf("invalid environment variable value: %s==%s",
> >>>>>>         "R_FORK_ALLOWED", value)))
> >>>>>> value <- as.logical(value)
> >>>>>>  } else {
> >>>>>>      value <- TRUE
> >>>>>>  }
> >>>>>>  value <- getOption("fork.allowed", value)
> >>>>>>  if (is.na(value)) {
> >>>>>>      stop(gettextf("invalid option value: %s==%s", "fork.allowed", 
> >>>>>> value))
> >>>>>>  }
> >>>>>>  if (assert && !value) {
> >>>>>>    stop(gettextf("Forked processing is not allowed per option %s or
> >>>>>> environment variable %s", sQuote("fork.allowed"),
> >>>>>> sQuote("R_FORK_ALLOWED")))
> >>>>>>  }
> >>>>>>  value
> >>>>>> }
> >>>>>> /Henrik
> >>>>>>> On Mon, Apr 15, 2019 at 3:12 AM Tomas Kalibera 
> >>>>>>> <tomas.kalib...@gmail.com> wrote:
> >>>>>>> On 4/15/19 11:02 AM, Iñaki Ucar wrote:
> >>>>>>>> On Mon, 15 Apr 2019 at 08:44, Tomas Kalibera 
> >>>>>>>> <tomas.kalib...@gmail.com> wrote:
> >>>>>>>>> On 4/13/19 12:05 PM, Iñaki Ucar wrote:
> >>>>>>>>>> On Sat, 13 Apr 2019 at 03:51, Kevin Ushey <kevinus...@gmail.com> 
> >>>>>>>>>> wrote:
> >>>>>>>>>>> I think it's worth saying that mclapply() works as documented
> >>>>>>>>>> Mostly, yes. But it says nothing about fork's copy-on-write and 
> >>>>>>>>>> memory
> >>>>>>>>>> overcommitment, and that this means that it may work nicely or fail
> >>>>>>>>>> spectacularly depending on whether, e.g., you operate on a long
> >>>>>>>>>> vector.
> >>>>>>>>> R cannot possibly replicate documentation of the underlying 
> >>>>>>>>> operating
> >>>>>>>>> systems. It clearly says that fork() is used and readers who may not
> >>>>>>>>> know what fork() is need to learn it from external sources.
> >>>>>>>>> Copy-on-write is an elementary property of fork().
> >>>>>>>> Just to be precise, copy-on-write is an optimization widely deployed
> >>>>>>>> in most modern *nixes, particularly for the architectures in which R
> >>>>>>>> usually runs. But it is not an elementary property; it is not even
> >>>>>>>> possible without an MMU.
> >>>>>>> Yes, old Unix systems without virtual memory had fork eagerly copying.
> >>>>>>> Not relevant today, and certainly not for systems that run R, but 
> >>>>>>> indeed
> >>>>>>> people interested in OS internals can look elsewhere for more precise
> >>>>>>> information.
> >>>>>>> Tomas
> >>>>>
> >>>>> ______________________________________________
> >>>>> R-devel@r-project.org mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>>
> >>>> ______________________________________________
> >>>> R-devel@r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>
> >>
> >> ______________________________________________
> >> R-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> > ______________________________________________
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to