On Sat, Jan 11, 2020 at 4:14 AM Simon Urbanek <simon.urba...@r-project.org> wrote: [...] > > In general, any code that calls the macOS system libraries might > > crash. (Except for CoreFoundation, which seems to be fine, but AFAIR > > there is no guarantee for that, either.) > > > > That is not true, either. macOS itself is fork-safe (it is POSIX-certified > after all), but libraries may or may not.
Right, so CoreFoundation should be fine. Well, as much as it can be. [...] > As I said, using mc* functions explicitly says that you are ok with forking, > so you if you run code that is not fork-safe it is clearly a user error. Just to clarify, this basically means that you cannot use mc* in portable code. Even base R functions might fail, see e.g. the previous crashes with libcurl. > That's exactly why we have the long "Warning" section in the documentation. > If you have suggestions for its improvements, please feel free to supply > patches. One suggestion would be to export the parallel::isChild(), so that we can guard against crashes. Best, Gabor > Cheers, > Simon > > > > You get crashes in the terminal as well, without multithreading. E.g. > > the keyring package links for the Security library on macOS, so you > > get: > > > > ❯ R --vanilla -q > >> .libPaths("~/R/3.6") > >> keyring::key_list()[1:2,] > > service username > > 1 CommCenter kEntitlementsUniqueIDCacheKey > > 2 ids identity-rsa-public-key > >> parallel::mclapply(1:10, function(i) keyring::key_list()[1:2,]) > > > > *** caught segfault *** > > address 0x110, cause 'memory not mapped' > > > > *** caught segfault *** > > address 0x110, cause 'memory not mapped' > > > > AFAICT only Apple can do anything about this, and they won't. > > > > Gabor > > > >> That's exactly why I was proposing a more general solution where you can > >> simply define a function in user-space that will issue a warning or stop > >> on fork, it doesn't have to be part of core R, there are other packages > >> that use fork() as well, so what I proposed is much safer than hacking the > >> parallel package. > >> > >> Cheers, > >> Simon > >> > >> > >> > >>> On Jan 10, 2020, at 10:58 AM, Henrik Bengtsson > >>> <henrik.bengts...@gmail.com> wrote: > >>> > >>> The RStudio GUI was just one example. AFAIK, and please correct me if > >>> I'm wrong, another example is where multi-threaded code is used in > >>> forked processing and that's sometimes unstable. Yes another, which > >>> might be multi-thread related or not, is > >>> https://stat.ethz.ch/pipermail/r-devel/2018-September/076845.html: > >>> > >>> res <- parallel::mclapply(urls, function(url) { > >>> download.file(url, basename(url)) > >>> }) > >>> > >>> That was reported to fail on macOS with the default method="libcurl" > >>> but not for method="curl" or method="wget". > >>> > >>> Further documentation is needed and would help but I don't believe > >>> it's sufficient to solve everyday problems. The argument for > >>> introducing an option/env var to disable forking is to give the end > >>> user a quick workaround for newly introduced bugs. Neither the > >>> develop nor the end user have full control of the R package stack, > >>> which is always in flux. For instance, above mclapply() code might > >>> have been in a package on CRAN and then all of a sudden > >>> method="libcurl" became the new default in base R. The above > >>> mclapply() code is now buggy on macOS, and not necessarily caught by > >>> CRAN checks. The package developer might not notice this because they > >>> are on Linux or Windows. It can take a very long time before this > >>> problem is even noticed and even further before it is tracked down and > >>> fixed. Similarly, as more and more code turn to native code and it > >>> becomes easier and easier to implement multi-threading, more and more > >>> of these bugs across package dependencies risk sneaking in the > >>> backdoor wherever forked processing is in place. > >>> > >>> For the end user, but also higher-up upstream package developers, the > >>> quickest workaround would be disable forking. If you're conservative, > >>> you could even disable it all of your R processing. Being able to > >>> quickly disable forking will also provide a mechanism for quickly > >>> testing the hypothesis that forking is the underlying problem, i.e. > >>> "Please retry with options(fork.allowed = FALSE)" will become handy > >>> for troubleshooting. > >>> > >>> /Henrik > >>> > >>> On Fri, Jan 10, 2020 at 5:31 AM Simon Urbanek > >>> <simon.urba...@r-project.org> wrote: > >>>> > >>>> If I understand the thread correctly this is an RStudio issue and I > >>>> would suggest that the developers consider using pthread_atfork() so > >>>> RStudio can handle forking as they deem fit (bail out with an error or > >>>> make RStudio work). Note that in principle the functionality requested > >>>> here can be easily implemented in a package so R doesn’t need to be > >>>> modified. > >>>> > >>>> Cheers, > >>>> Simon > >>>> > >>>> Sent from my iPhone > >>>> > >>>>>> On Jan 10, 2020, at 04:34, Tomas Kalibera <tomas.kalib...@gmail.com> > >>>>>> wrote: > >>>>>> > >>>>>> On 1/10/20 7:33 AM, Henrik Bengtsson wrote: > >>>>>> I'd like to pick up this thread started on 2019-04-11 > >>>>>> (https://hypatia.math.ethz.ch/pipermail/r-devel/2019-April/077632.html). > >>>>>> Modulo all the other suggestions in this thread, would my proposal of > >>>>>> being able to disable forked processing via an option or an > >>>>>> environment variable make sense? > >>>>> > >>>>> I don't think R should be doing that. There are caveats with using > >>>>> fork, and they are mentioned in the documentation of the parallel > >>>>> package, so people can easily avoid functions that use it, and this all > >>>>> has been discussed here recently. > >>>>> > >>>>> If it is the case, we can expand the documentation in parallel package, > >>>>> add a warning against the use of forking with RStudio, but for that I > >>>>> it would be good to know at least why it is not working. From the > >>>>> github issue I have the impression that it is not really known why, > >>>>> whether it could be fixed, and if so, where. The same github issue > >>>>> reflects also that some people want to use forking for performance > >>>>> reasons, and even with RStudio, at least on Linux. Perhaps it could be > >>>>> fixed? Perhaps it is just some race condition somewhere? > >>>>> > >>>>> Tomas > >>>>> > >>>>>> I've prototyped a working patch that > >>>>>> works like: > >>>>>>> options(fork.allowed = FALSE) > >>>>>>> unlist(parallel::mclapply(1:2, FUN = function(x) Sys.getpid())) > >>>>>> [1] 14058 14058 > >>>>>>> parallel::mcmapply(1:2, FUN = function(x) Sys.getpid()) > >>>>>> [1] 14058 14058 > >>>>>>> parallel::pvec(1:2, FUN = function(x) Sys.getpid() + x/10) > >>>>>> [1] 14058.1 14058.2 > >>>>>>> f <- parallel::mcparallel(Sys.getpid()) > >>>>>> Error in allowFork(assert = TRUE) : > >>>>>> Forked processing is not allowed per option ‘fork.allowed’ or > >>>>>> environment variable ‘R_FORK_ALLOWED’ > >>>>>>> cl <- parallel::makeForkCluster(1L) > >>>>>> Error in allowFork(assert = TRUE) : > >>>>>> Forked processing is not allowed per option ‘fork.allowed’ or > >>>>>> environment variable ‘R_FORK_ALLOWED’ > >>>>>> The patch is: > >>>>>> Index: src/library/parallel/R/unix/forkCluster.R > >>>>>> =================================================================== > >>>>>> --- src/library/parallel/R/unix/forkCluster.R (revision 77648) > >>>>>> +++ src/library/parallel/R/unix/forkCluster.R (working copy) > >>>>>> @@ -30,6 +30,7 @@ > >>>>>> newForkNode <- function(..., options = defaultClusterOptions, rank) > >>>>>> { > >>>>>> + allowFork(assert = TRUE) > >>>>>> options <- addClusterOptions(options, list(...)) > >>>>>> outfile <- getClusterOption("outfile", options) > >>>>>> port <- getClusterOption("port", options) > >>>>>> Index: src/library/parallel/R/unix/mclapply.R > >>>>>> =================================================================== > >>>>>> --- src/library/parallel/R/unix/mclapply.R (revision 77648) > >>>>>> +++ src/library/parallel/R/unix/mclapply.R (working copy) > >>>>>> @@ -28,7 +28,7 @@ > >>>>>> stop("'mc.cores' must be >= 1") > >>>>>> .check_ncores(cores) > >>>>>> - if (isChild() && !isTRUE(mc.allow.recursive)) > >>>>>> + if (!allowFork() || (isChild() && !isTRUE(mc.allow.recursive))) > >>>>>> return(lapply(X = X, FUN = FUN, ...)) > >>>>>> ## Follow lapply > >>>>>> Index: src/library/parallel/R/unix/mcparallel.R > >>>>>> =================================================================== > >>>>>> --- src/library/parallel/R/unix/mcparallel.R (revision 77648) > >>>>>> +++ src/library/parallel/R/unix/mcparallel.R (working copy) > >>>>>> @@ -20,6 +20,7 @@ > >>>>>> mcparallel <- function(expr, name, mc.set.seed = TRUE, silent = > >>>>>> FALSE, mc.affinity = NULL, mc.interactive = FALSE, detached = FALSE) > >>>>>> { > >>>>>> + allowFork(assert = TRUE) > >>>>>> f <- mcfork(detached) > >>>>>> env <- parent.frame() > >>>>>> if (isTRUE(mc.set.seed)) mc.advance.stream() > >>>>>> Index: src/library/parallel/R/unix/pvec.R > >>>>>> =================================================================== > >>>>>> --- src/library/parallel/R/unix/pvec.R (revision 77648) > >>>>>> +++ src/library/parallel/R/unix/pvec.R (working copy) > >>>>>> @@ -25,7 +25,7 @@ > >>>>>> cores <- as.integer(mc.cores) > >>>>>> if(cores < 1L) stop("'mc.cores' must be >= 1") > >>>>>> - if(cores == 1L) return(FUN(v, ...)) > >>>>>> + if(cores == 1L || !allowFork()) return(FUN(v, ...)) > >>>>>> .check_ncores(cores) > >>>>>> if(mc.set.seed) mc.reset.stream() > >>>>>> with a new file src/library/parallel/R/unix/allowFork.R: > >>>>>> allowFork <- function(assert = FALSE) { > >>>>>> value <- Sys.getenv("R_FORK_ALLOWED") > >>>>>> if (nzchar(value)) { > >>>>>> value <- switch(value, > >>>>>> "1"=, "TRUE"=, "true"=, "True"=, "yes"=, "Yes"= TRUE, > >>>>>> "0"=, "FALSE"=,"false"=,"False"=, "no"=, "No" = FALSE, > >>>>>> stop(gettextf("invalid environment variable value: %s==%s", > >>>>>> "R_FORK_ALLOWED", value))) > >>>>>> value <- as.logical(value) > >>>>>> } else { > >>>>>> value <- TRUE > >>>>>> } > >>>>>> value <- getOption("fork.allowed", value) > >>>>>> if (is.na(value)) { > >>>>>> stop(gettextf("invalid option value: %s==%s", "fork.allowed", > >>>>>> value)) > >>>>>> } > >>>>>> if (assert && !value) { > >>>>>> stop(gettextf("Forked processing is not allowed per option %s or > >>>>>> environment variable %s", sQuote("fork.allowed"), > >>>>>> sQuote("R_FORK_ALLOWED"))) > >>>>>> } > >>>>>> value > >>>>>> } > >>>>>> /Henrik > >>>>>>> On Mon, Apr 15, 2019 at 3:12 AM Tomas Kalibera > >>>>>>> <tomas.kalib...@gmail.com> wrote: > >>>>>>> On 4/15/19 11:02 AM, Iñaki Ucar wrote: > >>>>>>>> On Mon, 15 Apr 2019 at 08:44, Tomas Kalibera > >>>>>>>> <tomas.kalib...@gmail.com> wrote: > >>>>>>>>> On 4/13/19 12:05 PM, Iñaki Ucar wrote: > >>>>>>>>>> On Sat, 13 Apr 2019 at 03:51, Kevin Ushey <kevinus...@gmail.com> > >>>>>>>>>> wrote: > >>>>>>>>>>> I think it's worth saying that mclapply() works as documented > >>>>>>>>>> Mostly, yes. But it says nothing about fork's copy-on-write and > >>>>>>>>>> memory > >>>>>>>>>> overcommitment, and that this means that it may work nicely or fail > >>>>>>>>>> spectacularly depending on whether, e.g., you operate on a long > >>>>>>>>>> vector. > >>>>>>>>> R cannot possibly replicate documentation of the underlying > >>>>>>>>> operating > >>>>>>>>> systems. It clearly says that fork() is used and readers who may not > >>>>>>>>> know what fork() is need to learn it from external sources. > >>>>>>>>> Copy-on-write is an elementary property of fork(). > >>>>>>>> Just to be precise, copy-on-write is an optimization widely deployed > >>>>>>>> in most modern *nixes, particularly for the architectures in which R > >>>>>>>> usually runs. But it is not an elementary property; it is not even > >>>>>>>> possible without an MMU. > >>>>>>> Yes, old Unix systems without virtual memory had fork eagerly copying. > >>>>>>> Not relevant today, and certainly not for systems that run R, but > >>>>>>> indeed > >>>>>>> people interested in OS internals can look elsewhere for more precise > >>>>>>> information. > >>>>>>> Tomas > >>>>> > >>>>> ______________________________________________ > >>>>> R-devel@r-project.org mailing list > >>>>> https://stat.ethz.ch/mailman/listinfo/r-devel > >>>> > >>>> ______________________________________________ > >>>> R-devel@r-project.org mailing list > >>>> https://stat.ethz.ch/mailman/listinfo/r-devel > >>> > >> > >> ______________________________________________ > >> R-devel@r-project.org mailing list > >> https://stat.ethz.ch/mailman/listinfo/r-devel > > > > ______________________________________________ > > R-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > > ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel