> On Jan 10, 2020, at 3:10 PM, Gábor Csárdi <csardi.ga...@gmail.com> wrote:
>
> On Fri, Jan 10, 2020 at 7:23 PM Simon Urbanek
> <simon.urba...@r-project.org> wrote:
>>
>> Henrik,
>>
>> the example from the post works just fine in CRAN R for me - the post was
>> about homebrew build so it's conceivably a bug in their libraries.
>
> I think it works now, because Apple switched to a different SSL
> library for libcurl. It usually crashes or fails on older macOS
> versions, with the CRAN build of R as well.
>
That is not true - Apple has not changed the SSL back-end for many years. The
issue in that post is presumably in the homebrew version of SSL.
Cheers,
Simon
> It is not a bug in any library, it is just that macOS does not support
> fork() without an immediate exec().
>
> In general, any code that calls the macOS system libraries might
> crash. (Except for CoreFoundation, which seems to be fine, but AFAIR
> there is no guarantee for that, either.)
>
> You get crashes in the terminal as well, without multithreading. E.g.
> the keyring package links to the Security library on macOS, so you
> get:
>
> ❯ R --vanilla -q
> > .libPaths("~/R/3.6")
> > keyring::key_list()[1:2,]
>      service                      username
> 1 CommCenter kEntitlementsUniqueIDCacheKey
> 2        ids       identity-rsa-public-key
> > parallel::mclapply(1:10, function(i) keyring::key_list()[1:2,])
>
> *** caught segfault ***
> address 0x110, cause 'memory not mapped'
>
> *** caught segfault ***
> address 0x110, cause 'memory not mapped'
>
> AFAICT only Apple can do anything about this, and they won't.
>
> Gabor
>
>> That's exactly why I was proposing a more general solution where you can
>> simply define a function in user-space that will issue a warning or stop on
>> fork, it doesn't have to be part of core R, there are other packages that
>> use fork() as well, so what I proposed is much safer than hacking the
>> parallel package.
>>
>> Cheers,
>> Simon
>>
>>
>>
>>> On Jan 10, 2020, at 10:58 AM, Henrik Bengtsson <henrik.bengts...@gmail.com>
>>> wrote:
>>>
>>> The RStudio GUI was just one example. AFAIK, and please correct me if
>>> I'm wrong, another example is where multi-threaded code is used in
>>> forked processing and that's sometimes unstable. Yet another, which
>>> might be multi-thread related or not, is
>>> https://stat.ethz.ch/pipermail/r-devel/2018-September/076845.html:
>>>
>>> res <- parallel::mclapply(urls, function(url) {
>>>   download.file(url, basename(url))
>>> })
>>>
>>> That was reported to fail on macOS with the default method="libcurl"
>>> but not for method="curl" or method="wget".
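A minimal sketch of that workaround, assuming a Unix-alike and a curl binary on the PATH. fetch_all() and its arguments are hypothetical names used only for illustration; they are not part of the original report:

```r
# Hypothetical helper: same mclapply() pattern as above, but with the
# download method passed explicitly instead of relying on the default.
fetch_all <- function(urls, method = "curl", cores = 2L) {
  parallel::mclapply(urls, function(url) {
    # download.file() returns 0 (invisibly) on success
    download.file(url, destfile = basename(url), method = method, quiet = TRUE)
  }, mc.cores = cores)
}
```

Calling fetch_all(urls) would then run the external curl program in each forked child. That this avoids the failure is only what was reported for method="curl"; the sketch itself guarantees nothing on a given macOS version.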
>>>
>>> Further documentation is needed and would help but I don't believe
>>> it's sufficient to solve everyday problems. The argument for
>>> introducing an option/env var to disable forking is to give the end
>>> user a quick workaround for newly introduced bugs. Neither the
>>> developer nor the end user has full control of the R package stack,
>>> which is always in flux. For instance, the above mclapply() code might
>>> have been in a package on CRAN and then all of a sudden
>>> method="libcurl" became the new default in base R. The above
>>> mclapply() code is now buggy on macOS, and not necessarily caught by
>>> CRAN checks. The package developer might not notice this because they
>>> are on Linux or Windows. It can take a very long time before this
>>> problem is even noticed and even further before it is tracked down and
>>> fixed. Similarly, as more and more code turns to native code and it
>>> becomes easier and easier to implement multi-threading, more and more
>>> of these bugs across package dependencies risk sneaking in the
>>> backdoor wherever forked processing is in place.
>>>
>>> For the end user, but also higher-up upstream package developers, the
>>> quickest workaround would be to disable forking. If you're conservative,
>>> you could even disable it for all of your R processing. Being able to
>>> quickly disable forking will also provide a mechanism for quickly
>>> testing the hypothesis that forking is the underlying problem, i.e.
>>> "Please retry with options(fork.allowed = FALSE)" will become handy
>>> for troubleshooting.
>>>
>>> /Henrik
>>>
>>> On Fri, Jan 10, 2020 at 5:31 AM Simon Urbanek
>>> <simon.urba...@r-project.org> wrote:
>>>>
>>>> If I understand the thread correctly this is an RStudio issue and I would
>>>> suggest that the developers consider using pthread_atfork() so RStudio can
>>>> handle forking as they deem fit (bail out with an error or make RStudio
>>>> work). Note that in principle the functionality requested here can be
>>>> easily implemented in a package so R doesn’t need to be modified.
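To illustrate the package-level route, here is a rough sketch of an R-level guard. The wrapper name guarded_mclapply() is made up, and 'fork.allowed' is the option name proposed in this thread, not something base R recognizes. It only covers calls routed through the wrapper; catching a fork() issued by other packages would need a C-level pthread_atfork() handler instead.

```r
# Hypothetical wrapper around parallel::mclapply(). 'fork.allowed' is the
# option name proposed in this thread; base R does not define it.
guarded_mclapply <- function(X, FUN, ...) {
  if (!isTRUE(getOption("fork.allowed", TRUE))) {
    # Forking disabled: run sequentially in the current process
    return(lapply(X, FUN, ...))
  }
  parallel::mclapply(X, FUN, ...)
}

old <- options(fork.allowed = FALSE)
pids <- unlist(guarded_mclapply(1:2, function(i) Sys.getpid()))
options(old)  # restore the previous (unset) option
```

With the option set to FALSE, both elements of pids are the PID of the calling process, which matches what the prototype patch shows for mclapply().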
>>>>
>>>> Cheers,
>>>> Simon
>>>>
>>>>>> On Jan 10, 2020, at 04:34, Tomas Kalibera <tomas.kalib...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> On 1/10/20 7:33 AM, Henrik Bengtsson wrote:
>>>>>> I'd like to pick up this thread started on 2019-04-11
>>>>>> (https://hypatia.math.ethz.ch/pipermail/r-devel/2019-April/077632.html).
>>>>>> Modulo all the other suggestions in this thread, would my proposal of
>>>>>> being able to disable forked processing via an option or an
>>>>>> environment variable make sense?
>>>>>
>>>>> I don't think R should be doing that. There are caveats with using fork,
>>>>> and they are mentioned in the documentation of the parallel package, so
>>>>> people can easily avoid functions that use it, and this all has been
>>>>> discussed here recently.
>>>>>
>>>>> If it is the case, we can expand the documentation in the parallel package,
>>>>> add a warning against the use of forking with RStudio, but for that it
>>>>> would be good to know at least why it is not working. From the github
>>>>> issue I have the impression that it is not really known why, whether it
>>>>> could be fixed, and if so, where. The same github issue reflects also
>>>>> that some people want to use forking for performance reasons, and even
>>>>> with RStudio, at least on Linux. Perhaps it could be fixed? Perhaps it is
>>>>> just some race condition somewhere?
>>>>>
>>>>> Tomas
>>>>>
>>>>>> I've prototyped a working patch that works like:
>>>>>>
>>>>>> > options(fork.allowed = FALSE)
>>>>>> > unlist(parallel::mclapply(1:2, FUN = function(x) Sys.getpid()))
>>>>>> [1] 14058 14058
>>>>>> > parallel::mcmapply(1:2, FUN = function(x) Sys.getpid())
>>>>>> [1] 14058 14058
>>>>>> > parallel::pvec(1:2, FUN = function(x) Sys.getpid() + x/10)
>>>>>> [1] 14058.1 14058.2
>>>>>> > f <- parallel::mcparallel(Sys.getpid())
>>>>>> Error in allowFork(assert = TRUE) :
>>>>>>   Forked processing is not allowed per option ‘fork.allowed’ or
>>>>>>   environment variable ‘R_FORK_ALLOWED’
>>>>>> > cl <- parallel::makeForkCluster(1L)
>>>>>> Error in allowFork(assert = TRUE) :
>>>>>>   Forked processing is not allowed per option ‘fork.allowed’ or
>>>>>>   environment variable ‘R_FORK_ALLOWED’
>>>>>>
>>>>>> The patch is:
>>>>>> Index: src/library/parallel/R/unix/forkCluster.R
>>>>>> ===================================================================
>>>>>> --- src/library/parallel/R/unix/forkCluster.R (revision 77648)
>>>>>> +++ src/library/parallel/R/unix/forkCluster.R (working copy)
>>>>>> @@ -30,6 +30,7 @@
>>>>>> newForkNode <- function(..., options = defaultClusterOptions, rank)
>>>>>> {
>>>>>> + allowFork(assert = TRUE)
>>>>>> options <- addClusterOptions(options, list(...))
>>>>>> outfile <- getClusterOption("outfile", options)
>>>>>> port <- getClusterOption("port", options)
>>>>>> Index: src/library/parallel/R/unix/mclapply.R
>>>>>> ===================================================================
>>>>>> --- src/library/parallel/R/unix/mclapply.R (revision 77648)
>>>>>> +++ src/library/parallel/R/unix/mclapply.R (working copy)
>>>>>> @@ -28,7 +28,7 @@
>>>>>> stop("'mc.cores' must be >= 1")
>>>>>> .check_ncores(cores)
>>>>>> - if (isChild() && !isTRUE(mc.allow.recursive))
>>>>>> + if (!allowFork() || (isChild() && !isTRUE(mc.allow.recursive)))
>>>>>> return(lapply(X = X, FUN = FUN, ...))
>>>>>> ## Follow lapply
>>>>>> Index: src/library/parallel/R/unix/mcparallel.R
>>>>>> ===================================================================
>>>>>> --- src/library/parallel/R/unix/mcparallel.R (revision 77648)
>>>>>> +++ src/library/parallel/R/unix/mcparallel.R (working copy)
>>>>>> @@ -20,6 +20,7 @@
>>>>>> mcparallel <- function(expr, name, mc.set.seed = TRUE, silent =
>>>>>> FALSE, mc.affinity = NULL, mc.interactive = FALSE, detached = FALSE)
>>>>>> {
>>>>>> + allowFork(assert = TRUE)
>>>>>> f <- mcfork(detached)
>>>>>> env <- parent.frame()
>>>>>> if (isTRUE(mc.set.seed)) mc.advance.stream()
>>>>>> Index: src/library/parallel/R/unix/pvec.R
>>>>>> ===================================================================
>>>>>> --- src/library/parallel/R/unix/pvec.R (revision 77648)
>>>>>> +++ src/library/parallel/R/unix/pvec.R (working copy)
>>>>>> @@ -25,7 +25,7 @@
>>>>>> cores <- as.integer(mc.cores)
>>>>>> if(cores < 1L) stop("'mc.cores' must be >= 1")
>>>>>> - if(cores == 1L) return(FUN(v, ...))
>>>>>> + if(cores == 1L || !allowFork()) return(FUN(v, ...))
>>>>>> .check_ncores(cores)
>>>>>> if(mc.set.seed) mc.reset.stream()
>>>>>> with a new file src/library/parallel/R/unix/allowFork.R:
>>>>>> allowFork <- function(assert = FALSE) {
>>>>>>   value <- Sys.getenv("R_FORK_ALLOWED")
>>>>>>   if (nzchar(value)) {
>>>>>>     value <- switch(value,
>>>>>>       "1"=, "TRUE"=, "true"=, "True"=, "yes"=, "Yes"= TRUE,
>>>>>>       "0"=, "FALSE"=,"false"=,"False"=, "no"=, "No" = FALSE,
>>>>>>       stop(gettextf("invalid environment variable value: %s==%s",
>>>>>>                     "R_FORK_ALLOWED", value)))
>>>>>>     value <- as.logical(value)
>>>>>>   } else {
>>>>>>     value <- TRUE
>>>>>>   }
>>>>>>   value <- getOption("fork.allowed", value)
>>>>>>   if (is.na(value)) {
>>>>>>     stop(gettextf("invalid option value: %s==%s", "fork.allowed",
>>>>>>                   value))
>>>>>>   }
>>>>>>   if (assert && !value) {
>>>>>>     stop(gettextf("Forked processing is not allowed per option %s or environment variable %s",
>>>>>>                   sQuote("fork.allowed"), sQuote("R_FORK_ALLOWED")))
>>>>>>   }
>>>>>>   value
>>>>>> }
>>>>>> /Henrik
>>>>>>> On Mon, Apr 15, 2019 at 3:12 AM Tomas Kalibera
>>>>>>> <tomas.kalib...@gmail.com> wrote:
>>>>>>> On 4/15/19 11:02 AM, Iñaki Ucar wrote:
>>>>>>>> On Mon, 15 Apr 2019 at 08:44, Tomas Kalibera
>>>>>>>> <tomas.kalib...@gmail.com> wrote:
>>>>>>>>> On 4/13/19 12:05 PM, Iñaki Ucar wrote:
>>>>>>>>>> On Sat, 13 Apr 2019 at 03:51, Kevin Ushey <kevinus...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> I think it's worth saying that mclapply() works as documented
>>>>>>>>>> Mostly, yes. But it says nothing about fork's copy-on-write and
>>>>>>>>>> memory
>>>>>>>>>> overcommitment, and that this means that it may work nicely or fail
>>>>>>>>>> spectacularly depending on whether, e.g., you operate on a long
>>>>>>>>>> vector.
>>>>>>>>> R cannot possibly replicate documentation of the underlying operating
>>>>>>>>> systems. It clearly says that fork() is used and readers who may not
>>>>>>>>> know what fork() is need to learn it from external sources.
>>>>>>>>> Copy-on-write is an elementary property of fork().
>>>>>>>> Just to be precise, copy-on-write is an optimization widely deployed
>>>>>>>> in most modern *nixes, particularly for the architectures in which R
>>>>>>>> usually runs. But it is not an elementary property; it is not even
>>>>>>>> possible without an MMU.
>>>>>>> Yes, old Unix systems without virtual memory had fork eagerly copying.
>>>>>>> Not relevant today, and certainly not for systems that run R, but indeed
>>>>>>> people interested in OS internals can look elsewhere for more precise
>>>>>>> information.
>>>>>>> Tomas
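For what it's worth, the process isolation that fork() provides (however the kernel implements the copying) can be observed from R. A small sketch, assuming a Unix-alike where mclapply() actually forks; the variable names are arbitrary:

```r
x <- numeric(1e6)  # allocated in the parent before forking

res <- parallel::mclapply(1:2, function(i) {
  # The write lands in the child's private memory, not the parent's
  x[1] <<- i
  x[1]
}, mc.cores = 2L)

child_values <- unlist(res)  # what each child saw after its own write
parent_value <- x[1]         # the parent's 'x' is untouched
```

Each child sees its own modified value while the parent still sees 0. On a copy-on-write system, pages a child never writes to stay shared with the parent, which is why the memory cost depends on how much of a long vector the children touch.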
>>>>>
>>>>> ______________________________________________
>>>>> R-devel@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel