> On Jan 10, 2020, at 3:10 PM, Gábor Csárdi <csardi.ga...@gmail.com> wrote:
> 
> On Fri, Jan 10, 2020 at 7:23 PM Simon Urbanek
> <simon.urba...@r-project.org> wrote:
>> 
>> Henrik,
>> 
>> the example from the post works just fine in CRAN R for me - the post was 
>> about homebrew build so it's conceivably a bug in their libraries.
> 
> I think it works now, because Apple switched to a different SSL
> library for libcurl. It usually crashes or fails on older macOS
> versions, with the CRAN build of R as well.
> 

That is not true - Apple has not changed the SSL back-end for many years. The 
issue in that post is presumably in the homebrew version of SSL.

Cheers,
Simon


> It is not a bug in any library, it is just that macOS does not support
> fork() without an immediate exec().
> 
> In general, any code that calls the macOS system libraries might
> crash. (Except for CoreFoundation, which seems to be fine, but AFAIR
> there is no guarantee for that, either.)
> 
> You get crashes in the terminal as well, without multithreading. E.g.
> the keyring package links for the Security library on macOS, so you
> get:
> 
> ❯ R --vanilla -q
>> .libPaths("~/R/3.6")
>> keyring::key_list()[1:2,]
>        service                                                  username
> 1    CommCenter                             kEntitlementsUniqueIDCacheKey
> 2           ids                                   identity-rsa-public-key
>> parallel::mclapply(1:10, function(i) keyring::key_list()[1:2,])
> 
> *** caught segfault ***
> address 0x110, cause 'memory not mapped'
> 
> *** caught segfault ***
> address 0x110, cause 'memory not mapped'
> 
> AFAICT only Apple can do anything about this, and they won't.
> 
> Gabor
> 
>> That's exactly why I was proposing a more general solution where you can 
>> simply define a function in user-space that will issue a warning or stop on 
>> fork, it doesn't have to be part of core R, there are other packages that 
>> use fork() as well, so what I proposed is much safer than hacking the 
>> parallel package.
>> 
>> Cheers,
>> Simon
>> 
>> 
>> 
>>> On Jan 10, 2020, at 10:58 AM, Henrik Bengtsson <henrik.bengts...@gmail.com> 
>>> wrote:
>>> 
>>> The RStudio GUI was just one example.  AFAIK, and please correct me if
>>> I'm wrong, another example is where multi-threaded code is used in
>>> forked processing and that's sometimes unstable.  Yes another, which
>>> might be multi-thread related or not, is
>>> https://stat.ethz.ch/pipermail/r-devel/2018-September/076845.html:
>>> 
>>> res <- parallel::mclapply(urls, function(url) {
>>> download.file(url, basename(url))
>>> })
>>> 
>>> That was reported to fail on macOS with the default method="libcurl"
>>> but not for method="curl" or method="wget".
>>> 
>>> Further documentation is needed and would help but I don't believe
>>> it's sufficient to solve everyday problems.  The argument for
>>> introducing an option/env var to disable forking is to give the end
>>> user a quick workaround for newly introduced bugs.  Neither the
>>> develop nor the end user have full control of the R package stack,
>>> which is always in flux.  For instance, above mclapply() code might
>>> have been in a package on CRAN and then all of a sudden
>>> method="libcurl" became the new default in base R.  The above
>>> mclapply() code is now buggy on macOS, and not necessarily caught by
>>> CRAN checks.  The package developer might not notice this because they
>>> are on Linux or Windows.  It can take a very long time before this
>>> problem is even noticed and even further before it is tracked down and
>>> fixed.   Similarly, as more and more code turn to native code and it
>>> becomes easier and easier to implement multi-threading, more and more
>>> of these bugs across package dependencies risk sneaking in the
>>> backdoor wherever forked processing is in place.
>>> 
>>> For the end user, but also higher-up upstream package developers, the
>>> quickest workaround would be disable forking.  If you're conservative,
>>> you could even disable it all of your R processing.  Being able to
>>> quickly disable forking will also provide a mechanism for quickly
>>> testing the hypothesis that forking is the underlying problem, i.e.
>>> "Please retry with options(fork.allowed = FALSE)" will become handy
>>> for troubleshooting.
>>> 
>>> /Henrik
>>> 
>>> On Fri, Jan 10, 2020 at 5:31 AM Simon Urbanek
>>> <simon.urba...@r-project.org> wrote:
>>>> 
>>>> If I understand the thread correctly this is an RStudio issue and I would 
>>>> suggest that the developers consider using pthread_atfork() so RStudio can 
>>>> handle forking as they deem fit (bail out with an error or make RStudio 
>>>> work).  Note that in principle the functionality requested here can be 
>>>> easily implemented in a package so R doesn’t need to be modified.
>>>> 
>>>> Cheers,
>>>> Simon
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>>>> On Jan 10, 2020, at 04:34, Tomas Kalibera <tomas.kalib...@gmail.com> 
>>>>>> wrote:
>>>>>> 
>>>>>> On 1/10/20 7:33 AM, Henrik Bengtsson wrote:
>>>>>> I'd like to pick up this thread started on 2019-04-11
>>>>>> (https://hypatia.math.ethz.ch/pipermail/r-devel/2019-April/077632.html).
>>>>>> Modulo all the other suggestions in this thread, would my proposal of
>>>>>> being able to disable forked processing via an option or an
>>>>>> environment variable make sense?
>>>>> 
>>>>> I don't think R should be doing that. There are caveats with using fork, 
>>>>> and they are mentioned in the documentation of the parallel package, so 
>>>>> people can easily avoid functions that use it, and this all has been 
>>>>> discussed here recently.
>>>>> 
>>>>> If it is the case, we can expand the documentation in parallel package, 
>>>>> add a warning against the use of forking with RStudio, but for that I it 
>>>>> would be good to know at least why it is not working. From the github 
>>>>> issue I have the impression that it is not really known why, whether it 
>>>>> could be fixed, and if so, where. The same github issue reflects also 
>>>>> that some people want to use forking for performance reasons, and even 
>>>>> with RStudio, at least on Linux. Perhaps it could be fixed? Perhaps it is 
>>>>> just some race condition somewhere?
>>>>> 
>>>>> Tomas
>>>>> 
>>>>>> I've prototyped a working patch that
>>>>>> works like:
>>>>>>> options(fork.allowed = FALSE)
>>>>>>> unlist(parallel::mclapply(1:2, FUN = function(x) Sys.getpid()))
>>>>>> [1] 14058 14058
>>>>>>> parallel::mcmapply(1:2, FUN = function(x) Sys.getpid())
>>>>>> [1] 14058 14058
>>>>>>> parallel::pvec(1:2, FUN = function(x) Sys.getpid() + x/10)
>>>>>> [1] 14058.1 14058.2
>>>>>>> f <- parallel::mcparallel(Sys.getpid())
>>>>>> Error in allowFork(assert = TRUE) :
>>>>>> Forked processing is not allowed per option ‘fork.allowed’ or
>>>>>> environment variable ‘R_FORK_ALLOWED’
>>>>>>> cl <- parallel::makeForkCluster(1L)
>>>>>> Error in allowFork(assert = TRUE) :
>>>>>> Forked processing is not allowed per option ‘fork.allowed’ or
>>>>>> environment variable ‘R_FORK_ALLOWED’
>>>>>> The patch is:
>>>>>> Index: src/library/parallel/R/unix/forkCluster.R
>>>>>> ===================================================================
>>>>>> --- src/library/parallel/R/unix/forkCluster.R (revision 77648)
>>>>>> +++ src/library/parallel/R/unix/forkCluster.R (working copy)
>>>>>> @@ -30,6 +30,7 @@
>>>>>> newForkNode <- function(..., options = defaultClusterOptions, rank)
>>>>>> {
>>>>>> +    allowFork(assert = TRUE)
>>>>>>   options <- addClusterOptions(options, list(...))
>>>>>>   outfile <- getClusterOption("outfile", options)
>>>>>>   port <- getClusterOption("port", options)
>>>>>> Index: src/library/parallel/R/unix/mclapply.R
>>>>>> ===================================================================
>>>>>> --- src/library/parallel/R/unix/mclapply.R (revision 77648)
>>>>>> +++ src/library/parallel/R/unix/mclapply.R (working copy)
>>>>>> @@ -28,7 +28,7 @@
>>>>>>       stop("'mc.cores' must be >= 1")
>>>>>>   .check_ncores(cores)
>>>>>> -    if (isChild() && !isTRUE(mc.allow.recursive))
>>>>>> +    if (!allowFork() || (isChild() && !isTRUE(mc.allow.recursive)))
>>>>>>       return(lapply(X = X, FUN = FUN, ...))
>>>>>>   ## Follow lapply
>>>>>> Index: src/library/parallel/R/unix/mcparallel.R
>>>>>> ===================================================================
>>>>>> --- src/library/parallel/R/unix/mcparallel.R (revision 77648)
>>>>>> +++ src/library/parallel/R/unix/mcparallel.R (working copy)
>>>>>> @@ -20,6 +20,7 @@
>>>>>> mcparallel <- function(expr, name, mc.set.seed = TRUE, silent =
>>>>>> FALSE, mc.affinity = NULL, mc.interactive = FALSE, detached = FALSE)
>>>>>> {
>>>>>> +    allowFork(assert = TRUE)
>>>>>>   f <- mcfork(detached)
>>>>>>   env <- parent.frame()
>>>>>>   if (isTRUE(mc.set.seed)) mc.advance.stream()
>>>>>> Index: src/library/parallel/R/unix/pvec.R
>>>>>> ===================================================================
>>>>>> --- src/library/parallel/R/unix/pvec.R (revision 77648)
>>>>>> +++ src/library/parallel/R/unix/pvec.R (working copy)
>>>>>> @@ -25,7 +25,7 @@
>>>>>>   cores <- as.integer(mc.cores)
>>>>>>   if(cores < 1L) stop("'mc.cores' must be >= 1")
>>>>>> -    if(cores == 1L) return(FUN(v, ...))
>>>>>> +    if(cores == 1L || !allowFork()) return(FUN(v, ...))
>>>>>>   .check_ncores(cores)
>>>>>>   if(mc.set.seed) mc.reset.stream()
>>>>>> with a new file src/library/parallel/R/unix/allowFork.R:
>>>>>> allowFork <- function(assert = FALSE) {
>>>>>>  value <- Sys.getenv("R_FORK_ALLOWED")
>>>>>>  if (nzchar(value)) {
>>>>>>      value <- switch(value,
>>>>>>         "1"=, "TRUE"=, "true"=, "True"=, "yes"=, "Yes"= TRUE,
>>>>>>         "0"=, "FALSE"=,"false"=,"False"=, "no"=, "No" = FALSE,
>>>>>>          stop(gettextf("invalid environment variable value: %s==%s",
>>>>>>         "R_FORK_ALLOWED", value)))
>>>>>> value <- as.logical(value)
>>>>>>  } else {
>>>>>>      value <- TRUE
>>>>>>  }
>>>>>>  value <- getOption("fork.allowed", value)
>>>>>>  if (is.na(value)) {
>>>>>>      stop(gettextf("invalid option value: %s==%s", "fork.allowed", 
>>>>>> value))
>>>>>>  }
>>>>>>  if (assert && !value) {
>>>>>>    stop(gettextf("Forked processing is not allowed per option %s or
>>>>>> environment variable %s", sQuote("fork.allowed"),
>>>>>> sQuote("R_FORK_ALLOWED")))
>>>>>>  }
>>>>>>  value
>>>>>> }
>>>>>> /Henrik
>>>>>>> On Mon, Apr 15, 2019 at 3:12 AM Tomas Kalibera 
>>>>>>> <tomas.kalib...@gmail.com> wrote:
>>>>>>> On 4/15/19 11:02 AM, Iñaki Ucar wrote:
>>>>>>>> On Mon, 15 Apr 2019 at 08:44, Tomas Kalibera 
>>>>>>>> <tomas.kalib...@gmail.com> wrote:
>>>>>>>>> On 4/13/19 12:05 PM, Iñaki Ucar wrote:
>>>>>>>>>> On Sat, 13 Apr 2019 at 03:51, Kevin Ushey <kevinus...@gmail.com> 
>>>>>>>>>> wrote:
>>>>>>>>>>> I think it's worth saying that mclapply() works as documented
>>>>>>>>>> Mostly, yes. But it says nothing about fork's copy-on-write and 
>>>>>>>>>> memory
>>>>>>>>>> overcommitment, and that this means that it may work nicely or fail
>>>>>>>>>> spectacularly depending on whether, e.g., you operate on a long
>>>>>>>>>> vector.
>>>>>>>>> R cannot possibly replicate documentation of the underlying operating
>>>>>>>>> systems. It clearly says that fork() is used and readers who may not
>>>>>>>>> know what fork() is need to learn it from external sources.
>>>>>>>>> Copy-on-write is an elementary property of fork().
>>>>>>>> Just to be precise, copy-on-write is an optimization widely deployed
>>>>>>>> in most modern *nixes, particularly for the architectures in which R
>>>>>>>> usually runs. But it is not an elementary property; it is not even
>>>>>>>> possible without an MMU.
>>>>>>> Yes, old Unix systems without virtual memory had fork eagerly copying.
>>>>>>> Not relevant today, and certainly not for systems that run R, but indeed
>>>>>>> people interested in OS internals can look elsewhere for more precise
>>>>>>> information.
>>>>>>> Tomas
>>>>> 
>>>>> ______________________________________________
>>>>> R-devel@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>> 
>>>> ______________________________________________
>>>> R-devel@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> 
>> 
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to