Tomas Kalibera on R-core says that in R-devel:

I've increased the number of DLLs... Now it is 614 on systems where
the soft limit on open files allows, but R now attempts to increase
the limit when needed. If this is not possible, the maximum will be
smaller. R will fail to start if the maximum could not be at least
100 (so users who rely on previous behavior where the default was
also 100 are fine).

One can still use the environment variable R_MAX_NUM_DLLS to require a specific maximum. R will try to increase the limit on open files if needed. But if not possible, R will fail to start with an error (which is the same behavior as before the change).

I tested on Linux, macOS, Solaris and Windows. On the macOS and Solaris
systems I use, the default soft limit is 256, but R will
increase it to 1024 and so could load up to 614 DLLs.

It would be great if people gave this a whirl; note that there are currently no Bioc binary builds that officially support R-devel.

Martin

On 10/06/2017 04:49 PM, Henrik Bengtsson wrote:
I haven't tried (= had to do it) myself, so I don't know exactly what
it takes, but you can configure this "ulimit" on the number of open files; see e.g. the instructions in https://stackoverflow.com/a/34645/1072091. I suspect it requires admin rights, but I'm not sure - maybe this is what goes on when you run it in different types of terminals.
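
On the admin-rights question: on POSIX systems an unprivileged process can raise its own soft limit as far as the hard limit, and it is raising the hard limit that normally needs elevated privileges. A minimal sketch of doing this in the shell before starting R follows; the helper name `raise_soft_fd_limit` is made up for illustration, and the change only affects the current shell and the processes started from it.

```shell
#!/bin/sh
# Sketch: raise the soft limit on open files for this shell (and its children)
# as far as the hard limit allows. raise_soft_fd_limit is a hypothetical helper.
raise_soft_fd_limit() {
    wanted=$1
    soft=$(ulimit -Sn)
    hard=$(ulimit -Hn)
    # Nothing to do if the soft limit is already high enough
    if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$wanted" ]; then
        return 0
    fi
    # An unprivileged process can only raise the soft limit up to the hard limit
    if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$wanted" ]; then
        echo "hard limit $hard is below requested $wanted" >&2
        return 1
    fi
    ulimit -Sn "$wanted"
}

# Example (requires R >= 3.4.0 on the PATH):
# raise_soft_fd_limit 1024 && R_MAX_NUM_DLLS=614 R
```

Whether the terminal emulators mentioned in this thread do something equivalent at startup is not established here; the sketch only shows the manual route.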

About this open file/DLL limit: in src/main/Rdynload.c (https://github.com/wch/r-source/blob/tags/R-3-4-2/src/main/Rdynload.c#L173-L180) there's the following comment/clarification:

/* Note that it is likely that dlopen will use up at least one file
   descriptor for each DLL loaded (it may load further dynamically
   linked libraries), so we do not want to get close to the fd limit
   (which may be as low as 256). By default, the maximum number of DLLs
   that can be loaded is 100. When the fd limit is known, we allow
   increasing the maximum number of DLLs via environment variable up to
   60% of the limit on open files, but to no more than 1000. */

I always thought that "as low as 256" was for some archaic system, but, as Wolfgang points out, it's a relevant limit. Since floor(0.6*256) = 153, this explains why the current default maximum of 100 DLLs is a reasonable choice, and why requests to bump it up much
higher may not be feasible on all platforms.


Related to this - "Garbage collection of DLLs":

I've implemented R.utils::gcDLLs(), which "Identifies and removes ["stray"] DLLs of packages already unloaded". This function frees up DLL slots otherwise occupied by unloaded packages. I've
used it successfully in many places, e.g. when trying to load and unload
all my installed packages in a single R session (don't ask why ;)).

However, as argued by Karl Millar (https://stat.ethz.ch/pipermail/r-devel/2016-December/073528.html), there is a risk that unregistering such DLLs may leave R in an unstable state, because we cannot know for sure whether there are registered finalizers that rely on such DLLs and have not yet been called. R.utils::gcDLLs() forces the garbage collector to run prior to unregistering DLLs, which should eliminate the risk of this problem. As far as I understand the current R implementation, this should be enough. On the other hand, I've been wrong before, I don't
know about future versions of R, and it has only been tested so much.
Guaranteeing reentrancy of finalizers is really tricky.

/Henrik

On Fri, Oct 6, 2017 at 10:16 AM, Wolfgang Huber <wolfgang.hu...@embl.de> wrote:
Interesting! In iTerm2, I get

$ ulimit -Sn
4864

and

$ env R_MAX_NUM_DLLS=1000 R

works, which means that on Mac it IS possible to have many more DLLs open than 100 if R is started in the right way.

Wolfgang

PS I meant OS X 10.12.6, too. Sorry for the typo.


6.10.17 14:50, Kasper Daniel Hansen wrote:

On OS X 10.12.6 (I don't think 10.12.16 exists), I get

$ ulimit -Sn
7168

Interestingly, this is because I use iTerm2 for my command line prompt. If I run the same command in Terminal I get 256. If I start R inside of Emacs I get 256 as well. I don't know
anything about ulimit and how it is set, but that is a pretty
stark difference.

Best, Kasper



On Fri, Oct 6, 2017 at 3:12 AM, Wolfgang Huber <wolfgang.hu...@embl.de> wrote:

On Mac OSX 10.12.16:

$ ulimit -Sn
256

so the maximum value of R_MAX_NUM_DLLS is 153 ...

Wolfgang

5.10.17 23:02, Henrik Bengtsson wrote:

About the DLL limit:

Just wanna make sure you're aware of the "new" environment variable R_MAX_NUM_DLLS available in R (>= 3.4.0). It allows you to push
the current default limit of 100 open DLLs a bit higher. It
can be set in .Renviron or on the command line before starting R, e.g.

$ R_MAX_NUM_DLLS=500 R

This, of course, assumes that you can set it, which you might not
be able to do on build servers.  Also, there is an upper limit
min(0.6*fd_limit,1000) that depends on the number of files you
can have open at the same time (fd_limit), e.g. on my Ubuntu 16.04 I've got:

$ ulimit -Sn
1024

so R_MAX_NUM_DLLS=614 is the maximum for me.
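
The arithmetic behind these numbers can be sketched in shell. The helper name `max_num_dlls_cap` is made up for illustration; it only reproduces the min(0.6*fd_limit, 1000) rule quoted above using integer arithmetic, and it does not ask R anything.

```shell
#!/bin/sh
# Sketch: upper bound on R_MAX_NUM_DLLS for a given limit on open files,
# following the min(0.6 * fd_limit, 1000) rule described in this thread.
# max_num_dlls_cap is a hypothetical helper, not part of R.
max_num_dlls_cap() {
    fd_limit=$1
    cap=$(( fd_limit * 6 / 10 ))   # integer floor of 0.6 * fd_limit
    if [ "$cap" -gt 1000 ]; then
        cap=1000                   # never more than 1000
    fi
    echo "$cap"
}

# Soft limits reported in this thread
for limit in 256 1024 4864; do
    echo "$limit -> $(max_num_dlls_cap "$limit")"
done
# prints:
# 256 -> 153
# 1024 -> 614
# 4864 -> 1000
```

To apply it to the current shell, one could call `max_num_dlls_cap "$(ulimit -Sn)"`, assuming the soft limit is reported as a number rather than "unlimited".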

/Henrik

On Thu, Oct 5, 2017 at 11:22 AM, Wolfgang Huber <wolfgang.hu...@embl.de> wrote:


Breaking up long workflows into several smaller "modules" each with a clearly defined input and output is a good idea, certainly
for didactic & maintenance reasons.

It doesn't "solve" the DLL issue though, it only avoids it (for now)...

I believe you can use a Makefile for your vignettes (https://cran.r-project.org/doc/manuals/R-exts.html#Writing-package-vignettes), and this might be a good way of managing which depends on which. For passing along output/input, perhaps local .RData files are good enough; perhaps some wheel-reinventing can also be avoided by using https://bioconductor.org/packages/release/bioc/html/BiocFileCache.html (haven't actually used it yet, though).
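
As a rough illustration of the "separate modules" idea without a Makefile, a plain shell driver can run each module in its own process and in a fixed order, so that the DLLs loaded by one module are released when its R session exits. Everything named here is hypothetical: the function `run_modules`, its runner argument and the script names. The sketch also assumes each script saves its results to a file (for example an .RData file) that the next script loads.

```shell
#!/bin/sh
# Sketch: run workflow modules one after another, each in a fresh process.
# run_modules and the script names below are hypothetical.
run_modules() {
    runner=$1      # command that executes one script, e.g. Rscript
    shift
    for script in "$@"; do
        echo "running $script" >&2
        # Stop at the first failing module so later ones never read stale input
        "$runner" "$script" || return 1
    done
}

# Example (requires R on the PATH and the three scripts to exist):
# run_modules Rscript 01-qc.R 02-normalize.R 03-cluster.R
```

Passing the runner as an argument keeps the ordering logic independent of R, so it can be tried out with any command in place of Rscript.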

Wolfgang



5.10.17 20:02, Aaron Lun wrote:


This may relate to what I was thinking with respect to solving the DLL problem, by breaking up large workflows into modules
that can be executed in separate R sessions. The same approach
would also make it easier to associate package dependencies with specific parts of the workflow.


In my particular situation, it is easy to break up the workflow into sections that can be executed completely independently. However, I can also imagine situations where dependencies on previous objects, etc. make it difficult to break up the workflow. If multiple files are present in vignettes/, can they be directed to execute in a specific order, and would output files from one vignette persist during the execution of another?


-Aaron


------------------------------------------------------------------------



From: Wolfgang Huber <wolfgang.hu...@embl.de>
Sent: Thursday, 5 October 2017 6:23:47 PM
To: Laurent Gatto; Aaron Lun
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow


I agree it is nice to be able to only load the packages needed for a certain section of a vignette and not the whole thing. And that too many `::` can make code look unwieldy (though some may actually increase readability).

But relying on manually sprinkled in `library` calls seems like
a hack prone to error. And there are always bound to be dependencies that are non-local, e.g. on general infrastructure like SummarizedExperiment, ggplot2, dplyr.

So: do we need a way to computationally determine the dependencies of a vignette section, including highlighting/eliminating potential name clashes (b/c the
warnings about masking emitted at package loading are easily
ignored)? This seems like a straightforward engineering task.

Eventually with such code analysis we could get rid of explicit `library` calls altogether :)

Wolfgang





5.10.17 08:53, Laurent Gatto wrote:

On 5 October 2017 00:11, Aaron Lun wrote:

Here's another two cents from me:

The explicit library() calls allow for easy copy-pasting if people only want to use/adapt a section of the workflow. In such cases, calling "library(simpleSingleCell)" could drag in a lot of unnecessary packages (which could, e.g., hit the DLL limit). Reading through the text to figure out the requirements for each
code chunk seems like a pain, and lots of "::" are unwieldy.

More generally, the removal of individual library() calls seems to encourage the use of a single "library(simpleSingleCell)"
call at the top of any user-developed custom analysis scripts
based on the workflow. This seems conceptually odd to me - the simpleSingleCell package is simply a vehicle for the compiled workflow, it shouldn't be involved in analyses of other data.



I can confirm that this is a possibility.

Before workflows became available, I created the RforProteomics package that essentially provided one relatively large vignette to demonstrate a variety of applications of R/Bioconductor for mass spectrometry and proteomics. I think this has been a useful way to disseminate R and Bioconductor in these respective communities, but it also led to the confusion that it was that package that "did all the stuff", i.e. people saying that they were using RforProteomics to do a task that was described in the vignette. The RforProteomics vignette does explicitly call library at the beginning of each section and explains that the package is only a collection of analyses stemming from other packages, but that wasn't enough apparently.

Laurent


-Aaron

________________________________
From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Wolfgang Huber <wolfgang.hu...@embl.de>
Sent: Thursday, 5 October 2017 8:26 AM
To: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow


I find `eval=FALSE` chunks not a good idea, since
- they confuse users who only see the rendered HTML/PDF (where this flag is not shown)
- they are not tested, so more prone to code rot.

I'd also like to object to the idea that proximity of a
`library` call to code that uses a package is somehow didactic.
It's actually a bad habit: the R interpreter does not care. The relevant package
- can be mentioned in the narrative,
- can be stated in the code with the pkgname:: prefix.

The latter is good
didactics to get people used to the idea of namespaces,
especially since there is an increasing frequency of name clashes
in CRAN, tidyverse, BioC (e.g. consider the various functions
named 'filter' and the obscure misbehaviors that can result from these).

Best wishes Wolfgang

On 04/10/2017 22:20, Turaga, Nitesh wrote:


Hi Aaron,


A workaround may be to put all library() calls in an "eval=FALSE" R code chunk:

```{r, eval=FALSE}
library(scran)
library(scater)
```

etc.


This way the users can see the library() calls in the vignette.

Best,

Nitesh

On Oct 4, 2017, at 4:14 PM, Obenchain, Valerie <valerie.obench...@roswellpark.org> wrote:

Hi guys,

A little background on this vignette -> package conversion. The workflows were converted to package form because we want to integrate them into the nightly build system instead of supporting separate machines as we're now doing.

As part of this conversion, packages loaded in workflow
vignettes were moved to Depends in DESCRIPTION. This enables the
user to load a single package instead of many. Packages were
moved to Depends instead of Suggests (as is usually done with
software packages) because the vignette is the only thing
these workflow packages have going - no defined classes or methods. This seemed a more tidy approach and the dependencies are listed in Depends for the user to see. This was my (maybe bad?) idea and Nitesh
was the messenger. If you feel the individual loading of packages
in the vignette is a key part of the instruction/learning we can leave them as is and list the packages in Suggests.



I should also mention that incorporating the workflows into the build system won't happen until after the release. At that time we'll move the repositories from svn to git and it's likely
we'll have to ask maintainers to abide by some time/space
guidelines. At that point the build machines will be building
software, experimental data and workflows, and resources aren't unlimited. When that time comes we'll update the workflow guidelines and contact maintainers.



Thanks. Valerie



On 10/04/2017 12:27 PM, Kasper Daniel Hansen wrote:

yeah, that is super super useful to people. In my vignettes (granted, not workflows) I have a separate "Dependencies"
section which is basically a series of library() calls.

On Wed, Oct 4, 2017 at 3:18 PM, Aaron Lun <a...@wehi.edu.au> wrote:



Dear Nitesh, list;


The library() calls in the simpleSingleCell workflow have been removed. Why is this? I find explicit library() calls to be
quite useful for readers of the compiled vignette, because it
makes it easier for them to determine the packages that are
required to adapt parts of the workflow for their own analyses.
If it doesn't hurt the build system, I would prefer to have these
library() calls in the vignette.


Cheers,


Aaron

[[alternative HTML version deleted]]


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel









This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.









--
With thanks in advance-
Wolfgang

-------
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de







