Hi guys,

This is a very interesting discussion, and thanks for sharing your ideas.
On a possibly relevant note: while working on an alternative workflow
auto-build system based on Docker containerization (using my package
liftr), I ran into similar issues with declaring dependencies, and with
dependency management in general. I think both methods (declaring them in
the Rmd or in the workflow package's DESCRIPTION) each have potential
drawbacks and advantages; some of them are discussed in the comments of
the following file:

https://github.com/road2stat/dockflow/blob/master/src/2-containerize.R

Hope this helps,
-Nan

On Fri, Oct 6, 2017 at 4:49 PM, Henrik Bengtsson
<henrik.bengts...@gmail.com> wrote:

> I haven't tried (= had to do it) myself, so I don't know exactly what
> it takes, but you can configure this "ulimit" of number of open files,
> e.g. instructions in https://stackoverflow.com/a/34645/1072091. I
> suspect it requires admin rights, but I'm not sure - maybe this is
> what goes on when you run it in different types of terminals.
>
> About this open file/DLL limit: in src/main/Rdynload.c
> (https://github.com/wch/r-source/blob/tags/R-3-4-2/src/main/Rdynload.c#L173-L180)
> there's the following comment/clarification:
>
> /* Note that it is likely that dlopen will use up at least one file
> descriptor for each DLL loaded (it may load further dynamically
> linked libraries), so we do not want to get close to the fd limit
> (which may be as low as 256). By default, the maximum number of DLLs
> that can be loaded is 100. When the fd limit is known, we allow
> increasing the maximum number of DLLs via environment variable up to
> 60% of the limit on open files, but to no more than 1000.
> */
>
> I always thought that "as low as 256" was for some archaic system,
> but, as Wolfgang points out, it's a relevant limit. Since 0.6*256 =
> 153.6 (so at most 153 DLLs), this explains why the current default of
> a maximum of 100 DLLs is reasonable and why requests to bump it up much
> higher may not be feasible (not cross-platform).
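
The arithmetic described in the Rdynload.c comment above can be sketched in
shell. This is only an illustration and not R's actual code: the function
name max_num_dlls is made up here, and integer truncation is an assumption
chosen because it reproduces the 153 and 614 figures reported in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of R): estimate the largest usable
# R_MAX_NUM_DLLS for a given soft limit on open files, following the
# rule described in the Rdynload.c comment: 60% of the fd limit,
# but no more than 1000.
max_num_dlls() {
    fd_limit=$1
    cap=$(( fd_limit * 6 / 10 ))   # integer arithmetic truncates 0.6 * fd_limit
    if [ "$cap" -gt 1000 ]; then
        cap=1000
    fi
    echo "$cap"
}

max_num_dlls 256     # Terminal value reported in this thread  -> 153
max_num_dlls 1024    # Ubuntu 16.04 value reported in this thread -> 614
max_num_dlls 4864    # iTerm2 value reported in this thread -> 1000
```

To try it against the current shell, something like
max_num_dlls "$(ulimit -Sn)" should work as long as the reported limit is
numeric (ulimit can also print "unlimited", which this sketch does not handle).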
>
> Related to this - "Garbage collection of DLLs":
>
> I've implemented R.utils::gcDLLs() that "Identifies and removes
> ["stray"] DLLs of packages already unloaded". This function will free
> up DLL slots otherwise occupied by unloaded packages. I've used it
> successfully in many places, e.g. trying to load and unload all my
> installed packages in a single R session (don't ask why ;)).
>
> However, as argued by Karl Millar
> (https://stat.ethz.ch/pipermail/r-devel/2016-December/073528.html),
> there is a risk that unregistering such DLLs may render the state of R
> unstable because we cannot know for sure whether there are some
> registered finalizers that rely on such DLLs and haven't been called
> yet. R.utils::gcDLLs() forces the garbage collector to run prior
> to unregistering DLLs, which should eliminate the risk of this
> problem. As far as I understand the current R implementation, this
> should be enough. On the other hand, I've been wrong before, I don't
> know about future versions of R, and it has only been tested so much.
> Guaranteeing reentrancy of finalizers is really tricky.
>
> /Henrik
>
> On Fri, Oct 6, 2017 at 10:16 AM, Wolfgang Huber <wolfgang.hu...@embl.de>
> wrote:
> > Interesting! In iTerm2, I get
> > $ ulimit -Sn
> > 4864
> >
> > and
> > env R_MAX_NUM_DLLS=1000 R
> >
> > works, which means that on Mac it IS possible to have many more DLLs open
> > than 100 if R is started in the right way.
> >
> > Wolfgang
> >
> > PS I meant OS X 10.12.6, too. Sorry for the typo.
> >
> >
> > 6.10.17 14:50, Kasper Daniel Hansen scripsit:
> >>
> >> On OS X 10.12.6 (I don't think 10.12.16 exists), I get
> >>
> >> $ ulimit -Sn
> >> 7168
> >>
> >> Interestingly, this is because I use iTerm2 for my command line prompt.
> >> If I do the same command in Terminal I get 256. If I start R inside of
> >> Emacs I get 256 as well. I don't know anything about ulimit and how it is
> >> set, but that is a pretty stark difference.
> >>
> >> Best,
> >> Kasper
> >>
> >> On Fri, Oct 6, 2017 at 3:12 AM, Wolfgang Huber <wolfgang.hu...@embl.de>
> >> wrote:
> >>
> >> On Mac OSX 10.12.16:
> >> $ ulimit -Sn
> >> 256
> >>
> >> so the maximum value of R_MAX_NUM_DLLS is 153 ...
> >>
> >> Wolfgang
> >>
> >> 5.10.17 23:02, Henrik Bengtsson scripsit:
> >>
> >> About the DLL limit:
> >>
> >> Just wanna make sure you're aware of the "new" environment variable
> >> R_MAX_NUM_DLLS available in R (>= 3.4.0). It allows you to push the
> >> current default limit of 100 open DLLs a bit higher. It can be set in
> >> .Renviron or before starting R, e.g.
> >>
> >> $ R_MAX_NUM_DLLS=500 R
> >>
> >> This, of course, assumes that you can set it, which you might not be
> >> able to do on build servers. Also, there is an upper limit
> >> min(0.6*fd_limit,1000) that depends on the number of files you can
> >> have open at the same time (fd_limit), e.g. on my Ubuntu 16.04 I've
> >> got:
> >>
> >> $ ulimit -Sn
> >> 1024
> >>
> >> so R_MAX_NUM_DLLS=614 is the maximum for me.
> >>
> >> /Henrik
> >>
> >> On Thu, Oct 5, 2017 at 11:22 AM, Wolfgang Huber
> >> <wolfgang.hu...@embl.de> wrote:
> >>
> >> Breaking up long workflows into several smaller "modules" each with a
> >> clearly defined input and output is a good idea, certainly for didactic &
> >> maintenance reasons.
> >>
> >> It doesn't "solve" the DLL issue though, it only avoids it (for now)...
> >>
> >> I believe you can use a Makefile for your vignettes
> >> (https://cran.r-project.org/doc/manuals/R-exts.html#Writing-package-vignettes),
> >> and this might be a good way of managing which depends on which.
> >> For passing along output/input, perhaps local .RData files are good
> >> enough, perhaps some wheel-reinventing can also be avoided by using
> >> https://bioconductor.org/packages/release/bioc/html/BiocFileCache.html
> >> (haven't actually used it yet, though).
> >>
> >> Wolfgang
> >>
> >> 5.10.17 20:02, Aaron Lun scripsit:
> >>
> >> This may relate to what I was thinking with respect to solving the DLL
> >> problem, by breaking up large workflows into modules that can be executed
> >> in separate R sessions. The same approach would also make it easier to
> >> associate package dependencies with specific parts of the workflow.
> >>
> >> In my particular situation, it is easy to break up the workflow into
> >> sections that can be executed completely independently. However, I can
> >> also imagine situations where dependencies on previous objects, etc. make
> >> it difficult to break up the workflow. If multiple files are present in
> >> vignettes/, can they be directed to execute in a specific order, and would
> >> output files from one vignette persist during the execution of another?
> >>
> >> -Aaron
> >>
> >> ------------------------------------------------------------------------
> >> *From:* Wolfgang Huber <wolfgang.hu...@embl.de>
> >> *Sent:* Thursday, 5 October 2017 6:23:47 PM
> >> *To:* Laurent Gatto; Aaron Lun
> >> *Cc:* bioc-devel@r-project.org
> >> *Subject:* Re: [Bioc-devel] library() calls removed in simpleSingleCell
> >> workflow
> >>
> >> I agree it is nice to be able to only load the packages needed for a
> >> certain section of a vignette and not the whole thing.
> >> And that too many `::` can make code look unwieldy (though some may
> >> actually increase readability).
> >>
> >> But relying on manually sprinkled-in `library` calls seems like a hack
> >> prone to error. And there are always bound to be dependencies that are
> >> non-local, e.g. on general infrastructure like SummarizedExperiment,
> >> ggplot2, dplyr.
> >>
> >> So: do we need a way to computationally determine the dependencies of a
> >> vignette section, including highlighting/eliminating potential name
> >> clashes (b/c the warnings about masking emitted at package loading are
> >> easily ignored)? This seems like a straightforward engineering task.
> >>
> >> Eventually with such code analysis we could get rid of explicit
> >> `library` calls altogether :)
> >>
> >> Wolfgang
> >>
> >> 5.10.17 08:53, Laurent Gatto scripsit:
> >>
> >> On 5 October 2017 00:11, Aaron Lun wrote:
> >>
> >> Here's another two cents from me:
> >>
> >> The explicit library() calls allow for easy copy-pasting if people
> >> only want to use/adapt a section of the workflow. In such cases,
> >> calling "library(simpleSingleCell)" could drag in a lot of unnecessary
> >> packages (which could, e.g., hit the DLL limit). Reading through the
> >> text to figure out the requirements for each code chunk seems like a
> >> pain, and lots of "::" are unwieldy.
> >>
> >> More generally, the removal of individual library() calls seems to
> >> encourage the use of a single "library(simpleSingleCell)" call at the
> >> top of any user-developed custom analysis scripts based on the
> >> workflow. This seems conceptually odd to me - the simpleSingleCell
> >> package is simply a vehicle for the compiled workflow, it shouldn't be
> >> involved in analyses of other data.
> >>
> >> I can confirm that this is a possibility.
> >>
> >> Before workflows became available, I created the RforProteomics package
> >> that essentially provided one relatively large vignette to demonstrate a
> >> variety of applications of R/Bioconductor for mass spectrometry and
> >> proteomics. I think this has been a useful way to disseminate R and
> >> Bioconductor in these respective communities, but it also led to the
> >> confusion that it was that package that "did all the stuff", i.e. people
> >> saying that they were using RforProteomics to do a task that was
> >> described in the vignette. The RforProteomics vignette does explicitly
> >> call library at the beginning of each section and explains that the
> >> package is only a collection of analyses stemming from other packages,
> >> but that wasn't enough apparently.
> >>
> >> Laurent
> >>
> >> -Aaron
> >>
> >> ________________________________
> >> From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of
> >> Wolfgang Huber <wolfgang.hu...@embl.de>
> >> Sent: Thursday, 5 October 2017 8:26 AM
> >> To: bioc-devel@r-project.org
> >> Subject: Re: [Bioc-devel] library() calls removed in simpleSingleCell
> >> workflow
> >>
> >> I find `eval=FALSE` chunks not a good idea, since
> >> - they confuse users who only see the rendered HTML/PDF (where this flag
> >>   is not shown)
> >> - they are not tested, so more prone to code rot.
> >>
> >> I'd also like to object to the idea that proximity of a `library` call
> >> to code that uses a package is somehow didactic. It's actually a bad
> >> habit: the R interpreter does not care. The relevant package
> >> - can be mentioned in the narrative,
> >> - stated in the code with the pkgname:: prefix.
> >> The latter is good didactics to get people used to the idea of
> >> namespaces, especially since there is an increasing frequency of name
> >> clashes in CRAN, tidyverse, BioC (e.g. consider the various functions
> >> named 'filter' and the obscure misbehaviors that can result from these).
> >>
> >> Best wishes
> >> Wolfgang
> >>
> >> On 04/10/2017 22:20, Turaga, Nitesh wrote:
> >>
> >> Hi Aaron,
> >>
> >> A workaround may be to put all library() calls in an "eval=FALSE"
> >> block in the R code chunk
> >>
> >> ```{r, eval=FALSE}
> >> library(scran)
> >> library(scater)
> >> ```
> >>
> >> etc.
> >>
> >> This way the users can see the library() calls in the vignette.
> >>
> >> Best,
> >>
> >> Nitesh
> >>
> >> On Oct 4, 2017, at 4:14 PM, Obenchain, Valerie
> >> <valerie.obench...@roswellpark.org> wrote:
> >>
> >> Hi guys,
> >>
> >> A little background on this vignette -> package conversion. The
> >> workflows were converted to package form because we want to integrate
> >> them into the nightly build system instead of supporting separate
> >> machines as we're now doing.
> >>
> >> As part of this conversion, packages loaded in workflow vignettes were
> >> moved to Depends in DESCRIPTION. This enables the user to load a single
> >> package instead of many. Packages were moved to Depends instead of
> >> Suggests (as is usually done with software packages) because the vignette
> >> is the only thing these workflow packages have going - no defined classes
> >> or methods. This seemed a more tidy approach and the dependencies are
> >> listed in Depends for the user to see. This was my (maybe bad?) idea and
> >> Nitesh was the messenger. If you feel the individual loading of packages
> >> in the vignette is a key part of the instruction/learning we can leave
> >> them as is and list the packages in Suggests.
> >>
> >> I should also mention that incorporating the workflows into the build
> >> system won't happen until after the release. At that time we'll move the
> >> repositories from svn to git and it's likely we'll have to ask maintainers
> >> to abide by some time/space guidelines. At that point the build machines
> >> will be building software, experimental data and workflows and resources
> >> aren't unlimited. When that time comes we'll update the workflow
> >> guidelines and contact maintainers.
> >>
> >> Thanks.
> >> Valerie
> >>
> >> On 10/04/2017 12:27 PM, Kasper Daniel Hansen wrote:
> >>
> >> yeah, that is super super useful to people. In my vignettes (granted,
> >> not workflows) I have a separate "Dependencies" section which is
> >> basically a series of library() calls.
> >>
> >> On Wed, Oct 4, 2017 at 3:18 PM, Aaron Lun <a...@wehi.edu.au> wrote:
> >>
> >> Dear Nitesh, list;
> >>
> >> The library() calls in the simpleSingleCell workflow have been removed.
> >> Why is this? I find explicit library() calls to be quite useful for
> >> readers of the compiled vignette, because it makes it easier for them to
> >> determine the packages that are required to adapt parts of the workflow
> >> for their own analyses. If it doesn't hurt the build system, I would
> >> prefer to have these library() calls in the vignette.
> >>
> >> Cheers,
> >>
> >> Aaron
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> _______________________________________________
> >> Bioc-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>
> >> This email message may contain legally privileged and/or confidential
> >> information. If you are not the intended recipient(s), or the employee or
> >> agent responsible for the delivery of this message to the intended
> >> recipient(s), you are hereby notified that any disclosure, copying,
> >> distribution, or use of this email message is prohibited. If you have
> >> received this message in error, please notify the sender immediately by
> >> e-mail and delete this email message from your computer. Thank you.
> >> > >> > >> > >> > >> > >> > >> > >> -- > >> With thanks in advance- > >> Wolfgang > >> > >> ------- > >> Wolfgang Huber > >> Principal Investigator, EMBL Senior Scientist > >> European Molecular Biology Laboratory (EMBL) > >> Heidelberg, Germany > >> > >> wolfgang.hu...@embl.de <mailto:wolfgang.hu...@embl.de> > >> http://www.huber.embl.de > >> > >> > >> > >> > >> > >> > >> > >> > >> -- > >> With thanks in advance- > >> Wolfgang > >> > >> ------- > >> Wolfgang Huber > >> Principal Investigator, EMBL Senior Scientist > >> European Molecular Biology Laboratory (EMBL) > >> Heidelberg, Germany > >> > >> wolfgang.hu...@embl.de <mailto:wolfgang.hu...@embl.de> > >> http://www.huber.embl.de > >> > >> _______________________________________________ > >> Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org> > >> mailing list > >> https://stat.ethz.ch/mailman/listinfo/bioc-devel > >> <https://stat.ethz.ch/mailman/listinfo/bioc-devel> > >> > >> > >> -- With thanks in advance- > >> Wolfgang > >> > >> ------- > >> Wolfgang Huber > >> Principal Investigator, EMBL Senior Scientist > >> European Molecular Biology Laboratory (EMBL) > >> Heidelberg, Germany > >> > >> wolfgang.hu...@embl.de <mailto:wolfgang.hu...@embl.de> > >> http://www.huber.embl.de > >> > >> _______________________________________________ > >> Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org> mailing > >> list > >> https://stat.ethz.ch/mailman/listinfo/bioc-devel > >> <https://stat.ethz.ch/mailman/listinfo/bioc-devel> > >> > >> > > > > -- > > With thanks in advance- > > Wolfgang > > > > ------- > > Wolfgang Huber > > Principal Investigator, EMBL Senior Scientist > > European Molecular Biology Laboratory (EMBL) > > Heidelberg, Germany > > > > wolfgang.hu...@embl.de > > http://www.huber.embl.de > > > > _______________________________________________ > > Bioc-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/bioc-devel > > _______________________________________________ > 
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

--
https://nanx.me

[[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel