I would personally like something like an Android/iOS permissions required/requested manifest document describing what the pkg needs with R doing what it can to enforce said permissions. R would be breaking some ground in this space, but it does that regularly in many respects. Yes, I know I just 10x++ the scope.
I'd support just this flag, tho. Anything to increase transparency and safety.

On Mon, Sep 26, 2022 at 6:22 PM Simon Urbanek <simon.urba...@r-project.org> wrote:
>
> BTW: It is a good question whether packages that require internet access in order to function at all should be flagged as such, so they can be removed from server installations. Let's say a package provides an API for retrieving stock quotes online and that's all it does; then perhaps it does make sense to exclude it. It would be pointless to appease the load check just to not be able to perform the function it was designed for...
>
> Cheers,
> Simon
>
>
> > On 27/09/2022, at 11:11 AM, Simon Urbanek <simon.urba...@r-project.org> wrote:
> >
> >> On 27/09/2022, at 11:02 AM, Gabriel Becker <gabembec...@gmail.com> wrote:
> >>
> >> For the record, the only thing switchr (my package) does internet-wise should be hitting the Bioconductor config file (http://bioconductor.org/config.yaml), so that it knows what it needs to know about Bioc repos/versions/etc. (at load time, actually, not install time, but since install does a test load, those are essentially the same).
> >>
> >> I have fallback behavior for when the file can't be read, so I don't think there should be any actual build/install breakages, but the check does happen.
> >>
> >
> > $ sandbox-exec -n no-network R CMD INSTALL switchr_0.14.5.tar.gz
> > [...]
> > ** testing if installed package can be loaded from final location
> > Error in readLines(con) :
> >   cannot open the connection to 'http://bioconductor.org/config.yaml'
> > Calls: <Anonymous> ... getBiocDevelVr -> getBiocYaml -> inet_handlers -> readLines
> > Execution halted
> > ERROR: loading failed
> >
> > So, yes, it does break. You should recover from the error and use a fall-back file that you ship.
> >
> > Cheers,
> > Simon
> >
> >
> >> Advice on what to do for the above use case that is better practice is welcome.
> >>
> >> ~G
> >>
> >> On Mon, Sep 26, 2022 at 2:40 PM Simon Urbanek <simon.urba...@r-project.org> wrote:
> >>
> >>
> >>> On 27/09/2022, at 10:21 AM, Iñaki Ucar <iu...@fedoraproject.org> wrote:
> >>>
> >>> On Mon, 26 Sept 2022 at 23:07, Simon Urbanek <simon.urba...@r-project.org> wrote:
> >>>>
> >>>> Iñaki,
> >>>>
> >>>> I'm not sure I understand - system dependencies are an entirely different topic and I would argue a far more important one (very happy to start a discussion about that), but that has nothing to do with declaring downloads. I assumed your question was about large files that packages avoid shipping and download instead, so declaring them would be useful.
> >>>
> >>> Exactly. Maybe there's a misunderstanding, because I didn't talk about system dependencies (alas, there are packages that try to download things that are declared as system dependencies, as Gabe noted). :)
> >>>
> >>
> >>
> >> Ok, understood. I would like to tackle those as well, but let's start that conversation in a few weeks when I have a lot more time.
> >>
> >>
> >>>> And for that, the obvious answer is they shouldn't do that - if a package needs a file to run, it should include it. So an easy solution is to disallow it.
> >>>
> >>> Then we completely agree. My proposal about declaring additional sources was because, given that so many packages do this, I thought that I would find strong opposition to this. But if R Core / CRAN is ok with just limiting net access at install time, then that's perfect for me. :)
> >>>
> >>
> >> Yes we do agree :). I started looking at your list, and so far those seem simply bugs or design deficiencies in the packages (and outright policy violations).
> >> I think the only reason they exist is that it doesn't get detected in CRAN incoming; it's certainly not intentional.
> >>
> >> Cheers,
> >> Simon
> >>
> >>
> >>> Iñaki
> >>>
> >>>> But so far all examples were just (ab)use of downloads for binary dependencies, which is an entirely different issue that needs a different solution (naively, by declaring such dependencies, but we know it's not that simple - and download URLs don't help there).
> >>>>
> >>>> Cheers,
> >>>> Simon
> >>>>
> >>>>
> >>>>> On 27/09/2022, at 8:25 AM, Iñaki Ucar <iu...@fedoraproject.org> wrote:
> >>>>>
> >>>>> On Sat, 24 Sept 2022 at 01:55, Simon Urbanek <simon.urba...@r-project.org> wrote:
> >>>>>>
> >>>>>> Iñaki,
> >>>>>>
> >>>>>> I fully agree; this is a very common issue, since the vast majority of server deployments I have encountered don't allow internet access. In practice this means that such packages are effectively banned.
> >>>>>>
> >>>>>> I would argue that not even (1) or (2) are really an issue, because in fact the CRAN policy doesn't impose any absolute limits on size; it only states that the package should be "of minimum necessary size", which means it shouldn't waste space. If there is no way to reduce the size without impacting functionality, it's perfectly fine.
> >>>>>
> >>>>> "Packages should be of the minimum necessary size" is subject to interpretation. And in practice, there is an issue with e.g. packages that "bundle" big third-party libraries. There are also packages that require downloading precompiled code, JARs... at installation time.
> >>>>>
> >>>>>> That said, there are exceptions such as very large datasets (e.g., as distributed by Bioconductor) which are orders of magnitude larger than what is sustainable. I agree that it would be nice to have a mechanism for specifying such sources.
> >>>>>> So yes, I like the idea, but I'd like to see more real use cases to justify the effort.
> >>>>>
> >>>>> "More real use cases" as in "more use cases" or as in "the previous ones are not real ones"? :)
> >>>>>
> >>>>>> The issue with any online downloads, though, is that there is no guarantee of availability - which is a real issue for reproducibility. So one could argue that if such external sources are required, then they should be on well-defined, independent, permanent storage such as Zenodo. This could be a matter of policy, as opposed to the technical side above, which would be adding such support to R CMD INSTALL.
> >>>>>
> >>>>> Not necessarily. If the package declares the additional sources in the DESCRIPTION (probably with hashes), that's a big improvement over the current state of things, in which we basically don't know what the package tries to download, then it may fail, and finally there's no guarantee that it's what the author intended in the first place.
> >>>>>
> >>>>> But on top of this, R could add a CMD to download those, and then some lookaside storage could be used on CRAN. This is e.g. how RPM packaging works: the spec declares all the sources, they are downloaded once, hashed and stored in a lookaside cache. Then package building doesn't need general Internet connectivity, just access to the cache.
> >>>>>
> >>>>> Iñaki
> >>>>>
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Simon
> >>>>>>
> >>>>>>
> >>>>>>> On Sep 24, 2022, at 3:22 AM, Iñaki Ucar <iu...@fedoraproject.org> wrote:
> >>>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I'd like to open this debate here, because IMO this is a big issue.
> >>>>>>> Many packages do this for various reasons, some more legitimate than others, but I think that this shouldn't be allowed, because it basically means that installation fails on a machine without Internet access (which happens e.g. in Linux distro builders for security reasons).
> >>>>>>>
> >>>>>>> Now, what if connections are suppressed during package load? There are basically three use cases out there:
> >>>>>>>
> >>>>>>> (1) The package requires additional files for the installation (e.g. the source code of an external library) that cannot be bundled into the package due to CRAN restrictions (size).
> >>>>>>> (2) The package requires additional files for using it (e.g., datasets, a JAR...) that cannot be bundled into the package due to CRAN restrictions (size).
> >>>>>>> (3) Other spurious reasons (e.g. the maintainer decided that package load was a good place to check an online service's availability, etc.).
> >>>>>>>
> >>>>>>> Again IMO, (3) shouldn't be allowed in any case; (2) should be a separate function that the user actively calls to download the files, and those files should be placed into the user dir; and (1) is the only legitimate use, but then another mechanism should be provided to avoid connections during package load.
> >>>>>>>
> >>>>>>> My proposal to support (1) would be to add a new field in the DESCRIPTION, "Additional_sources", which would be a comma-separated list of additional resources to download during R CMD INSTALL. Those sources would be downloaded by R CMD INSTALL if not provided via an option (to support offline installations), and would be placed in a predefined place for the package to find and configure them (via an environment variable or in a predefined subdirectory).
> >>>>>>>
> >>>>>>> This proposal has several advantages.
> >>>>>>> Apart from the obvious one (Internet access during package load can be limited without losing current functionality), it gives more visibility to the resources that packages are using during the installation phase, and thus makes those installations more reproducible and more secure.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> --
> >>>>>>> Iñaki Úcar
> >>>>>>>
> >>>>>>> ______________________________________________
> >>>>>>> R-devel@r-project.org mailing list
> >>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>> --
> >>>>> Iñaki Úcar
> >>>>
> >>>
> >>> --
> >>> Iñaki Úcar
> >>>
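Simon's advice to switchr in this thread is to recover from the failed read of http://bioconductor.org/config.yaml and fall back to a file shipped with the package. A minimal R sketch of that pattern follows; the helper name `read_config_lines`, the 5-second timeout and the idea of bundling a snapshot are illustrative assumptions, not switchr's actual code.

```r
# Sketch of the "recover and fall back to a shipped copy" pattern.
# Hypothetical helper (not switchr's code): `url` is the remote file,
# `fallback` is a local copy bundled with the package.
read_config_lines <- function(url, fallback, timeout = 5) {
  # Limit how long R waits on Internet operations (the option usually
  # defaults to 60 seconds); restore the previous value on exit
  old <- options(timeout = timeout)
  on.exit(options(old), add = TRUE)

  lines <- tryCatch(
    # readLines() accepts a URL string; a failed open signals a warning
    # followed by an error, so silence the warning and catch the error
    suppressWarnings(readLines(url, warn = FALSE)),
    error = function(e) NULL
  )
  if (is.null(lines)) {
    # Offline (e.g. a sandboxed installer) or server down: use the bundled copy
    lines <- readLines(fallback, warn = FALSE)
  }
  lines
}
```

Inside a package, `fallback` would typically point at a snapshot shipped under `inst/extdata`, e.g. `system.file("extdata", "config.yaml", package = "mypkg")`, where the file name and `mypkg` are placeholders. The load then succeeds under `sandbox-exec -n no-network`, at the cost of possibly using slightly stale data.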
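Iñaki's proposal (extra sources declared in DESCRIPTION, "probably with hashes", supplied locally for offline installs and downloaded otherwise) can be made concrete with a small R sketch. It is purely illustrative: R CMD INSTALL has no such feature, and the `<url> md5:<hash>` entry format for `Additional_sources`, the `R_ADDITIONAL_SOURCES_DIR` environment variable and the use of MD5 via `tools::md5sum()` (a real scheme would likely want a stronger hash) are all assumptions made for the example.

```r
# Hypothetical sketch, NOT an existing R feature: the entry format
# "<url> md5:<hash>" and R_ADDITIONAL_SOURCES_DIR are assumptions.

# Parse the comma-separated "Additional_sources" field into url/md5 pairs.
parse_additional_sources <- function(desc_file) {
  field <- read.dcf(desc_file, fields = "Additional_sources")[1, 1]
  if (is.na(field)) {
    return(data.frame(url = character(), md5 = character()))
  }
  entries <- trimws(strsplit(field, ",")[[1]])
  parts <- strsplit(entries, "[[:space:]]+")
  data.frame(
    url = vapply(parts, `[`, character(1), 1),
    md5 = sub("^md5:", "", vapply(parts, `[`, character(1), 2)),
    stringsAsFactors = FALSE
  )
}

# Copy a declared source from `offline_dir` if it is there, otherwise
# download it; in both cases refuse it unless its MD5 matches the declaration.
fetch_source <- function(url, md5, dest_dir,
                         offline_dir = Sys.getenv("R_ADDITIONAL_SOURCES_DIR")) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(dest_dir, basename(url))
  provided <- file.path(offline_dir, basename(url))
  if (nzchar(offline_dir) && file.exists(provided)) {
    file.copy(provided, dest, overwrite = TRUE)  # offline install: no network
  } else {
    download.file(url, dest, mode = "wb", quiet = TRUE)
  }
  actual <- unname(tools::md5sum(dest))
  if (!identical(actual, md5)) {
    unlink(dest)
    stop("hash mismatch for ", url, ": expected ", md5, ", got ", actual)
  }
  dest
}
```

Checking the hash no matter where the file came from is what would make a local directory (or an RPM-style lookaside cache) a safe substitute for the network: the declaration, not the transport, decides whether a file is accepted.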