I would personally like something like an Android/iOS-style manifest of
required/requested permissions describing what the pkg needs, with R
doing what it can to enforce said permissions. R would be breaking some
ground in this space, but it does that regularly in many respects. Yes,
I know I just 10x++ the scope.

I'd support just this flag, tho. Anything to increase transparency and safety.

On Mon, Sep 26, 2022 at 6:22 PM Simon Urbanek
<simon.urba...@r-project.org> wrote:
>
> BTW: It is a good question whether packages that require internet access in 
> order to function at all should be flagged as such so they can be removed 
> from server installations. Let's say if a package provides an API for 
> retrieving stock quotes online and that's all it does, then perhaps it does make 
> sense to exclude it. It would be pointless to appease the load check just to 
> not be able to perform the function it was designed for...
>
> Cheers,
> Simon
>
>
> > On 27/09/2022, at 11:11 AM, Simon Urbanek <simon.urba...@r-project.org> 
> > wrote:
> >
> >
> >
> >> On 27/09/2022, at 11:02 AM, Gabriel Becker <gabembec...@gmail.com> wrote:
> >>
> >> For the record, the only thing switchr (my package) does internet-wise 
> >> should be hitting the Bioconductor config file 
> >> (http://bioconductor.org/config.yaml) so that it knows what it needs 
> >> to know about Bioc repos/versions/etc. (at load time, actually, not install 
> >> time, but since install does a test load, those are essentially the same).
> >>
> >> I have fallback behavior for when the file can't be read, so I don't 
> >> think there should be any actual build/install breakages, but the 
> >> check does happen.
> >>
> >
> > $ sandbox-exec -n no-network R CMD INSTALL switchr_0.14.5.tar.gz
> > [...]
> > ** testing if installed package can be loaded from final location
> > Error in readLines(con) :
> >  cannot open the connection to 'http://bioconductor.org/config.yaml'
> > Calls: <Anonymous> ... getBiocDevelVr -> getBiocYaml -> inet_handlers -> 
> > readLines
> > Execution halted
> > ERROR: loading failed
> >
> > So, yes, it does break. You should recover from the error and use a 
> > fall-back file that you ship.
> >
> > Cheers,
> > Simon
> >
> >
> >> Advice on better practice for the above use case is welcome.
> >>
> >> ~G
> >>
> >> On Mon, Sep 26, 2022 at 2:40 PM Simon Urbanek 
> >> <simon.urba...@r-project.org> wrote:
> >>
> >>
> >>> On 27/09/2022, at 10:21 AM, Iñaki Ucar <iu...@fedoraproject.org> wrote:
> >>>
> >>> On Mon, 26 Sept 2022 at 23:07, Simon Urbanek
> >>> <simon.urba...@r-project.org> wrote:
> >>>>
> >>>> Iñaki,
> >>>>
> >>>> I'm not sure I understand - system dependencies are an entirely 
> >>>> different topic and I would argue a far more important one (very happy 
> >>>> to start a discussion about that), but that has nothing to do with 
> >>>> declaring downloads. I assumed your question was about large files 
> >>>> which packages avoid shipping and download instead, so declaring 
> >>>> them would be useful.
> >>>
> >>> Exactly. Maybe there's a misunderstanding, because I didn't talk about 
> >>> system dependencies (alas there are packages that try to download things 
> >>> that are declared as system dependencies, as Gabe noted). :)
> >>>
> >>
> >>
> >> Ok, understood. I would like to tackle those as well, but let's start that 
> >> conversation in a few weeks when I have a lot more time.
> >>
> >>
> >>>> And for that, the obvious answer is they shouldn't do that - if a 
> >>>> package needs a file to run, it should include it. So an easy solution 
> >>>> is to disallow it.
> >>>
> >>> Then we completely agree. My proposal about declaring additional sources 
> >>> was because, given that so many packages do this, I thought that I would 
> >>> find strong opposition to this. But if R Core / CRAN is ok with just 
> >>> limiting net access at install time, then that's perfect for me. :)
> >>>
> >>
> >> Yes we do agree :). I started looking at your list, and so far those seem 
> >> simply bugs or design deficiencies in the packages (and outright policy 
> >> violations). I think the only reason they exist is that they don't get 
> >> detected in CRAN incoming; it's certainly not intentional.
> >>
> >> Cheers,
> >> Simon
> >>
> >>
> >>> Iñaki
> >>>
> >>>> But so far all examples were just (ab)use of downloads for binary 
> >>>> dependencies which is an entirely different issue that needs a different 
> >>>> solution (in a naive way declaring such dependencies, but we know it's 
> >>>> not that simple - and download URLs don't help there).
> >>>>
> >>>> Cheers,
> >>>> Simon
> >>>>
> >>>>
> >>>>> On 27/09/2022, at 8:25 AM, Iñaki Ucar <iu...@fedoraproject.org> wrote:
> >>>>>
> >>>>> On Sat, 24 Sept 2022 at 01:55, Simon Urbanek
> >>>>> <simon.urba...@r-project.org> wrote:
> >>>>>>
> >>>>>> Iñaki,
> >>>>>>
> >>>>>> I fully agree, this is a very common issue since the vast majority of server 
> >>>>>> deployments I have encountered don't allow internet access. In 
> >>>>>> practice this means that such packages are effectively banned.
> >>>>>>
> >>>>>> I would argue that not even (1) or (2) are really an issue, because in 
> >>>>>> fact the CRAN policy doesn't impose any absolute limits on size, it 
> >>>>>> only states that the package should be "of minimum necessary size" 
> >>>>>> which means it shouldn't waste space. If there is no way to reduce the 
> >>>>>> size without impacting functionality, it's perfectly fine.
> >>>>>
> >>>>> "Packages should be of the minimum necessary size" is subject to
> >>>>> interpretation. And in practice, there is an issue with e.g. packages
> >>>>> that "bundle" big third-party libraries. There are also packages that
> >>>>> require downloading precompiled code, JARs... at installation time.
> >>>>>
> >>>>>> That said, there are exceptions such as very large datasets (e.g., as 
> >>>>>> distributed by Bioconductor) which are orders of magnitude larger than 
> >>>>>> what is sustainable. I agree that it would be nice to have a mechanism 
> >>>>>> for specifying such sources. So yes, I like the idea, but I'd like to 
> >>>>>> see more real use cases to justify the effort.
> >>>>>
> >>>>> "More real use cases" as in "more use cases" or as in "the
> >>>>> previous ones are not real ones"? :)
> >>>>>
> >>>>>> The issue with any online downloads, though, is that there is no 
> >>>>>> guarantee of availability - which is a real issue for reproducibility. 
> >>>>>> So one could argue that if such external sources are required then 
> >>>>>> they should be on a well-defined, independent, permanent storage such 
> >>>>>> as Zenodo. This could be a matter of policy as opposed to the 
> >>>>>> technical side above which would be adding such support to R CMD 
> >>>>>> INSTALL.
> >>>>>
> >>>>> Not necessarily. If the package declares the additional sources in the
> >>>>> DESCRIPTION (probably with hashes), that's a big improvement over the
> >>>>> current state of things, in which basically we don't know what the
> >>>>> package tries to download, then it may fail, and finally there's no
> >>>>> guarantee that it's what the author intended in the first place.
> >>>>>
> >>>>> But on top of this, R could add a CMD to download those, and then some
> >>>>> lookaside storage could be used on CRAN. This is e.g. how RPM
> >>>>> packaging works: the spec declares all the sources, they are
> >>>>> downloaded once, hashed and stored in a lookaside cache. Then package
> >>>>> building doesn't need general Internet connectivity, just access to
> >>>>> the cache.
> >>>>>
> >>>>> Iñaki
> >>>>>
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Simon
> >>>>>>
> >>>>>>
> >>>>>>> On Sep 24, 2022, at 3:22 AM, Iñaki Ucar <iu...@fedoraproject.org> 
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I'd like to open this debate here, because IMO this is a big issue.
> >>>>>>> Many packages do this for various reasons, some more legitimate than
> >>>>>>> others, but I think that this shouldn't be allowed, because it
> >>>>>>> basically means that installation fails on a machine without Internet
> >>>>>>> access (which happens e.g. in Linux distro builders for security
> >>>>>>> reasons).
> >>>>>>>
> >>>>>>> Now, what if connection is suppressed during package load? There are
> >>>>>>> basically three use cases out there:
> >>>>>>>
> >>>>>>> (1) The package requires additional files for the installation (e.g.
> >>>>>>> the source code of an external library) that cannot be bundled into
> >>>>>>> the package due to CRAN restrictions (size).
> >>>>>>> (2) The package requires additional files for using it (e.g.,
> >>>>>>> datasets, a JAR...) that cannot be bundled into the package due to
> >>>>>>> CRAN restrictions (size).
> >>>>>>> (3) Other spurious reasons (e.g. the maintainer decided that package
> >>>>>>> load was a good place to check an online service availability, etc.).
> >>>>>>>
> >>>>>>> Again IMO, (3) shouldn't be allowed in any case; (2) should be a
> >>>>>>> separate function that the user actively calls to download the files,
> >>>>>>> and those files should be placed into the user dir; and (1) is the
> >>>>>>> only legitimate use, but then another mechanism should be provided to
> >>>>>>> avoid connections during package load.
> >>>>>>>
> >>>>>>> My proposal to support (1) would be to add a new field in the
> >>>>>>> DESCRIPTION, "Additional_sources", which would be a comma-separated
> >>>>>>> list of additional resources to download during R CMD INSTALL. Those
> >>>>>>> sources would be downloaded by R CMD INSTALL if not provided via an
> >>>>>>> option (to support offline installations), and would be placed in a
> >>>>>>> predefined place for the package to find and configure them (via an
> >>>>>>> environment variable or in a predefined subdirectory).
> >>>>>>>
> >>>>>>> This proposal has several advantages. Apart from the obvious one
> >>>>>>> (Internet access during package load can be limited without losing
> >>>>>>> current functionalities), it gives more visibility to the resources
> >>>>>>> that packages are using during the installation phase, and thus makes
> >>>>>>> those installations more reproducible and more secure.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> --
> >>>>>>> Iñaki Úcar
> >>>>>>>
> >>>>>>> ______________________________________________
> >>>>>>> R-devel@r-project.org mailing list
> >>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Iñaki Úcar
> >>>>
> >>>
> >>>
> >>> --
> >>> Iñaki Úcar
> >>>
> >>
> >
>

