Re: [R-pkg-devel] Order of repo access from options("repos")
If your company is going to ensure that a package called pkgCompany is only looked for in a local repo by install.packages() and friends, I think in your company-wide R installation you can set the option "available_packages_filters" to a self-written filter that exclusively reports results from the local repo for 'pkgCompany'. Of course, this is not safe and can be overridden by a user etc., but it takes quite some effort to trick people this way into using a malicious package from another repo. It would be simpler for attackers to persuade people to install the malicious software directly, I believe.

Best,
Uwe Ligges

On 02.04.2024 16:05, Jan van der Laan wrote:
> Interesting. That would also mean that putting a company repo first does not protect against dependency confusion attacks (people intentionally uploading packages with the same name as company internal packages on CRAN; https://arstechnica.com/information-technology/2021/02/supply-chain-attack-that-fooled-apple-and-microsoft-is-attracting-copycats/)
>
> Jan
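A minimal, untested sketch of such a site-wide filter, assuming the internal repo lives at `https://cran.company.example` (a hypothetical URL) and that `pkgCompany` is the only name to guard. A filter is a function that receives the matrix built by `available.packages()` and returns a subset of its rows; the matrix's `Repository` column holds each row's contrib URL. The custom filter is listed ahead of the built-in `"duplicates"` filter, which keeps only the highest version of each package: if `"duplicates"` ran first, a higher-numbered public `pkgCompany` would displace the internal one, and the guard would then drop the package altogether instead of falling back to the internal copy.

```r
# Hypothetical internal repository URL and the internal package names to guard.
local_repo <- "https://cran.company.example"
internal_pkgs <- c("pkgCompany")

# Returns a filter for available.packages(): rows for the guarded packages are
# kept only if their Repository (contrib URL) lies under local_repo. Matching
# on local_repo plus "/" avoids accepting look-alike hosts such as
# "https://cran.company.example.evil.test".
make_internal_only_filter <- function(local_repo, internal_pkgs) {
  prefix <- paste0(local_repo, "/")
  function(ap) {
    from_local <- startsWith(ap[, "Repository"], prefix)
    is_guarded <- ap[, "Package"] %in% internal_pkgs
    ap[from_local | !is_guarded, , drop = FALSE]
  }
}

# Site-wide setting, e.g. in Rprofile.site: the default filters, with the
# custom one placed before "duplicates".
options(available_packages_filters = list(
  "R_version", "OS_type", "subarch",
  make_internal_only_filter(local_repo, internal_pkgs),
  "duplicates"
))
```

As Uwe says, a user can still override the option, so this raises the bar rather than closing the hole.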
Re: [R-pkg-devel] Order of repo access from options("repos")
On 02.04.2024 14:07, Dirk Eddelbuettel wrote:
> On 1 April 2024 at 17:44, Uwe Ligges wrote:
> | Untested:
> |
> | install.packages() calls available.packages() to find out which packages
> | are available - and passes a "filters" argument if supplied.
> | That can be a user defined filter. It should be possible to write a user
> | defined filter which prefers the packages in your local repo.
>
> Intriguing. Presumably that would work for update.packages() too?

Yes. I think so.

Best,
Uwe
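A sketch of the more general filter discussed here, which prefers the local repo for every package that repo carries. It assumes a hypothetical internal URL and a hypothetical package name `somePkg`, and it has not been run against real repositories. Instead of relying on the global option, it builds the filtered matrix once and passes it to both `install.packages()` and `update.packages()` through their `available` argument, so both see the same view. The filter list spells out the defaults (`"R_version"`, `"OS_type"`, `"subarch"`, `"duplicates"`) rather than using `add = TRUE`: as far as I can tell from the utils sources, `add = TRUE` appends the extra filters after the defaults, that is, after `"duplicates"` has already discarded a lower-versioned local copy.

```r
# Hypothetical repository URLs; the internal one is listed first.
local_repo <- "https://cran.company.example"
repos <- c(LOCAL = local_repo, CRAN = "https://cloud.r-project.org")

# Returns a filter that, for any package the internal repository carries,
# drops the rows coming from every other repository.
prefer_local_filter <- function(local_repo) {
  prefix <- paste0(local_repo, "/")
  function(ap) {
    from_local <- startsWith(ap[, "Repository"], prefix)
    local_pkgs <- ap[from_local, "Package"]
    ap[from_local | !(ap[, "Package"] %in% local_pkgs), , drop = FALSE]
  }
}

# Default filters with the custom one inserted ahead of "duplicates".
filters <- list("R_version", "OS_type", "subarch",
                prefer_local_filter(local_repo), "duplicates")

if (FALSE) {  # needs network access to both repositories
  ap <- available.packages(repos = repos, filters = filters)
  install.packages("somePkg", repos = repos, available = ap)
  update.packages(repos = repos, available = ap, ask = FALSE)
}
```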
Re: [R-pkg-devel] Order of repo access from options("repos")
Jan,

That's only the case if you want to allow later version numbers to override the versions in the internal repository (the "known-good" is more important than "latest" point above). Having a defined set of dependencies while still maintaining currency is a difficult problem. Always fetching dependencies from a public source is a very bad idea (which is why I am looking at these issues), but not doing it accumulates future costs as interfaces and sets of bugs evolve and need to be remediated. Those future costs can become very large indeed in a large system.

Compounding the problem, CRAN caching is not supported universally by commercial infrastructure. I think Artifactory and Nexus do it; the AWS and Azure offerings don't.

Greg

On Wed, 3 Apr 2024 at 01:05, Jan van der Laan wrote:
> Interesting. That would also mean that putting a company repo first does
> not protect against dependency confusion attacks (people intentionally
> uploading packages with the same name as company internal packages on
> CRAN;
> https://arstechnica.com/information-technology/2021/02/supply-chain-attack-that-fooled-apple-and-microsoft-is-attracting-copycats/)
>
> Jan
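For the "known-good" case, where a newer public release should not override what the internal repo holds, the same filter mechanism can cap individual packages at a version. This is a sketch only: the pin table and `somePkg` are hypothetical, and it assumes the pinned version is actually present in one of the configured repositories. CRAN's main index lists only the current release of each package, so in practice the older copy would have to sit in the internal repo.

```r
# Hypothetical pins: package name -> highest version accepted for now.
pins <- c(somePkg = "1.2.0")

# Returns a filter for available.packages() that drops rows of pinned packages
# whose version is above the pin; "duplicates" then keeps the highest
# remaining version, i.e. the pinned one if some repository still offers it.
make_pin_filter <- function(pins) {
  force(pins)  # capture the pin table now, not at first use
  function(ap) {
    idx <- which(ap[, "Package"] %in% names(pins))
    too_new <- logical(nrow(ap))
    if (length(idx)) {
      too_new[idx] <- package_version(ap[idx, "Version"]) >
        package_version(unname(pins[ap[idx, "Package"]]))
    }
    ap[!too_new, , drop = FALSE]
  }
}

# The cap must run before "duplicates", which would otherwise keep only the
# newest (unwanted) version.
options(available_packages_filters = list(
  "R_version", "OS_type", "subarch", make_pin_filter(pins), "duplicates"
))
```

Removing an entry from `pins` lets the package float again, which fits the "temporarily pin" case without touching the dependent package's DESCRIPTION.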
Re: [R-pkg-devel] Order of repo access from options("repos")
Interesting. That would also mean that putting a company repo first does not protect against dependency confusion attacks (people intentionally uploading packages with the same name as company internal packages on CRAN; https://arstechnica.com/information-technology/2021/02/supply-chain-attack-that-fooled-apple-and-microsoft-is-attracting-copycats/)

Jan

On 01-04-2024 02:07, Greg Hunt wrote:
> Martin, Dirk, Kevin,
>
> Thanks for your help. To summarise: the order of access is undefined, and every repo URL is accessed. I'm working in an environment where "known-good" is more important than "latest", so what follows is an explanation of the problem space from my perspective.
>
> What I am experimenting with is pinning down the versions of the packages that a moderately complex solution is built against, using a combination of an internal repository of cached packages (internally written packages, our own hopefully transient copies of packages archived from CRAN, packages live on CRAN, and packages present in both GitHub and CRAN which we build and cache locally) and a proxy that separately populates that cache in specific build processes by intercepting requests to CRAN. I'd like to use the base R functions if possible, and I want to let the version numbers in the dependencies float because a) we do need to maintain approximate currency in the versions of packages we use and b) I have no business monkeying around with third parties' dependencies. renv looks helpful but makes some assumptions about disk access to its cache that I'd rather avoid by running an internal repo. The team is spread around the world, so shared cache volumes are not a great idea.
>
> The business with the multiple repo addresses is one approach to working around Docker's inability to understand that people need to access the Docker host's ports from inside a container or a build, and that the current Docker treatment of the host's internal IP is far from transparent (I have scripts that run both inside and outside of Docker containers, and they used to be able to work out for themselves what environment they run in; that's got harder lately). That led down a path in which one set of addresses did not reject connection attempts, making each package installation (and there are hundreds) take some number of minutes for the connections to time out. Thankfully I don't actually have to deal with that.
>
> We have had a few cases where our dependencies have been archived from CRAN and we have maintained our own copy for a period of days to months, a period in which we do not know what the next package version number will be. It would be convenient not to have to think about that; a deterministic, terminating search of a sequence of repos looked like a nice idea for that, but I may have to do something different.
>
> There was a recent case where a package made a breaking change in its interface in a release (not version) update that broke another package we depend on. It would be nice to be able to temporarily pin that package at its previous version (without updating the source of the third-party package that depends on it) to preserve our own build-ability while those packages sort themselves out.
>
> There is one case where a pull request for a CRAN-hosted package was verbally accepted but never actioned, so we have our own forked version of a CRAN-hosted package, which I need to decide what to do with one day soon. In another case, the package version number in CRAN is different from the one we want.
>
> We have a dependency on a package that we build from a Git repo but which is also present in CRAN. I don't want to be dependent on the maintainers keeping the package version in the Git copy of the DESCRIPTION file higher than the version in CRAN. Ideally I'd like to build and push to the internal repo and not have to think about it after that. The same issue as before arises: as it stands today, I have to either worry about (and probably edit) the version number in the build, or manage the cache population process so that the internal package instance is added after any CRAN-sourced dependencies and make sure that the public CRAN instances are not accessed in the build.
>
> All of these problems are soluble by special-casing the affected installs, specifically managing the cache population (with a requirement that the cache and CRAN not be searched at the same time), or editing version numbers whose next values I do not control, but I would like to try for the simplest approach first. I know I'm not going to get a clean solution here; the relative weights of "known-good" and "latest" are different depending on where you stand.
>
> Greg
>
> On Sun, 31 Mar 2024 at 22:43, Martin Morgan wrote:
> > available.packages indicates that
> >
> > > By default, the return value includes only packages whose version and OS requirements are met by the running version of R, and only gives information on the latest versions of packages.
> >
> > So all repositories are
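To make the rule Martin quotes concrete, here is a small re-implementation of what the `"duplicates"` filter does, written for illustration rather than copied from utils. As I read the sources: each package keeps one row, the highest version wins, and among equal versions the row listed first (that is, from the repository listed earlier) is kept. With made-up data, a `pkgCompany` 99.0.0 published on CRAN beats the internal 1.0.0 even though the internal repo comes first, which is the dependency-confusion risk Jan raises.

```r
# Illustrative re-implementation of the "duplicates" rule (not the utils
# source): one row per package, highest version wins, ties go to the row that
# appears first, i.e. the repository listed earlier.
keep_latest <- function(ap) {
  vers <- package_version(ap[, "Version"])
  keep <- vapply(seq_len(nrow(ap)), function(i) {
    same <- which(ap[, "Package"] == ap[i, "Package"])
    best <- same[vers[same] == max(vers[same])][1]
    i == best
  }, logical(1))
  ap[keep, , drop = FALSE]
}

# Made-up index: the internal repository is listed first in both cases.
ap <- rbind(
  c("pkgCompany", "1.0.0",  "https://cran.company.example/src/contrib"),
  c("pkgCompany", "99.0.0", "https://cloud.r-project.org/src/contrib"),
  c("tiedPkg",    "2.1",    "https://cran.company.example/src/contrib"),
  c("tiedPkg",    "2.1",    "https://cloud.r-project.org/src/contrib")
)
colnames(ap) <- c("Package", "Version", "Repository")

# Keeps the CRAN pkgCompany row and the internal tiedPkg row.
keep_latest(ap)[, c("Package", "Repository")]
```

So repo order only acts as a tie-breaker between identical version numbers; it does not make the first repo authoritative.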
Re: [R-pkg-devel] Order of repo access from options("repos")
On 1 April 2024 at 17:44, Uwe Ligges wrote:
| Untested:
|
| install.packages() calls available.packages() to find out which packages
| are available - and passes a "filters" argument if supplied.
| That can be a user defined filter. It should be possible to write a user
| defined filter which prefers the packages in your local repo.

Intriguing. Presumably that would work for update.packages() too?

(We actually have a use case at work, and as one way out I created another side-repo to place a package with an incremented version number so it would 'win' on highest version; this is due to some non-trivial issues with the underlying dependencies.)

Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
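Dirk's side-repo trick relies on R's version ordering. A sketch with made-up version numbers: re-publishing the same sources with an extra trailing version component is one way (an assumption here, not necessarily what Dirk did) to make the side-repo copy sort above the CRAN one.

```r
# Made-up version numbers for illustration.
cran_version <- package_version("1.4.2")
side_version <- package_version("1.4.2.1")  # same sources, extra component

# Versions are compared component by component, and a version with an extra
# trailing component sorts after the shorter one it extends.
side_version > cran_version            # TRUE
compareVersion("1.4.2.1", "1.4.2")     # 1, i.e. the first argument is later

# The next public release overtakes the side-repo copy again.
package_version("1.4.3") > side_version   # TRUE
```

The last line is the catch Greg describes: the workaround holds only until a public version number you do not control moves past it.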