Thank you for your feedback, and for sustainably caching build dependencies.
Presumably a caching proxy for all users of a CI service (with Apache,
Squid, Nginx, ...?) would need to have a current SSL certificate and
would be mediating any requests to other servers. Is it also possible
to configure clients to use a caching proxy using just environment
variables?

On Tue, Mar 10, 2020, 2:49 PM Jeremy Stanley <fu...@yuggoth.org> wrote:

> [Apologies if you receive multiple copies of this; it seems that
> Google Groups may silently discard posts with PGP+MIME signatures.]
>
> On 2020-03-10 06:50:56 -0700 (-0700), Wes Turner wrote:
> [...]
> > Reference Implementation
> > ========================
> >
> > - [ ] Does anyone have examples of CI services that are doing
> >   this well / correctly? E.g. with proxy-caching on by default
> [...]
>
> The CI system in OpenDev does this. In fact, we tried a number of
> the aforementioned approaches over the years:
>
> 1. Building a limited mirror based on package dependencies; this
> was inconvenient because projects needed to wait for a package to
> be pulled into the mirror set before it could be used by jobs
> (since we also wanted to prevent jobs from accidentally bypassing
> the mirror and hitting PyPI directly), and curating the growing
> list of package names/versions became cumbersome.
>
> 2.
> Maintaining a full mirror of PyPI via bandersnatch; we did this
> for years, but it was unstable (especially early on; serial
> handling in PyPI's API got better over time) and so needed a fair
> amount of attention. The real reason we stopped, though, was that
> some AI/ML projects (I'm not pointing fingers, but you know who
> you are) started dumping giant nightly snapshots of their datasets
> into PyPI, and we didn't want to have to deal with multi-terabyte
> filesystem coherency issues or month-long rebootstrapping periods.
> Bandersnatch eventually grew an option for filtering specific
> projects, but it required a full rebuild to filter out all the
> previously fetched files, which we didn't want to deal with (and
> this would have become an ongoing game of Whack-a-Mole with any
> new projects following similar processes).
>
> 3. Using a caching proxy; this has turned out to be the
> lowest-effort solution for us, occasional changes in pip and the
> related toolchain aside.
>
> OpenDev's Zuul (project gating CI) service utilizes resources
> across roughly a dozen different cloud providers, so we've found
> the best way to reduce nondeterministic network failures is to
> cache as much as possible locally within every provider/region.
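[As an aside for readers: the per-project filtering option mentioned in point 2 eventually landed as bandersnatch's blocklist plugin. A hedged sketch of what such a configuration can look like — section and plugin names are from recent bandersnatch releases and may differ in older versions, and the package name and paths are placeholders, not OpenDev's actual configuration:]

```ini
; bandersnatch.conf -- illustrative sketch only
[mirror]
directory = /srv/pypi-mirror
master = https://pypi.org

[plugins]
enabled =
    blocklist_project

[blocklist]
; Projects to skip entirely. Files already fetched before a project
; is listed here are not retroactively removed, which is the "full
; rebuild to filter out previously fetched files" problem described
; in point 2 above.
packages =
    example-nightly-snapshots
```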
> We configure Apache on a persistent virtual machine in each of
> these via Ansible, and this is what the relevant configuration
> currently looks like for PyPI caching:
>
> <URL: https://opendev.org/opendev/system-config/src/commit/b2b0cc1c834856afa5511ca9a489d0dfbc6ba948/playbooks/roles/mirror/templates/mirror.vhost.j2#L36-L88 >
>
> Early in the setup phase, before jobs might want to start pulling
> anything from PyPI, we install an /etc/pip.conf file onto the job
> nodes from this template, with the local mirror hostname
> substituted appropriately:
>
> <URL: https://opendev.org/zuul/zuul-jobs/src/commit/de04f76d57ffd5737dea6c6eb3af4c26f2fe08a6/roles/configure-mirrors/templates/etc/pip.conf.j2 >
>
> You'll notice that extra-index-url is set to a wheel_mirror URL;
> that's a separate cache we build to accelerate jobs which rely on
> packages that don't publish wheels to PyPI for the various
> platforms we offer (a variety of Linux distributions). We collect
> common Python package dependencies for projects running jobs,
> perform test installations of them in a separate periodic job,
> check to see whether they or their transitive dependency set
> require building a wheel from sdist rather than downloading a
> prebuilt one from PyPI, and then add all of those to a central
> cache. We do this for each available Python version across all
> the distros/releases for which we maintain node images. The
> wheels are stored globally in AFS (the infamous Andrew File
> System), and local OpenAFS caches are then served from Apache in
> every configured cloud provider (the configuration for it appears
> immediately below the PyPI proxy cache in the vhost template
> linked earlier).
>
> Of course we don't just cache PyPI; we also mirror and/or cache
> Linux distribution package repositories, Docker Hub/Quay, NPM
> packages, Git repositories, and whatever else is of interest to
> our users.
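[A minimal sketch of this kind of pip.conf, together with the environment-variable equivalent asked about at the top of this thread. All hostnames and paths below are hypothetical placeholders, not OpenDev's actual mirror names, and OpenDev installs the file at /etc/pip.conf via Ansible rather than in the working directory:]

```shell
# Config-file approach: mirror the shape of the pip.conf template
# linked above (index-url -> caching proxy, extra-index-url -> a
# prebuilt-wheel cache). "mirror.example.org" is a placeholder.
cat > pip.conf <<'EOF'
[global]
index-url = https://mirror.example.org/pypi/simple
extra-index-url = https://mirror.example.org/wheel/ubuntu-focal/simple
EOF

# Environment-variable alternative: pip maps PIP_<OPTION-NAME>
# variables onto its command-line options, so no file is needed.
export PIP_INDEX_URL=https://mirror.example.org/pypi/simple
export PIP_EXTRA_INDEX_URL=https://mirror.example.org/wheel/ubuntu-focal/simple

# A generic caching/forward proxy can be set with the conventional
# proxy variables, which pip (and curl, git, etc.) honor:
export https_proxy=http://proxy.example.org:3128
export no_proxy=localhost,127.0.0.1
```

With the variables exported, a plain `pip install` in the job environment will use the local mirror/proxy without any file on disk, which can be convenient for containerized jobs.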
> Every time a job has to touch the greater Internet to retrieve
> build resources, that's one more opportunity for unexpected
> failure and further waste of our generously donated build
> capacity, so it's in our best interests and those of our users to
> implement and take advantage of local caches anywhere we can
> safely do so without undue compromise to the integrity of build
> results.
> --
> Jeremy Stanley
>
> --
> You received this message because you are subscribed to a topic in
> the Google Groups "pypa-dev" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/pypa-dev/Pdnoi8UeFZ8/unsubscribe.
> To unsubscribe from this group and all its topics, send an email
> to pypa-dev+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pypa-dev/20200310184938.a3mktqnp5db7jj3v%40yuggoth.org.

--
You received this message because you are subscribed to the Google Groups "pypa-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pypa-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pypa-dev/CACfEFw_gpcgQSjP%2B%2Ba%2BRMC70uLnUM-HqBkbkiWL_BJviX6%3Do9Q%40mail.gmail.com.