This informational PEP is intended as a reference for CI services and CI implementors, and as a request for guidelines, tools, and best practices.
Working titles; seeking feedback:

- Guide for PyPI CI Service Providers
- Request from and advisory for CI Services and CI implementors
- PyPI cost solutions: CI, mirrors, containers, and caching to scale
- PyPI-dependent CI Service Provider and implementor Guide

See "Open Issues":

> - Does this need to be a PEP?
>   - No: It's merely an informational advisory and a request
>     for consideration of sustainable resource utilization practices.
>   - Yes: It might as well be maintained as the document to be
>     sent to CI services which are unnecessarily using significant
>     amounts of bandwidth.

PEP: 9999
Title: PyPI-dependent CI Service Provider and Implementor Guide
Author: Wes Turner
Sponsor: *[Full Name <email at example.com>]*
BDFL-Delegate:
Discussions-To: https://groups.google.com/forum/#!forum/pypa-dev
Status: Draft
Type: [Standards Track | Informational | Process]
Content-Type: text/x-rst
Requires: *[NNN]*
Created: 2020-03-07
Resolution:


Abstract
========

Continuous Integration (CI) build and test services can help reduce
the cost of hosting PyPI by running local mirrors and by advising
clients on how to rebuild software hundreds or thousands of times a
month without re-downloading everything from PyPI each time.

This informational PEP is intended as a reference for CI services and
CI implementors, and as a request for guidelines, tools, and best
practices.


Motivation
==========

- The costs of maintaining PyPI are increasing exponentially.
- CI builds impose significant load upon PyPI.
- Frequently re-downloading the exact same packages wastes the time,
  money, and bandwidth of both PyPI and CI services.
- Perhaps the primary issue is a lack of awareness of solutions that
  reduce resource requirements, and thereby costs, for all involved.
- Many thousands of projects are overutilizing donated resources when
  CI services could solve the problem centrally and more efficiently.

Request from and advisory for CI Services and CI Implementors
==============================================================

Dear CI Service,

1. Please consider running local package mirrors and enabling use of
   local package mirrors by default for clients' CI builds.
2. Please advise clients regarding more efficient containerized
   software build and test strategies.

Running local package mirrors will conserve the generously donated
resources of PyPI (the Python Package Index, a service maintained by
PyPA, a group within the non-profit Python Software Foundation).
(At present (March 2020), PyPI costs ~ $800,000 USD a month to
operate; even with generously donated resources).

If you would prefer to donate to the PSF instead, or in addition,
[earmarked] donations are very welcome and will be publicly
acknowledged.

Data locality through caching is the solution to efficient software
distribution. There are a number of opportunities to cache package
downloads and thereby (1) reduce bandwidth requirements, and
(2) reduce build times:

- ``~/.cache/pip`` -- This does not persist across hermetically
  isolated container invocations
- Network-local package repository mirror
- Container image

There are many mirroring solutions for Python packages, for other
package formats, and for containers:

- A full mirror

  - bandersnatch: https://pypi.org/project/bandersnatch/

- A partial mirror:

  - pulp: https://pulpproject.org/

    - Pulp also handles RPM, Debian, Puppet, Docker, and OSTree

- A transparent proxy cache mirror

  - devpi: https://pypi.org/project/devpi/

- Dumb HTTPS cache with maximum filesize:

  - squid?

- IPFS

  - IPFS for software package repository mirroring is an active area
    of research.

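As a rough illustration of "enabling use of local package mirrors by default", a CI service could drop a pip configuration file into each build environment. The sketch below only renders the file contents. The mirror URL and the function name ``render_pip_conf`` are hypothetical placeholders, and ``index-url`` under ``[global]`` is the standard pip configuration option for selecting a package index.

```python
import configparser
import io


def render_pip_conf(index_url: str) -> str:
    """Return pip.conf text that points pip at the given package index."""
    # Interpolation is disabled so that '%' characters in a URL are kept literally.
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Hypothetical mirror address (an assumption, not a real service);
# substitute the CI service's own network-local mirror.
MIRROR_URL = "https://pypi-mirror.ci.example/simple/"

if __name__ == "__main__":
    print(render_pip_conf(MIRROR_URL))
```

The rendered text could be written to a pip configuration file in the build image (for example ``/etc/pip.conf`` on Linux). Setting the ``PIP_INDEX_URL`` environment variable for each build is an alternative that needs no file at all.
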
Containers:

- OCI Container Registry

  - Notary (TUF): https://github.com/theupdateframework/notary
  - Amazon Elastic Container Registry: https://aws.amazon.com/ecr/
  - Azure Container Registry:
    https://azure.microsoft.com/en-us/services/container-registry/
  - Docker registry: https://docs.docker.com/registry/deploying/
  - DockerHub: https://hub.docker.com/
  - GitLab Container Registry:
    https://docs.gitlab.com/ce/user/packages/container_registry/
  - Google Container Registry: https://gcr.io
  - Red Hat Quay Container Registry: https://quay.io

- Container Build Services

  - Any CI Service can be used to build and upload a container

There are several approaches to making individual (containerized)
(Python) software package builds more efficient:

A. Build a named container image containing the necessary
   dependencies, upload the container image to a container registry,
   and reuse the container image for subsequent builds of your
   package(s).

B. Automate updates of pinned dependency versions using a free or
   paid service that regularly audits dependency specifications
   stored in source code repositories and sends pull requests to
   update the pinned versions.

C. Create a multi-stage Dockerfile that downloads all of the
   (version-pinned) dependencies in an initial stage and copies them
   (with ``COPY``) into a later stage which builds and tests the
   package under test.

   - [ ] TODO: what's the best way to do this?

D. Use a Docker image as a cache.

   - This requires ``DOCKER_BUILDKIT=1`` to be set so that
     ``# syntax=docker/dockerfile:experimental`` and
     ``RUN --mount=type=cache,target=/root/.cache/pip`` work
   - [ ] TODO: what's the best way to do this?
   - "build time only -v option"
     https://github.com/moby/moby/issues/14080

E. Use a container build tool that supports mounting volumes at build
   time (podman, buildah) and mount in the ``~/.cache/pip`` directory
   for all builds, so that your build doesn't need to re-download
   everything from PyPI on every CI build.

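One possible way to decide when the dependency image from approach A can be reused is to derive its tag from the pinned requirements, so that the image is rebuilt only when a pin changes. This is a minimal sketch of that idea, not something the text above prescribes. The function name ``dependency_image_tag``, the ``deps`` prefix, and the 12-character digest length are all illustrative assumptions.

```python
import hashlib


def dependency_image_tag(requirements_text: str, prefix: str = "deps") -> str:
    """Derive a stable container image tag from pinned requirements.

    Blank lines and comment-only lines are ignored, so editing comments
    does not force a rebuild; any change to a pin yields a new tag.
    """
    lines = [
        line.strip()
        for line in requirements_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    digest = hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
    # Truncating the digest keeps the tag short; the length is an arbitrary choice.
    return f"{prefix}-{digest[:12]}"


if __name__ == "__main__":
    # Hypothetical pinned requirements, for illustration only.
    pinned = "requests==2.23.0\nflask==1.1.1\n"
    print(dependency_image_tag(pinned))
```

A CI job could compute this tag first, try to pull the image with that tag from its registry, and build and push the dependency image only when the pull fails.
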
Security Implications
=====================

- Any external dependency is a security risk.
- When software dependencies are not cached, the devops workflow
  cannot run whenever the external dependency is unavailable.
- TUF (The Update Framework) may help mitigate cache-poisoning risks.
  PyPI and CNCF Notary implement cryptographic signatures with TUF.


How to Teach This
=================

- [ ] A more detailed guide explaining how to do multi-stage builds
  that cache dependencies?
- [ ] Update packaging.python.org?
- [ ] Expand upon the instructions herein


Reference Implementation
========================

- [ ] Does anyone have examples of CI services that are doing this
  well / correctly? E.g. with proxy-caching on by default


Rejected Ideas
==============

[Why certain ideas that were brought while discussing this PEP were
not ultimately pursued.]


Open Issues
===========

- Request for guidelines, tools, and best practices.
- Does this need to be a PEP?

  - No: It's merely an informational advisory and a request for
    consideration of sustainable resource utilization practices.
  - Yes: It might as well be maintained as the document to be sent to
    CI services which are unnecessarily using significant amounts of
    bandwidth.


References
==========

[A collection of URLs used as references through the PEP.]


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.