This informational PEP is intended to be a reference
for CI services and CI implementors,
and a request for guidelines, tools, and best practices.

Working titles; seeking feedback:

- Guide for PyPI CI Service Providers
- Request from and advisory for CI Services and CI implementors
- PyPI cost solutions: CI, mirrors, containers, and caching to scale
- PyPI-dependent CI Service Provider and implementor Guide

See "Open Issues":

> - Does this need to be a PEP?
>  - No: It's merely an informational advisory and a request
>    for consideration of sustainable resource utilization practices.
>  - Yes: It might as well be maintained as the document to be
>    sent to CI services which are unnecessarily using significant
>    amounts of bandwidth.


PEP: 9999
Title: PyPI-dependent CI Service Provider and Implementor Guide
Author: Wes Turner
Sponsor: *[Full Name <email at example.com>]*
BDFL-Delegate:
Discussions-To: https://groups.google.com/forum/#!forum/pypa-dev
Status: Draft
Type: Informational
Content-Type: text/x-rst
Requires: *[NNN]*
Created: 2020-03-07
Resolution:


Abstract
========

Continuous Integration (CI) services for automated building and testing
can help reduce the costs of hosting PyPI by running local mirrors
and by advising clients on how to efficiently re-build
software hundreds or thousands of times a month
without re-downloading everything from PyPI every time.

This informational PEP is intended to be a reference
for CI services and CI implementors,
and a request for guidelines, tools, and best practices.

Motivation
==========

- The costs of maintaining PyPI are increasing exponentially.
- CI builds impose significant load upon PyPI.
- Frequently re-downloading the exact same packages
  wastes the time, money, and bandwidth of both PyPI and CI services.
- Perhaps the primary issue is lack of awareness
  of solutions for reducing resource requirements
  and thereby costs for all involved.
- Many thousands of projects are overusing donated resources,
  even though CI services could solve this problem centrally
  and more efficiently.


Request from and advisory for CI Services and CI Implementors
==============================================================
Dear CI Service,

1. Please consider running local package mirrors and enabling use of local
   package mirrors by default for clients' CI builds.
2. Please advise clients regarding more efficient containerized
   software build and test strategies.

Running local package mirrors will conserve the generously donated
resources of PyPI (the Python Package Index, a service maintained by PyPA,
a group within the non-profit Python Software Foundation).
At present (March 2020), PyPI costs approximately $800,000 USD a month
to operate, even with generously donated resources.

If you would prefer to donate to the PSF instead of, or in addition to,
running a mirror, [earmarked] donations are very welcome and will be
publicly acknowledged.

Data locality through caching is the key to efficient software
distribution. There are a number of opportunities to cache package
downloads and thereby (1) reduce bandwidth requirements and
(2) reduce build times:

- ``~/.cache/pip``: this does not persist across hermetically isolated
  container invocations
- Network-local package repository mirror
- Container image
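As a minimal sketch of the first two options above, the following Python function prepares the environment for a CI job's ``pip install`` step. It relies on pip reading its options from ``PIP_<OPTION>`` environment variables (``PIP_INDEX_URL`` for ``--index-url``, ``PIP_CACHE_DIR`` for ``--cache-dir``). The ``CI_PYPI_MIRROR_URL`` variable and the ``/ci-cache/pip`` volume path are hypothetical names assumed for illustration, standing in for whatever a CI service actually provides.

```python
from typing import Dict, Mapping, Optional

# Public PyPI simple index; used only when no local mirror is advertised.
PYPI_SIMPLE_URL = "https://pypi.org/simple/"


def pip_env(
    base_env: Mapping[str, str],
    mirror_var: str = "CI_PYPI_MIRROR_URL",  # hypothetical variable name
    persistent_cache: Optional[str] = "/ci-cache/pip",  # hypothetical volume path
) -> Dict[str, str]:
    """Return a copy of base_env with pip pointed at a local mirror and cache.

    pip maps command-line options to PIP_<OPTION> environment variables,
    so PIP_INDEX_URL acts like --index-url and PIP_CACHE_DIR like --cache-dir.
    """
    env = dict(base_env)
    # Prefer a network-local mirror if the CI service advertises one;
    # an unset or empty variable falls back to the public index.
    env["PIP_INDEX_URL"] = base_env.get(mirror_var) or PYPI_SIMPLE_URL
    # Reuse a cache directory that persists across (containerized) builds.
    if persistent_cache:
        env["PIP_CACHE_DIR"] = persistent_cache
    return env
```

A job could then run, for example, ``subprocess.run([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"], env=pip_env(os.environ), check=True)``, so that every install hits the local mirror and the persistent cache first.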

There are many mirroring solutions for Python packages,
other package formats, and containers:

- A full mirror:

  - bandersnatch: https://pypi.org/project/bandersnatch/

- A partial mirror:

  - pulp: https://pulpproject.org/

    - Pulp also handles RPM, Debian, Puppet, Docker, and OSTree content

- A transparent proxy cache mirror:

  - devpi: https://pypi.org/project/devpi/
  - A simple ("dumb") HTTPS cache with a maximum file size:

    - squid?

- IPFS:

  - IPFS for software package repository mirroring is an active area of
    research.
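Clients can be pointed at a mirror like these through pip's configuration file. The sketch below renders a minimal ``pip.conf`` whose ``[global]`` section sets ``index-url``. The host name ``devpi.ci.internal`` is hypothetical; the ``:3141/root/pypi/+simple/`` part assumes devpi-server's default port and its default PyPI-mirroring index.

```python
import configparser
import io
from urllib.parse import urlsplit

# Hypothetical devpi host; root/pypi is devpi-server's default PyPI mirror index.
DEFAULT_MIRROR = "http://devpi.ci.internal:3141/root/pypi/+simple/"


def render_pip_conf(index_url: str = DEFAULT_MIRROR, trust_host: bool = False) -> str:
    """Return the text of a pip.conf that makes pip install from index_url."""
    # interpolation=None so that percent-encoded URLs are stored verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {"index-url": index_url}
    if trust_host:
        host = urlsplit(index_url).hostname
        if host:
            # Lets pip use plain HTTP for this host; only for a trusted internal network.
            config["global"]["trusted-host"] = host
    out = io.StringIO()
    config.write(out)
    return out.getvalue()
```

The result could be written to ``/etc/pip.conf`` on CI workers, or to any path named by the ``PIP_CONFIG_FILE`` environment variable, so that client builds use the mirror without changing their own build scripts.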

Containers:

- OCI container registries:

  - Notary (TUF): https://github.com/theupdateframework/notary
  - Amazon Elastic Container Registry: https://aws.amazon.com/ecr/
  - Azure Container Registry:
    https://azure.microsoft.com/en-us/services/container-registry/
  - Docker registry: https://docs.docker.com/registry/deploying/
  - Docker Hub: https://hub.docker.com/
  - GitLab Container Registry:
    https://docs.gitlab.com/ce/user/packages/container_registry/
  - Google Container Registry: https://gcr.io
  - Red Hat Quay Container Registry: https://quay.io

- Container build services:

  - Any CI service can be used to build and upload a container image
There are several approaches to making individual software package builds
(especially containerized Python builds) more efficient:

A. Build a named container image containing the necessary dependencies,
   upload the container image to a container registry, and
   reuse the container image for subsequent builds of your
   package(s).
B. Automate updates of pinned dependency versions using a
   free or paid service that regularly audits dependency specifications
   stored in source code repositories and sends pull requests
   to update the pinned versions.
C. Create a multi-stage Dockerfile that downloads all of the
   (version-pinned) dependencies in an initial stage and then uses
   ``COPY`` to bring them into a later stage, which builds and tests
   the package.

   - [ ] TODO: what's the best way to do this?

D. Use Docker's build cache (e.g., a BuildKit cache mount) for pip downloads

   - This requires ``DOCKER_BUILDKIT=1`` to be set
     so that ``# syntax=docker/dockerfile:experimental``
     and ``RUN --mount=type=cache,target=/root/.cache/pip`` work
   - [ ] TODO: what's the best way to do this?
   - "build time only -v option"
     https://github.com/moby/moby/issues/14080

E. Use a container build tool that supports mounting volumes at build
   time (e.g., podman or buildah) and mount the ``~/.cache/pip`` directory
   into every build, so that your build doesn't need to re-download
   everything from PyPI on every CI build.
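For approach A, one way to decide when the dependency image must be rebuilt is to derive its tag from the pinned requirements, so that the tag changes only when the pins change. This is a minimal sketch under that assumption; the registry path ``registry.example.com/myproject/deps`` is hypothetical, and the normalization (ignoring blank lines and full-line comments) is a simplification.

```python
import hashlib

# Hypothetical registry path for the project's dependency image.
DEPS_REPO = "registry.example.com/myproject/deps"


def deps_image_tag(requirements_text: str, repo: str = DEPS_REPO) -> str:
    """Return an image reference whose tag depends only on the pinned requirements."""
    meaningful = []
    for raw in requirements_text.splitlines():
        line = raw.strip()
        # Blank lines and full-line comments do not change what gets installed.
        if line and not line.startswith("#"):
            meaningful.append(line)
    digest = hashlib.sha256("\n".join(meaningful).encode("utf-8")).hexdigest()
    # A 12-hex-digit prefix keeps the tag short while making collisions unlikely.
    return f"{repo}:{digest[:12]}"
```

A CI job would then ``docker pull`` the computed tag and, only if the pull fails, build the image and ``docker push`` it; most builds then reuse the existing image instead of downloading dependencies from PyPI again.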
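The two stages of approach C can be sketched with standard pip options: ``pip download -d`` fetches the pinned distributions into a local directory, and ``pip install --no-index --find-links`` later installs only from that directory. In a multi-stage Dockerfile, the first command would run in the initial stage and the resulting directory would be brought into the later stage with ``COPY --from=<stage>``. The ``wheelhouse`` directory name is an assumption for illustration.

```python
import sys
from typing import List


def download_cmd(requirements: str, wheelhouse: str = "wheelhouse") -> List[str]:
    """Stage 1 (needs network): fetch pinned distributions into wheelhouse."""
    return [sys.executable, "-m", "pip", "download",
            "-r", requirements, "-d", wheelhouse]


def offline_install_cmd(requirements: str, wheelhouse: str = "wheelhouse") -> List[str]:
    """Stage 2 (no index access): install only from files already in wheelhouse."""
    return [sys.executable, "-m", "pip", "install", "--no-index",
            "--find-links", wheelhouse, "-r", requirements]
```

Each list can be passed to ``subprocess.run(..., check=True)``. If some dependencies are only available as source distributions, ``pip wheel -w`` could be used in the first stage instead of ``pip download``, so that they are built into wheels once rather than in every later stage.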



Security Implications
=====================

- Any external dependency is a security risk
- When software dependencies are not cached,
  the devops workflow cannot run when the external dependency is
  unavailable.
- TUF (The Update Framework) may help mitigate cache-poisoning risks.
  CNCF Notary implements cryptographic signatures with TUF, and PEP 458
  proposes securing PyPI downloads with TUF-signed repository metadata.
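TUF itself is beyond the scope of a short example. A simpler, complementary safeguard, and a different technique from TUF, is to pin a SHA-256 digest for each artifact and check it before use; pip's hash-checking mode (``--require-hashes`` with ``--hash=sha256:<digest>`` entries in a requirements file) works this way. The sketch below applies the same check to a file taken from a local cache or mirror; ``verify_sha256`` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def verify_sha256(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at path has the expected SHA-256 hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in chunks so large wheels or sdists need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

A cached file that fails this check should be discarded and fetched again rather than installed.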


How to Teach This
=================

- [ ] A more detailed guide on how to do multi-stage builds that
  cache dependencies?
- [ ] Update packaging.python.org?
- [ ] Expand upon the instructions herein


Reference Implementation
========================

- [ ] Does anyone have examples of CI services that are doing this well
  and correctly, e.g., with proxy caching enabled by default?


Rejected Ideas
==============

[Why certain ideas that were brought while discussing this PEP were not 
ultimately pursued.]


Open Issues
===========

- Request for guidelines, tools, and best practices.
- Does this need to be a PEP?
  - No: It's merely an informational advisory and a request
    for consideration of sustainable resource utilization practices.
  - Yes: It might as well be maintained as the document to be
    sent to CI services which are unnecessarily using significant
    amounts of bandwidth.


References
==========

[A collection of URLs used as references through the PEP.]


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
