Hi Donald, Richard, Nick, Philip, Marc-Andre, all,

after some more thinking I wrote a simplified PEP draft for transitioning hosting of release files to pypi.python.org. A PEP is warranted IMO because the resulting changes will affect all Python package maintainers and the Python packaging ecology in general.

See the current draft (pre-submit-v1) further below in this mail. I also created a bitbucket repository, see "PEP-PYPI-DRAFT.txt" at
https://bitbucket.org/hpk42/pep-pypi/src

Donald, I'd be happy if you join as a co-author and contribute your statistics script and possibly more implementation stuff (PRs to pypi software etc.).

Philip, Marc-Andre, Richard (Jones), Nick and catalog-sig/distutils-sig: scrutiny and feedback welcome.

Nick: if you could collect feedback on the PEP (draft) around the packaging and distribution mini-summit at PyCon US (15th March), that'd be very useful.

Richard: I may ask you to become BDFL-delegate for this PEP, especially since you will need to integrate any resulting changes :)

I'd like to formally submit this PEP soon but not before I have received some feedback. I am not subscribed to distutils-sig and I think distutils is not much affected, but it probably still would help if someone cross-posts there (please put me in CC).

cheers,
holger


PEP-draft: transition to release file hosting at pypi.python.org
=================================================================

Status
-----------

PRE-SUBMIT-v1

Abstract
------------

This PEP proposes to move hosting of all release files to pypi.python.org itself. To ease transition and minimize client-side friction, **no changes to distutils or installers** are required. Rather, the transition is implemented through changes to the pypi.python.org implementation and by interactions with package maintainers.

Problem
---------------

Today, Python package installers (pip and easy_install) need to query multiple sites to discover release files. Apart from querying pypi.python.org's simple index pages, an installer also needs to crawl all homepages and download pages ever specified with any release of a package. The need for installers to crawl 3rd party sites slows down installation and makes for a brittle, unreliable installation process.

As of March 2013, about 10% of packages have release files which are not hosted directly from pypi.python.org but rather from places referenced by download/homepage sites.
Conversely, roughly 90% of packages are hosted directly on pypi.python.org [1]_. Even for them, installers still need to crawl the homepage(s) of a package. In particular, many package uploaders are not aware that specifying the "homepage" will slow down the installation process.

Solution
-----------

Each package is going to get a "hosting mode" field which affects all historic and future releases of a package and its release files. The field has these values and meanings:

- "pypi-ext" (transitional) encodes exactly the current mode of operations: homepage/download urls are presented in simple/ pages and client-side tools need to crawl them themselves to find release file links.

- "pypi-cache": Release files located on remote sites will be downloaded and cached by pypi.python.org by crawling homepage/download metadata sites. The resulting simple index contains links to release files hosted by pypi.python.org. The original homepage/download links are added as links without a ``rel`` attribute if they have the ``#egg`` format.

- "pypi-only": homepage/download links are served on simple indexes but without a ``rel`` attribute. Installation tools will thus not crawl those pages anymore. Use this option if you commit to always uploading your release files to pypi.python.org.

Phases of transition
-------------------------

1. At the outset, we set hosting-mode to "pypi-ext" for all packages. This will not change any link served via the simple index and thus no bad effects are expected. Early adopters and testers may now change the mode to either pypi-only or pypi-cache to help with streamlining issues. After implementation and UI issues are streamlined, the next phase can start.

2. We perform automatic analysis for each package to determine if it is a package with externally hosted release files. Packages which only have release files on pypi.python.org are put in the group "A", those which have at least some release files hosted externally are put in the group "B".
   We then send a mail to all maintainers of packages in A that their hosting-mode is going to be switched automatically to "pypi-only" after N weeks, unless they visit their package administration page earlier and set it to either pypi-cache or pypi-only themselves.

   We then send a mail to all maintainers of packages in B that their hosting-mode is going to be switched automatically to "pypi-cache" after N weeks, unless they visit their package administration page earlier and set it to either pypi-only or pypi-cache themselves.

3. All packages will then have a hosting mode of either "pypi-cache" or "pypi-only", so that installers only query packages hosted through pypi.python.org.

Transitioning to "pypi-cache" mode
-------------------------------------

When transitioning from the currently implicit "pypi-ext" mode to "pypi-cache" for a given package, a package maintainer should be able to retrieve/verify the historic release files which will be cached from pypi.python.org. The UI should present this list and have the maintainer accept it for completing the transition to the "pypi-cache" mode.

Upon future release registration actions, pypi.python.org will perform crawling for the homepage/download sites and cache release files *before* returning a success return code for the release registration.

References
------------

.. [1] ratio of externally hosted versus pypi-hosted
   http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html

Acknowledgments
----------------------

Donald Stufft for pushing away from external hosting and doing the 90/10 % statistics script and offering to implement a PR.

Philip Eby for precise information and the basic idea to implement the transition via server-side changes only.

Marc-Andre Lemburg, Nick Coghlan and catalog-sig for thinking through issues regarding getting rid of "external hosting".

Copyright
-----------------

This document has been placed in the public domain.
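
Illustrative sketch of the hosting modes
-------------------------------------------

As a rough illustration of the three hosting modes described under "Solution", the following Python sketch shows how a simple index page might decide which links to emit and whether they carry a ``rel`` attribute. It is only a sketch under stated assumptions, not the pypi.python.org implementation: the function names, the ``(rel, url)`` input shape and the ``"#egg="`` substring test for the ``#egg`` format are all hypothetical. Downloading and caching of remote files for "pypi-cache" is out of scope here; cached files are assumed to already be part of ``hosted_files``.

```python
from html import escape

# The three values of the proposed per-package "hosting mode" field.
HOSTING_MODES = ("pypi-ext", "pypi-cache", "pypi-only")


def _anchor(url, rel=None):
    """Render one <a> tag, optionally carrying a rel attribute."""
    href = escape(url, quote=True)
    rel_attr = ' rel="%s"' % escape(rel, quote=True) if rel else ""
    return '<a href="%s"%s>%s</a>' % (href, rel_attr, escape(url))


def simple_index_links(mode, hosted_files, external_links):
    """Return the <a> tags for one package's simple index page.

    hosted_files   -- URLs of release files stored on the index itself
                      (hypothetical input; includes cached copies)
    external_links -- (rel, url) pairs, rel being "homepage" or "download"
    """
    if mode not in HOSTING_MODES:
        raise ValueError("unknown hosting mode: %r" % (mode,))

    # Files hosted on the index are always linked directly.
    tags = [_anchor(url) for url in hosted_files]

    for rel, url in external_links:
        if mode == "pypi-ext":
            # Current behaviour: rel is kept, so installers crawl the page.
            tags.append(_anchor(url, rel=rel))
        elif mode == "pypi-only":
            # Link is still served, but without rel: no crawling.
            tags.append(_anchor(url))
        elif "#egg=" in url:
            # "pypi-cache": keep only #egg-style links, without rel.
            # Assumption: the "#egg format" is detected by this substring.
            tags.append(_anchor(url))
    return tags
```

Called with the same inputs under each mode, the function differs only in how the external links are rendered, which mirrors the draft's goal of changing installer behaviour purely through what the server puts on the simple index.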