+1

On Mar 10, 2013, at 1:35 PM, Donald Stufft <[email protected]> wrote:
>
> On Mar 10, 2013, at 11:07 AM, holger krekel <[email protected]> wrote:
>
>> Hi Donald, Richard, Nick, Philip, Marc-Andre, all,
>>
>> after some more thinking i wrote a simplified PEP draft for
>> transitioning hosting of release files to pypi.python.org. A PEP is
>> warranted IMO because the according changes will affect all python
>> package maintainers and the Python packaging ecology in general. See
>> the current draft (pre-submit-v1) further below in this mail.
>> I also created a bitbucket repository, see "PEP-PYPI-DRAFT.txt" at
>>
>> https://bitbucket.org/hpk42/pep-pypi/src
>>
>> Donald, i'd be happy if you join as a co-author and contribute
>> your statistics script and possibly more implementation stuff (PRs
>> to pypi software etc.).
>>
>> Philip, Marc-Andre, Richard (Jones), Nick and catalog-sig/distutils-sig:
>> scrutiny and feedback welcome.
>>
>> Nick: if you could collect feedback on the PEP (draft) around the
>> packaging and distribution mini-summit at Pycon US (15th March), that'd
>> be very useful.
>>
>> Richard: I may ask you to become BDFL-delegate for this PEP especially
>> since you will need to integrate any resulting changes :)
>>
>> I'd like to formally submit this PEP soon but not before i got some
>> feedback.
>>
>> I am not subscribed to distutils-sig and i think distutils is not much
>> affected, but it probably still would help if someone cross-posts there
>> (please put me in CC).
>>
>> cheers,
>> holger
>>
>>
>> PEP-draft: transition to release file hosting at pypi.python.org
>> =================================================================
>>
>> Status
>> -----------
>>
>> PRE-SUBMIT-v1
>>
>> Abstract
>> ------------
>>
>> This PEP proposes to move hosting of all release files to
>> pypi.python.org itself. To ease transition and minimize client-side
>> friction, **no changes to distutils or installers** are required.
>> Rather, the transition is implemented through changes to the pypi.python.org
>> implementation and by interactions with package maintainers.
>>
>> Problem
>> ---------------
>>
>> Today, python package installers (pip and easy_install) need to
>> query multiple sites to discover release files. Apart from querying
>> pypi.python.org's simple index pages, an installer also needs to crawl
>> all homepages and download pages ever specified with any release of a
>> package. The need for installers to crawl 3rd party sites slows down
>> installation and makes for a brittle, unreliable installation process.
>>
>> As of March 2013, about 10% of packages have release files which
>> are not hosted directly from pypi.python.org but rather from places
>> referenced by download/homepage sites.
>>
>> Conversely, roughly 90% of packages are hosted directly on
>> pypi.python.org [1]_. Even for them installers still need to crawl the
>> homepage(s) of a package. In particular, many package uploaders are not
>> aware that specifying the "homepage" will slow down the installation
>> process.
>>
>>
>> Solution
>> -----------
>>
>> Each package is going to get a "hosting mode" field which affects
>> all historic and future releases of a package and its release files.
>> The field has these values and meanings:
>>
>> - "pypi-ext" (transitional) encodes exactly the current mode of operations:
>>   homepage/download urls are presented in simple/ pages and client-side
>>   tools need to crawl them themselves to find release file links.
>>
>> - "pypi-cache": Release files located on remote sites will be downloaded
>>   and cached by pypi.python.org by crawling homepage/download metadata sites.
>>   The resulting simple index contains links to release files hosted by
>>   pypi.python.org. The original homepage/download links are added as
>>   links without a ``rel`` attribute if they have the ``#egg`` format.
>>
>> - "pypi-only": homepage/download links are served on simple indexes
>>   but without a ``rel`` attribute. Installation tools will thus not
>>   crawl those pages anymore. Use this option if you commit to always
>>   uploading your release files to pypi.python.org.
>>
>>
>> Phases of transition
>> -------------------------
>>
>> 1. At the outset, we set hosting-mode to "pypi-ext" for all packages.
>>    This will not change any link served via the simple index and thus
>>    no bad effects are expected. Early adopters and testers may now
>>    change the mode to either pypi-only or pypi-cache to help with
>>    streamlining issues. After implementation and UI issues are
>>    streamlined, the next phase can start.
>>
>> 2. We perform automatic analysis for each package to determine if it is
>>    a package with externally hosted release files. Packages which only
>>    have release files on pypi.python.org are put in the group "A",
>>    those which have at least some release files outside are put in the
>>    group "B".
>>
>>    We then send a mail to all maintainers of packages in A
>>    that their hosting-mode is going to be switched automatically to
>>    "pypi-only" after N weeks, unless they visit their package
>>    administration page earlier and set it to either pypi-cache or
>>    pypi-only.
>>
>>    We then send a mail to all maintainers of packages in B
>>    that their hosting-mode is going to be switched automatically to
>>    "pypi-cache" after N weeks, unless they visit their package
>>    administration page and set it to either pypi-only or
>>    pypi-cache earlier.
>>
>> 3. All packages will have a hosting mode of either "pypi-cache"
>>    or "pypi-only", so that installers only query
>>    packages hosted through pypi.python.org.
>>
>>
>> Transitioning to "pypi-cache" mode
>> -------------------------------------
>>
>> When transitioning from the currently implicit "pypi-ext" mode to
>> "pypi-cache" for a given package, a package maintainer should
>> be able to retrieve/verify the historic release files which will
>> be cached from pypi.python.org. The UI should present this list
>> and have the maintainer accept it for completing the transition
>> to the "pypi-cache" mode. Upon future release registration actions,
>> pypi.python.org will perform crawling for the homepage/download sites
>> and cache release files *before* returning a success return code for
>> the release registration.
>>
>>
>> References
>> ------------
>>
>> .. [1] ratio of externally hosted versus pypi-hosted
>>    http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html
>>
>> Acknowledgments
>> ----------------------
>>
>> Donald Stufft for pushing away from external hosting and doing
>> the 90/10 % statistics script and offering to implement a PR.
>>
>> Philip Eby for precise information and the basic idea to
>> implement the transition via server-side changes only.
>>
>> Marc-Andre Lemburg, Nick Coghlan and catalog-sig for thinking
>> through issues regarding getting rid of "external hosting".
>>
>>
>> Copyright
>> -----------------
>>
>> This document has been placed in the public domain.
>>
>>
>> _______________________________________________
>> Catalog-SIG mailing list
>> [email protected]
>> http://mail.python.org/mailman/listinfo/catalog-sig
>
> Some concerns:
>
> 1. We cannot automatically switch people to pypi-cache. We _have_ to get
>    explicit permission from them.
> 2. The cache mechanism is going to be fragile, and in the long term leaves a
>    window open for security issues.
>
> If we're going to do a phased-in, per-project solution like this I think it
> would work much better to have 2 modes.
>
> 1. Legacy - Current behavior, new external links are accepted, existing ones
>    are displayed
> 2. PyPI Only - New behavior, no new external links are accepted, existing
>    ones are removed
>
> Present the project owners with 2 one-way buttons:
>   - Switch to PyPI Only and re-host external files [1]
>   - Switch to PyPI Only and do NOT re-host external files
>
> These buttons would be one time and quit. Once your project has been switched
> to PyPI Only you cannot go back to Legacy mode. All new projects would
> already be switched to PyPI Only. After some amount of time switch all projects
> to PyPI Only but _do not_ re-host their packages as we cannot legally do so
> without their permission.
>
> The above is simpler, still provides people an easy migration path, moves us
> toward removing external hosting, and doesn't entangle us with legal issues.
>
> [1] There is still a small window here where someone could MITM PyPI fetching
> these files, however since it would be a one-time deal this risk is
> minimal and is worth it to move to a PyPI-only solution.
>
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
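
For what it's worth, here is a rough Python sketch of the two-mode scheme from Donald's mail, just to make the one-way switch concrete. All names (HostingMode, Project, rehost_queue, switch_to_pypi_only) are made up for illustration and are not part of the PyPI code base. Re-hosting is only recorded in a list here, not actually performed.

```python
from dataclasses import dataclass, field
from enum import Enum


class HostingMode(Enum):
    LEGACY = "legacy"        # external links are accepted and displayed
    PYPI_ONLY = "pypi-only"  # no external links are accepted or displayed


@dataclass
class Project:
    # Hypothetical model of a project record; not PyPI's real schema.
    name: str
    mode: HostingMode = HostingMode.PYPI_ONLY  # new projects start as PyPI Only
    external_links: list = field(default_factory=list)
    rehost_queue: list = field(default_factory=list)  # owner agreed to re-host these

    def add_external_link(self, url: str) -> None:
        if self.mode is HostingMode.PYPI_ONLY:
            raise ValueError("PyPI Only projects do not accept external links")
        self.external_links.append(url)

    def switch_to_pypi_only(self, rehost: bool) -> None:
        """One-way switch; `rehost` stands for the owner's explicit permission."""
        if self.mode is HostingMode.PYPI_ONLY:
            raise ValueError("already PyPI Only; the switch cannot be undone")
        if rehost:
            # Only record what would be fetched; fetching is out of scope here.
            self.rehost_queue.extend(self.external_links)
        self.external_links.clear()
        self.mode = HostingMode.PYPI_ONLY


if __name__ == "__main__":
    project = Project("example", mode=HostingMode.LEGACY)
    project.add_external_link("http://example.com/example-1.0.tar.gz")
    project.switch_to_pypi_only(rehost=True)
    print(project.mode, project.rehost_queue)
```

The forced switch after the grace period would simply call switch_to_pypi_only(rehost=False), since no permission was given.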
