+1

On Mar 10, 2013, at 1:35 PM, Donald Stufft <don...@stufft.io> wrote:

> 
> On Mar 10, 2013, at 11:07 AM, holger krekel <hol...@merlinux.eu> wrote:
> 
>> Hi Donald, Richard, Nick, Philip, Marc-Andre, all,
>> 
>> after some more thinking i wrote a simplified PEP draft for
>> transitioning hosting of release files to pypi.python.org.  A PEP is
>> warranted IMO because the resulting changes will affect all Python
>> package maintainers and the Python packaging ecology in general.  See
>> the current draft (pre-submit-v1) further below in this mail. 
>> I also created a bitbucket repository, see "PEP-PYPI-DRAFT.txt"  at 
>> 
>>   https://bitbucket.org/hpk42/pep-pypi/src
>> 
>> Donald, i'd be happy if you join as a co-author and contribute
>> your statistics script and possibly more implementation stuff (PRs 
>> to pypi software etc.).  
>> 
>> Philip, Marc-Andre, Richard (Jones), Nick and catalog-sig/distutils-sig:
>> scrutiny and feedback welcome.
>> 
>> Nick: if you could collect feedback on the PEP (draft) around the 
>> packaging and distribution mini-summit at Pycon US (15th March), that'd 
>> be very useful.  
>> 
>> Richard: I may ask you to become BDFL-delegate for this PEP especially
>> since you will need to integrate any resulting changes :)
>> 
>> I'd like to formally submit this PEP soon but not before i got some 
>> feedback.
>> 
>> I am not subscribed to distutils-sig and i think distutils is not much
>> affected, but it probably still would help if someone cross-posts there
>> (please put me in CC).
>> 
>> cheers,
>> holger
>> 
>> 
>> PEP-draft: transition to release file hosting at pypi.python.org
>> =================================================================
>> 
>> Status
>> -----------
>> 
>> PRE-SUBMIT-v1
>> 
>> Abstract
>> ------------
>> 
>> This PEP proposes to move hosting of all release files to
>> pypi.python.org itself.  To ease transition and minimize client-side
>> friction, **no changes to distutils or installers** are required.
>> Rather, the transition is implemented through changes to the pypi.python.org 
>> implementation and by interactions with package maintainers.
>> 
>> Problem
>> ---------------
>> 
>> Today, Python package installers (pip and easy_install) need to
>> query multiple sites to discover release files.  Apart from querying
>> pypi.python.org's simple index pages, an installer also needs to crawl
>> every homepage and download page ever specified with any release of a
>> package.  The need for installers to crawl 3rd party sites slows down
>> installation and makes for a brittle, unreliable installation process.
>> 
>> As of March 2013, about 10% of packages have release files which
>> are not hosted directly from pypi.python.org but rather from places
>> referenced by download/homepage sites.  
>> 
>> Conversely, roughly 90% of packages are hosted directly on
>> pypi.python.org [1]_.  Even for them installers still need to crawl the
>> homepage(s) of a package.  In particular, many package uploaders are
>> not aware that specifying the "homepage" will slow down the
>> installation process.
>> 
>> 
>> Solution
>> -----------
>> 
>> Each package is going to get a "hosting mode" field which affects
>> all historic and future releases of a package and its release files.
>> The field has these values and meanings:
>> 
>> - "pypi-ext" (transitional) encodes exactly the current mode of operations:
>> homepage/download urls are presented in simple/ pages and client-side
>> tools need to crawl them themselves to find release file links. 
>> 
>> - "pypi-cache": Release files located on remote sites will be downloaded 
>> and cached by pypi.python.org by crawling homepage/download metadata sites.
>> The resulting simple index contains links to release files hosted by
>> pypi.python.org.  The original homepage/download links are added as
>> links without a ``rel`` attribute if they have the ``#egg`` format.
>> 
>> - "pypi-only": homepage/download links are served on simple indexes
>> but without a ``rel`` attribute.  Installation tools will thus not
>> crawl those pages anymore.  Use this option if you commit to always
>> uploading your release files to pypi.python.org.
>> 
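The three modes above can be sketched as a rule for how a homepage/download link is rendered on a simple/ page. This is only an illustrative sketch, not the PyPI implementation: the function name, the use of the rel value as link text, and the `"#egg="` substring check for the "#egg format" are assumptions.

```python
from enum import Enum
from html import escape


class HostingMode(Enum):
    PYPI_EXT = "pypi-ext"      # transitional: exactly the current behaviour
    PYPI_CACHE = "pypi-cache"  # external release files are cached on PyPI
    PYPI_ONLY = "pypi-only"    # maintainer uploads all files to PyPI


def render_external_link(url, rel, mode):
    """Return the <a> tag for a homepage/download URL, or None to omit it.

    `rel` is "homepage" or "download". Hypothetical helper for illustration.
    """
    href = escape(url, quote=True)
    if mode is HostingMode.PYPI_EXT:
        # The rel attribute is what makes installers crawl the target page.
        return '<a href="{}" rel="{}">{}</a>'.format(href, rel, rel)
    if mode is HostingMode.PYPI_CACHE and "#egg=" not in url:
        # Assumption: in pypi-cache mode only #egg links are kept.
        return None
    # No rel attribute: the link is shown but installers do not crawl it.
    return '<a href="{}">{}</a>'.format(href, rel)
```

With this rule an installer sees crawlable links only for "pypi-ext" packages; in the other two modes every release file it follows is hosted on pypi.python.org.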
>> 
>> Phases of transition
>> -------------------------
>> 
>> 1. At the outset, we set hosting-mode to "pypi-ext" for all packages.
>>  This will not change any link served via the simple index and thus
>>  no bad effects are expected.  Early adopters and testers may now
>>  change the mode to either pypi-only or pypi-cache to help with
>>  ironing out issues.  After implementation and UI issues are
>>  resolved, the next phase can start.
>> 
>> 2. We perform automatic analysis for each package to determine if it is
>>  a package with externally hosted release files.  Packages which only 
>>  have release files on pypi.python.org are put in the group "A",
>>  those which have at least some release files outside are put in the
>>  group "B".
>> 
>>  We then send a mail to all maintainers of packages in A 
>>  that their hosting-mode is going to be switched automatically to 
>>  "pypi-only" after N weeks, unless they visit their package
>>  administration page and set it to either pypi-cache or
>>  pypi-only earlier.
>> 
>>  We then send a mail to all maintainers of packages in B
>>  that their hosting-mode is going to be switched automatically to 
>>  "pypi-cache" after N weeks, unless they visit their package
>>  administration page and set it to either pypi-only or
>>  pypi-cache earlier.
>> 
>> 3. All packages will have a hosting mode of either "pypi-cache"
>>  or "pypi-only", so that installers only query
>>  packages hosted through pypi.python.org.
>> 
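The phase 2 analysis and the automatic switch after N weeks could look roughly like the sketch below. The function names are hypothetical, and treating a package with no release files at all as group "A" is an assumption the draft does not spell out.

```python
from urllib.parse import urlparse

PYPI_HOST = "pypi.python.org"

# Mode applied automatically after N weeks if the maintainer did nothing.
DEFAULT_MODE = {"A": "pypi-only", "B": "pypi-cache"}


def classify(release_file_urls):
    """Return "A" if every release file is hosted on PyPI, else "B".

    Assumption: a package without any release files counts as group "A".
    """
    hosts = {urlparse(url).hostname for url in release_file_urls}
    return "A" if hosts <= {PYPI_HOST} else "B"


def mode_after_deadline(release_file_urls, maintainer_choice=None):
    """Pick the hosting mode once the N-week notice period has passed."""
    if maintainer_choice in ("pypi-only", "pypi-cache"):
        # An explicit choice on the administration page always wins.
        return maintainer_choice
    return DEFAULT_MODE[classify(release_file_urls)]
```

Either way the result is never "pypi-ext", which is the state phase 3 describes.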
>> 
>> Transitioning to "pypi-cache" mode
>> -------------------------------------
>> 
>> When transitioning from the currently implicit "pypi-ext" mode to
>> "pypi-cache" for a given package, a package maintainer should 
>> be able to retrieve/verify the historic release files which will 
>> be cached by pypi.python.org.  The UI should present this list
>> and have the maintainer accept it to complete the transition
>> to the "pypi-cache" mode.  Upon future release registration actions,
>> pypi.python.org will crawl the homepage/download sites
>> and cache release files *before* returning a success return code for
>> the release registration.
>> 
>> 
>> References
>> ------------
>> 
>> .. [1] ratio of externally hosted versus pypi-hosted 
>> http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html
>> 
>> Acknowledgments
>> ----------------------
>> 
>> Donald Stufft for pushing away from external hosting, writing
>> the 90/10% statistics script and offering to implement a PR.
>> 
>> Philip Eby for precise information and the basic idea to
>> implement the transition via server-side changes only.
>> 
>> Marc-Andre Lemburg, Nick Coghlan and catalog-sig for thinking
>> through issues regarding getting rid of "external hosting".
>> 
>> 
>> Copyright
>> -----------------
>> 
>> This document has been placed in the public domain.
>> 
>> 
>> _______________________________________________
>> Catalog-SIG mailing list
>> Catalog-SIG@python.org
>> http://mail.python.org/mailman/listinfo/catalog-sig
> 
> Some concerns:
> 
> 1. We cannot automatically switch people to pypi-cache. We _have_ to get 
> explicit permission from them.
> 2. The cache mechanism is going to be fragile, and in the long term leaves a 
> window open for security issues.
> 
> If we're going to do a phased-in, per-project solution like this I think it 
> would work much better to have 2 modes.
> 
> 1. Legacy - Current behavior, new external links are accepted, existing ones 
> are displayed
> 2. PyPI Only - New behavior, no new external links are accepted, existing 
> ones are removed
> 
> Present the project owners with 2 one way buttons:
>   - Switch to PyPI Only and re-host external files [1]
>   - Switch to PyPI Only and do NOT re-host external files
> 
> These buttons would be one-time only. Once your project has been switched 
> to PyPI Only you cannot go back to Legacy mode. All new projects would 
> already be switched to PyPI Only. After some amount of time, switch all projects 
> to PyPI Only but _do not_ re-host their packages, as we cannot legally do so 
> without their permission.
> 
> The above is simpler, still provides people an easy migration path, moves us 
> toward removing external hosting, and doesn't entangle us with legal issues.
> 
> [1] There is still a small window here where someone could MITM PyPI fetching 
> these files; however, since it would be a one-time-and-done deal this risk is 
> minimal and is worth it to move to a PyPI-only solution.
> 
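The two-mode proposal with its one-way buttons can be sketched as a small state machine. The class and attribute names below are hypothetical, and the re-host queue stands in for whatever mechanism would actually fetch the external files.

```python
LEGACY = "legacy"
PYPI_ONLY = "pypi-only"


class Project:
    """Hypothetical model of a project under the two-mode proposal."""

    def __init__(self, mode=PYPI_ONLY, external_links=()):
        # New projects start out as PyPI Only.
        self.mode = mode
        self.external_links = list(external_links)
        self.rehost_queue = []  # files PyPI was permitted to re-host

    def add_external_link(self, url):
        if self.mode != LEGACY:
            raise ValueError("PyPI Only projects accept no new external links")
        self.external_links.append(url)

    def switch_to_pypi_only(self, rehost):
        """One-way switch; `rehost` is True only with the owner's consent."""
        if self.mode == PYPI_ONLY:
            raise ValueError("already PyPI Only; there is no way back")
        if rehost:
            self.rehost_queue = list(self.external_links)
        # Existing external links are removed in either case.
        self.external_links = []
        self.mode = PYPI_ONLY
```

The automatic switch after the deadline would call `switch_to_pypi_only(rehost=False)`, since files are only re-hosted when the owner pressed the re-host button.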
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> 