Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting
On Sat, Jun 07, 2014 at 09:46 +1000, Nick Coghlan wrote:
> On 7 Jun 2014 06:08, Donald Stufft don...@stufft.io wrote:
>> On Jun 6, 2014, at 9:41 AM, holger krekel hol...@merlinux.eu wrote:
>>> Once you care for ACLs for indexes and releases you have a number of issues to consider; it's hardly related to PEP470/PEP438.
>>
>> It is related, because it means that the exact same mechanisms can be used; people don't have to learn two different ways of specifying externally hosted projects. In fact it also teaches them how to specify mirrors and the like as well, something that any devpi user is already going to have to learn how to do.
>
> This is the key benefit of PEP 470 from my perspective: some aspects of the Python packaging ecosystem suffer from a bad case of "too many ways to do it", and if we're ever going to fix that, we need to be ruthless in culling redundant concepts.
>
> Specifying custom indexes is a feature with a lot of use cases - local mirrors and private indexes being two of the big ones. By contrast, external references from the simple API duplicate a small subset of the custom index functionality in a way that introduces a whole slew of new concepts that still need to be documented and learned, even if the advice is "don't use that, use custom indexes instead".

Fair point from a UX design perspective -- trying to minimize the concepts you have to learn. However, IMO many Python users feel far from needing to know about configuring indexes with pip. When they try to install a project with an external reference, under PEP470 they will nonetheless need to know about indices and the corresponding options, failure modes, etc. They will also usually depend on crawling other index sites every time they perform an install with these options. And I think we all agreed at one point that client-side crawling is not the greatest thing on earth. Linux distros have an update phase that collects info from the repos, and a separate install phase, so you don't need to go to the remote sites to get index information at install time. With pip you do it at every install.

And, maybe most importantly, for the integrity of their install they will depend on the operators of this external index. A DNS takeover, MITM or targeted server break-in will not only compromise the server hosting the index but also all users and companies using that index. With a PyPI-managed checksummed release link, the worst that can happen is that the release file is not there. We can leverage the integrity of PyPI's usually more solid operations to help users avoid getting something malicious in the future just because they decided at one point to rely on an external index that has since turned evil.

> As far as devpi goes, if it's only mirroring links rather than externally hosted files today, then in the future, it will still automatically mirror the external index URLs. Dependency update scanners could follow those links automatically, even if pip install doesn't check them by default.

Yes, but it's work to get that right. Simply having checksummed links from PyPI makes things a lot simpler.

best, need to shop for a barbecue now :)
holger

> One other nice consequence of PEP 470 is that it should make it easier for organisations to flag and investigate cases where they're relying on an upstream source other than PyPI, regardless of whether they care about the details of their dependencies' hosting for speed, reliability or legal reasons.
>
> From a migration perspective, how hard would it be to automate generation of a custom index page on pythonhosted.org for projects currently relying on external references? That would still let us make the client changes without needing to special case PIL.
>
> Also, it occurred to me that while the latest/any split matters for new users, we still need to consider the impact on projects which have pinned dependencies on older versions of packages that were previously externally hosted, but have moved to PyPI for more recent releases.
>
> I still think dropping the external reference feature from the simple API in favour of improving the custom index support is the right thing to do, but a couple of *client side* examples of handling the migration could help clarify the consequences for the existing users that may be affected.
>
> For example, perhaps we should keep --allow-all-external, but have it mean that pip automatically adds new custom index URLs given for the requested packages. Even if it emitted a deprecation warning, clients using it would keep working in the face of the proposed changes to the simple API link handling.
>
> Regards,
> Nick.

___
Distutils-SIG maillist - Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig
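Holger's integrity argument rests on PyPI serving release links that carry a checksum. If a file is registered with a hash, a compromised external host can at worst make it unavailable; it cannot substitute a different one. The sketch below shows roughly what that client-side check amounts to. It assumes the "#<algorithm>=<hexdigest>" URL fragment convention (for example "#md5=...") seen on PyPI's simple pages. The name verify_release and the set of accepted algorithms are illustrative choices, not pip's actual code.

```python
import hashlib
from urllib.parse import urldefrag

# Assumption: hash names accepted in a '#<name>=<hexdigest>' link fragment.
ACCEPTED_ALGORITHMS = {"md5", "sha1", "sha224", "sha256", "sha384", "sha512"}


def verify_release(url: str, data: bytes) -> bool:
    """Check downloaded bytes against the hash fragment of the link they came from.

    Hypothetical helper, not pip's API. A link without a recognised hash
    fragment is reported as unverifiable (False), the same as a mismatch.
    """
    fragment = urldefrag(url).fragment
    algorithm, sep, expected = fragment.partition("=")
    if not sep or algorithm not in ACCEPTED_ALGORITHMS:
        return False
    return hashlib.new(algorithm, data).hexdigest() == expected.lower()


if __name__ == "__main__":
    payload = b"example release file contents"
    link = "https://example.org/pkg-1.0.tar.gz#md5=" + hashlib.md5(payload).hexdigest()
    print(verify_release(link, payload))      # True: bytes match the registered hash
    print(verify_release(link, b"tampered"))  # False: a swapped file is rejected
```

Treating a missing fragment as a failure is a deliberately strict choice here. An installer could instead warn and continue.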
Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting
On Jun 7, 2014, at 4:06 PM, PJ Eby p...@telecommunity.com wrote:
> On Fri, Jun 6, 2014 at 10:25 AM, Donald Stufft don...@stufft.io wrote:
>> I expected more people to move to safe external vs staying with the unsafe external.
>
> Is there a tool that makes this *easy*? I'm not aware of one. (Ideally, something like a replacement for "setup.py upload" that generates the download URLs and sends them off to PyPI, so that all one needs is a setup_requires for the tool, a setup.cfg with the hosting prefix, and a run of "setup.py register bdist_whatever uplink" to get the links set up.)

I know of:

https://warehouse.python.org/project/bitbucket-distutils/
https://warehouse.python.org/project/github-distutils/

But other than that, no. I assume most people who won't upload to PyPI are also unlikely to upload to GitHub or Bitbucket.

--
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting
On Fri, Jun 6, 2014 at 10:25 AM, Donald Stufft don...@stufft.io wrote:
> I expected more people to move to safe external vs staying with the unsafe external.

Is there a tool that makes this *easy*? I'm not aware of one. (Ideally, something like a replacement for "setup.py upload" that generates the download URLs and sends them off to PyPI, so that all one needs is a setup_requires for the tool, a setup.cfg with the hosting prefix, and a run of "setup.py register bdist_whatever uplink" to get the links set up.)
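At a minimum, the hypothetical "uplink" command described above would need to turn locally built files plus a hosting prefix into hashed download URLs. A rough sketch of that first step follows. Several parts are assumptions:

* hashed_download_urls is a made-up name.
* The prefix argument stands in for the setup.cfg setting mentioned above.
* sha256 is an assumed choice of hash.

The step that registers the links with PyPI is left out entirely.

```python
import hashlib
from pathlib import Path
from urllib.parse import quote


def hashed_download_urls(dist_dir: str, hosting_prefix: str) -> list[str]:
    """Return '<prefix>/<filename>#sha256=<digest>' for each file in dist_dir.

    Hypothetical helper: hosting_prefix stands in for a setup.cfg setting,
    and sending the resulting links to PyPI is not shown.
    """
    prefix = hosting_prefix.rstrip("/")
    links = []
    for path in sorted(Path(dist_dir).iterdir()):
        if path.is_file():  # skip subdirectories such as leftover build dirs
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            links.append(f"{prefix}/{quote(path.name)}#sha256={digest}")
    return links
```

For example, hashed_download_urls("dist", "https://downloads.example.org/myproject/") would return one link per file in dist/. Each link ends in "#sha256=" followed by that file's digest.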
Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting
Hi Donald,

1. You published numbers where 4K (or 300, discounting PIL) would be affected by PEP470. You also say that the main reason for deprecating PEP438 is that it confused users. Did it confuse users other than those few?

2. I don't see valid, precise reasoning why PEP438, just agreed on and implemented last year, needs deprecation. It boosted everyone's install experience (independently of the CDN, which brought another boost) as usage of crawling dramatically dropped, and thus brings us into the exact situation PEP438 already hinted at:

"Deprecation of hosting modes to eventually only allow the pypi-explicit mode is NOT REGULATED by this PEP but is expected to become feasible some time after successful implementation of the transition phases described in this PEP. It is expected that deprecation requires a new process to deal with abandoned packages because of unreachable maintainers for still popular packages."

We should follow through and discuss removing crawling and how to deal with abandoned packages. On the PyPI side, what would remain are two kinds of links:

- PyPI internally hosted
- registered safe external links to release files

The resulting situation is:

easy: users have an already existing option to consider to allow externals.
safe: all links served from PyPI have checksums. Project maintainers need to register hashed links to their new release files.
clean: pip could eventually remove support for crawling and related options.

This is all easy to do, reduces user confusion and makes pip and PyPI simpler and less surprising. I don't see this approach discussed or seriously considered in the PEP, nor in its rejection reasons.

By contrast, PEP470 would require many users to learn about specifying other indexes and what that means. For you and me and many here on the list it may be a no-brainer, but trust me, for many users (I've done ten trainings touching the topic now) this is not a natural concept at all. "pip install --allow-all-external" is far easier to convey than specifying extra per-project indexes and what it means if the install fails (wrong URL? index not reachable? release file not found?).

3. PEP470 makes life a lot harder for devpi-server, currently used by many companies for serving their private indexes. With PEP438 and almost no external crawling left, devpi-server can rely on seeing changes through the PEP381 API. By contrast, with projects hosted on additional per-project external indexes, it requires polling to see changes because releases may not be registered with PyPI anymore (and there is no way to enforce that IISIC). IOW, PEP470 is a serious regression here as it doesn't allow getting notified of new release files.

best,
holger

On Thu, Jun 05, 2014 at 22:08 -0400, Donald Stufft wrote:
> Here's round 2 of PEP 470. You can see it online at https://python.org/dev/peps/pep-0470/ or below.
>
> Notable changes:
>
> - Ensure it's obvious this strictly deals with the installer API and does not affect a project's ability to register their project on PyPI for human consumption.
> - Mention that the functional mechanisms that make it possible for an end user to specify the additional locations have existed for a long time across many versions of the installers.
> - Explicitly mention that the installer changes from PEP 438 should be deprecated and removed as part of this PEP.
> - Explicitly mention pythonhosted.org as a location that authors can use to host an index if they do not wish to purchase a TLS certificate or host additional infrastructure.
> - Include that a link to PyPI ToS should be included in the emails sent to authors to remind them of the PyPI ToS.
> - Special case PIL as it is an outlier in terms of impact.
> - Fill out the impact sections further to provide more detail.
>
> Abstract
> ========
>
> This PEP proposes that the official means of having an installer locate and find package files which are hosted externally to PyPI become the use of multi index support instead of the practice of using external links on the simple installer API.
>
> It is important to remember that this is **not** about forcing anyone to host their files on PyPI. If someone does not wish to do so they will never be under any obligation to. They can still list their project in PyPI as an index, and the tooling will still allow them to host it elsewhere.
>
> This PEP is strictly concerned with the Simple Installer API and how automated installers interact with PyPI; it has no bearing on the informational pages which are primarily for human consumption.
>
> Rationale
> =========
>
> There is a long history documented in PEP 438 that explains why externally hosted files exist today in the state that they do on PyPI. For the sake of brevity I will not duplicate that and
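Point 3 above turns on how a mirror finds out about new files. PyPI's PEP 381 change feed announces new releases. A file that only appears on a project's own index shows up only when someone fetches that index again. The sketch below illustrates the polling side under simple assumptions: the mirror keeps the previous HTML snapshot of the index page and diffs the anchors. HTTP fetching and scheduling are omitted. The helper names are hypothetical, and this is not how devpi-server is actually implemented.

```python
from html.parser import HTMLParser


class _AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag on a simple-index style page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.add(value)


def release_links(page_html: str) -> set[str]:
    """All link targets found on one snapshot of an index page."""
    collector = _AnchorCollector()
    collector.feed(page_html)
    collector.close()
    return collector.hrefs


def new_release_links(previous_html: str, current_html: str) -> set[str]:
    """Links that appear in the latest snapshot but not in the one before it."""
    return release_links(current_html) - release_links(previous_html)
```

The cost Holger describes follows from this shape. Nothing changes until the mirror fetches the page again, so how quickly it notices a new file depends entirely on how often it polls each external index.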
Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting
On Jun 6, 2014, at 4:13 AM, holger krekel hol...@merlinux.eu wrote:
> Hi Donald,
>
> 1. You published numbers where 4K (or 300, discounting PIL) would be affected by PEP470. You also say that the main reason for deprecating PEP438 is that it confused users. Did it confuse users other than those few?

It confused more people than the current numbers suggest, because at the onset more projects relied on it than do now. Currently PIL is the primary instigator of the confusion that I personally see.

> 2. I don't see valid, precise reasoning why PEP438, just agreed on and implemented last year, needs deprecation. It boosted everyone's install experience (independently of the CDN, which brought another boost) as usage of crawling dramatically dropped, and thus brings us into the exact situation PEP438 already hinted at:
>
> "Deprecation of hosting modes to eventually only allow the pypi-explicit mode is NOT REGULATED by this PEP but is expected to become feasible some time after successful implementation of the transition phases described in this PEP. It is expected that deprecation requires a new process to deal with abandoned packages because of unreachable maintainers for still popular packages."
>
> We should follow through and discuss removing crawling and how to deal with abandoned packages. On the PyPI side, what would remain are two kinds of links:
>
> - PyPI internally hosted
> - registered safe external links to release files
>
> The resulting situation is:
>
> easy: users have an already existing option to consider to allow externals.
> safe: all links served from PyPI have checksums. Project maintainers need to register hashed links to their new release files.
> clean: pip could eventually remove support for crawling and related options.
>
> This is all easy to do, reduces user confusion and makes pip and PyPI simpler and less surprising. I don't see this approach discussed or seriously considered in the PEP, nor in its rejection reasons.

The reasons are listed in the PEP, though I can make it more explicit that they apply to this as well.

* People are generally surprised that PyPI allows externally linking to files and doesn't require people to host on PyPI. In contrast, most of them are familiar with the concept of multiple software repositories, such as those used by many OSs.

* PyPI is fronted by a globally distributed CDN which has improved the reliability and speed for end users. It is unlikely that any particular external host has something comparable. This can lead to extremely bad performance for end users when the external host is located in a different part of the world or does not generally have good connectivity. As a data point, many users reported sub-DSL speeds and latency when accessing PyPI from parts of Europe and Asia prior to the use of the CDN.

* PyPI has monitoring and an on-call rotation of sysadmins who can respond quickly to downtime. Again, it is unlikely that any particular external host will have this. This can lead to single packages in a dependency chain being uninstallable. This will often confuse users, who often have no idea that this package relies on an external host, and they cannot figure out why PyPI appears to be up but the installer cannot find a package.

* PyPI supports mirroring, both for private organizations and public mirrors. The legal terms of uploading to PyPI ensure that mirror operators, both public and private, have the right to distribute the software found on PyPI. However, software that is hosted externally does not have this, causing private organizations to need to investigate each package individually and manually to determine if the license allows them to mirror it. For public mirrors this essentially means that these externally hosted packages *cannot* be reasonably mirrored. This is particularly troublesome in countries such as China, where the bandwidth to outside of China is highly congested, often making a mirror within China a massively better experience.

* In the long run, global opt-in flags like ``--allow-all-external`` will become little annoyances that developers cargo cult around in order to make their installer work. When they run into a project that requires it they will most likely simply add it to their configuration file for that installer and continue on with whatever they were actually trying to do. This will continue until they try to install their requirements on another computer or attempt to deploy to a server, where their install will fail again until they add the "make it work" flag in their configuration file.

Implied but not explicitly called out reason (I'll add this):

* The URL classification only works for a certain subset of projects, however it does not allow for any project which needs additional
Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting
I've updated the PEP: http://hg.python.org/peps/rev/3128e9d38937

files:
 pep-0470.txt | 15 +++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/pep-0470.txt b/pep-0470.txt
--- a/pep-0470.txt
+++ b/pep-0470.txt
@@ -389,6 +389,9 @@ hosted.
 * Default to disallowing safely externally hosted files with only a global flag to enable them, but disallow unsafely hosted.
+* Continue on the suggested path of PEP 438 and remove the option to unsafely
+  host externally but continue to allow the option to safely host externally.
+
 These proposals are rejected because:
@@ -454,6 +457,18 @@ or attempt to deploy to a server where their install will fail again until they add the "make it work" flag in their configuration file.
+* The URL classification only works for a certain subset of projects, however
+  it does not allow for any project which needs additional restrictions such
+  as Access Controls. This means that there would be two methods of doing the
+  same thing, linking to a file safely and hosting an index. Hosting an index
+  works in all situations and by relying on this we make for a more consistent
+  experience no matter the reason for external hosting.
+
+* The safe external hosting option hampers the ability of PyPI to upgrade it's
+  security infrastructure. For instance if MD5 becomes broken in the future
+  there will be no way for PyPI to upgrade the hashes of the projects which
+  rely on safe external hosting via MD5 while files that are hosted on PyPI
+  can simply be processed over with a new hash function.
 Copyright
 =========

--
Donald Stufft
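The new rejection reason about MD5 comes down to who holds the bytes. The sketch below illustrates the asymmetry, using hypothetical names. For files PyPI stores, it can compute a fresh digest with any algorithm. For a registered external link, all it has is the digest embedded in the URL, and that cannot be converted into a digest under a different algorithm.

```python
import hashlib
from urllib.parse import urldefrag


def rehash(hosted: dict[str, bytes], external_links: list[str], algorithm: str = "sha256"):
    """Split an index's files into ones it can re-hash and ones it cannot.

    hosted maps filename -> bytes the index stores itself, so a new digest
    can simply be computed. external_links are registered URLs such as
    'https://host/pkg-1.0.tar.gz#md5=...'; the index never holds their bytes,
    so any link not already using `algorithm` stays pinned to its old hash.
    """
    upgraded = {name: hashlib.new(algorithm, data).hexdigest() for name, data in hosted.items()}
    stuck = [url for url in external_links if not urldefrag(url).fragment.startswith(algorithm + "=")]
    return upgraded, stuck
```

In this sketch the only way to "unstick" an external link is for its maintainer to re-register it with a new digest. That is the dependency on maintainers the PEP text is pointing at.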
Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting
On Fri, Jun 06, 2014 at 07:55 -0400, Donald Stufft wrote:
> On Jun 6, 2014, at 4:13 AM, holger krekel hol...@merlinux.eu wrote:
>> Hi Donald,
>>
>> 1. You published numbers where 4K (or 300, discounting PIL) would be affected by PEP470. You also say that the main reason for deprecating PEP438 is that it confused users. Did it confuse users other than those few?
>
> It confused more people than the current numbers suggest, because at the onset more projects relied on it than do now. Currently PIL is the primary instigator of the confusion that I personally see.

So currently we don't have many confused users anymore. Doesn't this take away a good part of the reasoning behind PEP470?

In the following I use "PEP438f" to speak about a hypothetical follow-up PEP as outlined in my previous mail. I volunteer to write it and present it as an alternative should we not reach some form of conclusion together.

>> 2. I don't see valid, precise reasoning why PEP438, just agreed on and implemented last year, needs deprecation. It boosted everyone's install experience (independently of the CDN, which brought another boost) as usage of crawling dramatically dropped, and thus brings us into the exact situation PEP438 already hinted at:
>>
>> "Deprecation of hosting modes to eventually only allow the pypi-explicit mode is NOT REGULATED by this PEP but is expected to become feasible some time after successful implementation of the transition phases described in this PEP. It is expected that deprecation requires a new process to deal with abandoned packages because of unreachable maintainers for still popular packages."
>>
>> We should follow through and discuss removing crawling and how to deal with abandoned packages. On the PyPI side, what would remain are two kinds of links:
>>
>> - PyPI internally hosted
>> - registered safe external links to release files
>>
>> The resulting situation is:
>>
>> easy: users have an already existing option to consider to allow externals.
>> safe: all links served from PyPI have checksums. Project maintainers need to register hashed links to their new release files.
>> clean: pip could eventually remove support for crawling and related options.
>>
>> This is all easy to do, reduces user confusion and makes pip and PyPI simpler and less surprising. I don't see this approach discussed or seriously considered in the PEP, nor in its rejection reasons.
>
> The reasons are listed in the PEP, though I can make it more explicit that they apply to this as well.
>
> * People are generally surprised that PyPI allows externally linking to files and doesn't require people to host on PyPI. In contrast, most of them are familiar with the concept of multiple software repositories, such as those used by many OSs.

"People are generally surprised" is a rather subjective statement. With respect to PEP470 we might have at least 65 projects, and many more users, being annoyed rather than just surprised at the sudden change in direction, especially if there are no compelling arguments.

> * PyPI is fronted by a globally distributed CDN which has improved the reliability and speed for end users. It is unlikely that any particular external host has something comparable. This can lead to extremely bad performance for end users when the external host is located in a different part of the world or does not generally have good connectivity. As a data point, many users reported sub-DSL speeds and latency when accessing PyPI from parts of Europe and Asia prior to the use of the CDN.
>
> * PyPI has monitoring and an on-call rotation of sysadmins who can respond quickly to downtime. Again, it is unlikely that any particular external host will have this. This can lead to single packages in a dependency chain being uninstallable. This will often confuse users, who often have no idea that this package relies on an external host, and they cannot figure out why PyPI appears to be up but the installer cannot find a package.

Sorry, but neither point has much to do with the discussion. If anything, they speak *against* PEP470, because users would need to rely on project-specific external index sites to even know which releases exist. With PEP438 you know that a certain release file must exist, and the installer clearly says "I could not download release file X from URL". That works today. Also, the external index could be temporarily broken and not serve the newest files. The integrity and reliability of external indexes would generally not be covered by the CDN and PyPI's on-call admins, so instead of speaking for PEP470 these points speak against it.

> * PyPI supports mirroring, both for private organizations and public mirrors. The legal terms of uploading to PyPI ensure that mirror operators, both public and private, have the right to
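Holger's proposed end state leaves only two kinds of links on the simple pages: files hosted on PyPI, and registered external links that carry a hash. A minimal classifier for that rule might look like the sketch below. It rests on several illustrative assumptions, none of which is pip's or PyPI's actual logic:

* The PyPI file host is taken to be pypi.python.org.
* Relative links are treated as PyPI-hosted.
* The list of accepted hash names is an illustrative choice.

```python
from urllib.parse import urldefrag, urlsplit

# Assumption: host(s) PyPI serves its own release files from (illustrative only).
PYPI_FILE_HOSTS = {"pypi.python.org"}
# Assumption: hash names accepted in a '#<name>=<hexdigest>' fragment.
HASH_NAMES = {"md5", "sha1", "sha224", "sha256", "sha384", "sha512"}


def classify_link(url: str) -> str:
    """Return 'internal', 'safe-external' or 'unsafe' for a release-file link."""
    base, fragment = urldefrag(url)
    host = urlsplit(base).hostname
    # Relative links on a simple page resolve against PyPI itself.
    if host is None or host in PYPI_FILE_HOSTS:
        return "internal"
    name, sep, digest = fragment.partition("=")
    if sep and name in HASH_NAMES and digest:
        return "safe-external"
    # No usable hash: the kind of link the proposal would stop serving.
    return "unsafe"
```

Under the proposal, anything classified "unsafe" would simply no longer appear on the simple pages. That would let the installer drop crawling entirely.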
Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting
On Jun 6, 2014, at 9:41 AM, holger krekel hol...@merlinux.eu wrote:
> On Fri, Jun 06, 2014 at 07:55 -0400, Donald Stufft wrote:
>> On Jun 6, 2014, at 4:13 AM, holger krekel hol...@merlinux.eu wrote:
>>> Hi Donald,
>>>
>>> 1. You published numbers where 4K (or 300, discounting PIL) would be affected by PEP470. You also say that the main reason for deprecating PEP438 is that it confused users. Did it confuse users other than those few?
>>
>> It confused more people than the current numbers suggest, because at the onset more projects relied on it than do now. Currently PIL is the primary instigator of the confusion that I personally see.
>
> So currently we don't have many confused users anymore. Doesn't this take away a good part of the reasoning behind PEP470?

No.

> In the following I use "PEP438f" to speak about a hypothetical follow-up PEP as outlined in my previous mail. I volunteer to write it and present it as an alternative should we not reach some form of conclusion together.
>
>>> 2. I don't see valid, precise reasoning why PEP438, just agreed on and implemented last year, needs deprecation. It boosted everyone's install experience (independently of the CDN, which brought another boost) as usage of crawling dramatically dropped, and thus brings us into the exact situation PEP438 already hinted at:
>>>
>>> "Deprecation of hosting modes to eventually only allow the pypi-explicit mode is NOT REGULATED by this PEP but is expected to become feasible some time after successful implementation of the transition phases described in this PEP. It is expected that deprecation requires a new process to deal with abandoned packages because of unreachable maintainers for still popular packages."
>>>
>>> We should follow through and discuss removing crawling and how to deal with abandoned packages. On the PyPI side, what would remain are two kinds of links:
>>>
>>> - PyPI internally hosted
>>> - registered safe external links to release files
>>>
>>> The resulting situation is:
>>>
>>> easy: users have an already existing option to consider to allow externals.
>>> safe: all links served from PyPI have checksums. Project maintainers need to register hashed links to their new release files.
>>> clean: pip could eventually remove support for crawling and related options.
>>>
>>> This is all easy to do, reduces user confusion and makes pip and PyPI simpler and less surprising. I don't see this approach discussed or seriously considered in the PEP, nor in its rejection reasons.
>>
>> The reasons are listed in the PEP, though I can make it more explicit that they apply to this as well.
>>
>> * People are generally surprised that PyPI allows externally linking to files and doesn't require people to host on PyPI. In contrast, most of them are familiar with the concept of multiple software repositories, such as those used by many OSs.
>
> "People are generally surprised" is a rather subjective statement. With respect to PEP470 we might have at least 65 projects, and many more users, being annoyed rather than just surprised at the sudden change in direction, especially if there are no compelling arguments.
>
>> * PyPI is fronted by a globally distributed CDN which has improved the reliability and speed for end users. It is unlikely that any particular external host has something comparable. This can lead to extremely bad performance for end users when the external host is located in a different part of the world or does not generally have good connectivity. As a data point, many users reported sub-DSL speeds and latency when accessing PyPI from parts of Europe and Asia prior to the use of the CDN.
>>
>> * PyPI has monitoring and an on-call rotation of sysadmins who can respond quickly to downtime. Again, it is unlikely that any particular external host will have this. This can lead to single packages in a dependency chain being uninstallable. This will often confuse users, who often have no idea that this package relies on an external host, and they cannot figure out why PyPI appears to be up but the installer cannot find a package.
>
> Sorry, but neither point has much to do with the discussion. If anything, they speak *against* PEP470, because users would need to rely on project-specific external index sites to even know which releases exist. With PEP438 you know that a certain release file must exist, and the installer clearly says "I could not download release file X from URL". That works today. Also, the external index could be temporarily broken and not serve the newest files. The integrity and reliability of external indexes would generally not be covered by the CDN and PyPI's on-call admins, so instead of speaking for PEP470 these points speak against it.

The point is, end users are *aware* they are relying on something external, and they are aware of exactly which external items they are relying on. With PEP 470 people can correctly
Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting
On 7 Jun 2014 06:08, Donald Stufft don...@stufft.io wrote:
> On Jun 6, 2014, at 9:41 AM, holger krekel hol...@merlinux.eu wrote:
>> Once you care for ACLs for indexes and releases you have a number of issues to consider; it's hardly related to PEP470/PEP438.
>
> It is related, because it means that the exact same mechanisms can be used; people don't have to learn two different ways of specifying externally hosted projects. In fact it also teaches them how to specify mirrors and the like as well, something that any devpi user is already going to have to learn how to do.

This is the key benefit of PEP 470 from my perspective: some aspects of the Python packaging ecosystem suffer from a bad case of "too many ways to do it", and if we're ever going to fix that, we need to be ruthless in culling redundant concepts.

Specifying custom indexes is a feature with a lot of use cases - local mirrors and private indexes being two of the big ones. By contrast, external references from the simple API duplicate a small subset of the custom index functionality in a way that introduces a whole slew of new concepts that still need to be documented and learned, even if the advice is "don't use that, use custom indexes instead".

As far as devpi goes, if it's only mirroring links rather than externally hosted files today, then in the future, it will still automatically mirror the external index URLs. Dependency update scanners could follow those links automatically, even if pip install doesn't check them by default.

One other nice consequence of PEP 470 is that it should make it easier for organisations to flag and investigate cases where they're relying on an upstream source other than PyPI, regardless of whether they care about the details of their dependencies' hosting for speed, reliability or legal reasons.

From a migration perspective, how hard would it be to automate generation of a custom index page on pythonhosted.org for projects currently relying on external references? That would still let us make the client changes without needing to special case PIL.

Also, it occurred to me that while the latest/any split matters for new users, we still need to consider the impact on projects which have pinned dependencies on older versions of packages that were previously externally hosted, but have moved to PyPI for more recent releases.

I still think dropping the external reference feature from the simple API in favour of improving the custom index support is the right thing to do, but a couple of *client side* examples of handling the migration could help clarify the consequences for the existing users that may be affected.

For example, perhaps we should keep --allow-all-external, but have it mean that pip automatically adds new custom index URLs given for the requested packages. Even if it emitted a deprecation warning, clients using it would keep working in the face of the proposed changes to the simple API link handling.

Regards,
Nick.
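Generating the per-project pages Nick asks about mostly means emitting a static page of links in the format installers already read from the simple API. Below is a sketch of that rendering step. It assumes the file URLs have already been collected by a crawl. render_simple_page is a hypothetical helper; crawling, pruning dead links and uploading the result to pythonhosted.org are not shown. When no hash is known, a link carries no fragment. That matches Donald's point in the reply that such an index would count as external and unsafe.

```python
import html
import posixpath
from urllib.parse import urldefrag, urlsplit


def render_simple_page(project: str, file_urls: list[str]) -> str:
    """Render a minimal simple-API style page with one anchor per release file."""
    name = html.escape(project)
    lines = [
        "<!DOCTYPE html>",
        f"<html><head><title>Links for {name}</title></head><body>",
        f"<h1>Links for {name}</h1>",
    ]
    for url in file_urls:
        # Display text is the bare filename; the href keeps the full URL,
        # including any '#md5=...' fragment if one happens to be known.
        filename = posixpath.basename(urlsplit(urldefrag(url).url).path)
        lines.append(f'<a href="{html.escape(url, quote=True)}">{html.escape(filename)}</a><br/>')
    lines.append("</body></html>")
    return "\n".join(lines)
```

Keeping the page to bare anchors is a deliberate choice. It mirrors the simple pages PyPI already serves, so existing installers pointed at it with an extra index URL would need no changes.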
Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting
On Jun 6, 2014, at 7:46 PM, Nick Coghlan ncogh...@gmail.com wrote: On 7 Jun 2014 06:08, Donald Stufft don...@stufft.io wrote: On Jun 6, 2014, at 9:41 AM, holger krekel hol...@merlinux.eu wrote: Once you care for ACLs for indexes and releases you have a number of issues to consider, it's hardly related to PEP470/PEP438. It is related, because it means that the exact same mechanisms can be used, people don’t have to learn two different ways of specifying externally hosted projects. In fact it also teaches them how to specify mirrors and the like as well something that any devpi user is already going to have to learn how to do. This is the key benefit of PEP 470 from my perspective: some aspects of the Python packaging ecosystem suffer from a bad case of too many ways to do it, and if we're ever going to fix that, we need to be ruthless in culling redundant concepts. Specifying custom indexes is a feature with a lot of use cases - local mirrors and private indexes being two of the big ones. By contrast, external references from the simple API duplicate a small subset of the custom index functionality in a way that introduces a whole slew of new concepts that still need to be documented and learned, even if the advice is don't use that, use custom indexes instead. As far as dev-pi goes, if it's only mirroring links rather than externally hosted files today, then in the future, it will still automatically mirror the external index URLs. Dependency update scanners could follow those links automatically, even if pip install doesn't check them by default. One other nice consequence of PEP 470 should make it easier for organisations to flag and investigate cases where they're relying on an upstream source other than PyPI, regardless of whether they care about the details of their dependencies' hosting for speed, reliability or legal reasons. 
From a migration perspective, how hard would it be to automate generation of a custom index page on pythonhosted.org for projects currently relying on external references? That would still let us make the client changes without needing to special case PIL. Not very difficult. My current crawl script could generate a minimal one with some minor modifications (it’d have to save the whole URL instead of just the filename) and would take about 3 hours to process. This process would also weed out links which have died and the like. The downside would be that these files wouldn’t be verified, so it would be an external + unsafe index, since we don’t have hash information to make them safe. Of course this would be the case for PIL anyway, which easily makes up most of this traffic, so this could just end up in the wash as far as how “safe” it is. Also, it occurred to me that while the latest/any split matters for new users, we still need to consider the impact on projects which have pinned dependencies on older versions of packages that were previously externally hosted, but have moved to PyPI for more recent releases. I still think dropping the external reference feature from the simple API in favour of improving the custom index support is the right thing to do, but a couple of *client side* examples of handling the migration could help clarify the consequences for the existing users that may be affected. Right, this was one of the reasons my old numbers had a split at 50%: part of the idea was that a project with less than some percentage of its files hosted on PyPI had a smaller “breakage” surface, even for old pinned versions. I can get these numbers again if they’d be useful, though I’m not sure if they should go in the PEP or not; it’s already kind of heavy on the numbers, I think, and I’m not sure whether additional numbers would be more or less confusing. What do you mean by client side examples of handling the migration?
I’m assuming you mean something other than the examples which show how to utilize the new indexes? For example, perhaps we should keep --allow-all-external, but have it mean that pip automatically adds new custom index URLs given for the requested packages. Even if it emitted a deprecation warning, clients using it would keep working in the face of the proposed changes to the simple API link handling. Well, it’d actually expand what --allow-all-external means, since it’d also allow those unsafely hosted files. I’m not sure it’d be a good idea to silently (or even with a warning) upgrade an option from a “do this to allow all safely hosted files” to a “do this to allow a whole bunch of legacy and unsafely hosted files”. The one upside to that is we’d link directly to files instead of relying on scraping, so you’d have to actually rely on an unsafe file to be at risk, but it still makes me nervous. It’s possible we could add a flag for this, but I’m not sure how useful it’d be since it’d only be in pip 1.6+ and unless people upgrade to that
[Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting
Here's round 2 of PEP 470. You can see it online at https://python.org/dev/peps/pep-0470/ or below.

Notable changes:

- Ensure it's obvious that this strictly deals with the installer API and does not affect a project's ability to register their project on PyPI for human consumption.
- Mention that the functional mechanisms that make it possible for an end user to specify the additional locations have existed for a long time across many versions of the installers.
- Explicitly mention that the installer changes from PEP 438 should be deprecated and removed as part of this PEP.
- Explicitly mention pythonhosted.org as a location that authors can use to host an index if they do not wish to purchase a TLS certificate or host additional infrastructure.
- Include that a link to the PyPI ToS should be included in the emails sent to authors to remind them of the PyPI ToS.
- Special case PIL as it is an outlier in terms of impact.
- Fill out the impact sections further to provide more detail.

Abstract
========

This PEP proposes that the official means of having an installer locate and find package files which are hosted externally to PyPI become the use of multi index support instead of the practice of using external links on the simple installer API.

It is important to remember that this is **not** about forcing anyone to host their files on PyPI. If someone does not wish to do so they will never be under any obligation to. They can still list their project in PyPI as an index, and the tooling will still allow them to host it elsewhere.

This PEP is strictly concerned with the Simple Installer API and how automated installers interact with PyPI; it has no bearing on the informational pages which are primarily for human consumption.

Rationale
=========

There is a long history documented in PEP 438 that explains why externally hosted files exist today in the state that they do on PyPI.
For the sake of brevity I will not duplicate that and instead urge readers to first take a look at PEP 438 for background.

There are currently two primary ways for a project to make itself available without directly hosting the package files on PyPI. They can either include links to the package files in the simple installer API or they can publish a custom package index which contains their project.

Custom Additional Index
-----------------------

Each installer which speaks to PyPI offers a mechanism for the user invoking that installer to provide additional custom locations to search for files during the dependency resolution phase. For pip these locations can be configured per invocation, per shell environment, per requirements file, per virtual environment, and per user. The mechanisms for specifying additional locations have existed within pip and setuptools for many years; by comparison, the mechanisms in PEP 438 and any other new mechanism will have existed for only a short period of time (if they exist at all currently).

The use of additional indexes instead of external links on the simple installer API provides a simple, clean interface which is consistent with the way most Linux package systems work (apt-get, yum, etc.). More importantly, it works the same even for projects which are commercial or otherwise have their access restricted in some form (private networks, passwords, IP ACLs, etc.), while the external links method only realistically works for projects which do not have their access restricted.

Compared to the complex rules which a project must be aware of to prevent itself from being considered unsafely hosted, setting up an index is fairly trivial and in the simplest case does not require anything more than a filesystem and a standard web server such as Nginx or Twisted Web. Even if using simple static hosting without autoindexing support, it is still straightforward to generate appropriate index pages as static HTML.

Example Index with Twisted Web
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Create a root directory for your index. For the purposes of the example I'll assume you've chosen ``/var/www/index.example.com/``.
2. Inside of this root directory, create a directory for each project, such as ``mkdir -p /var/www/index.example.com/{foo,bar,other}/``.
3. Place the package files for each project in their respective folder, creating paths like ``/var/www/index.example.com/foo/foo-1.0.tar.gz``.
4. Configure Twisted Web to serve the root directory, ideally with TLS.

   ::

       $ twistd -n web --path /var/www/index.example.com/

Examples of Additional indexes with pip
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Invocation:**

::

    $ pip install --extra-index-url https://pypi.example.com/ foobar

**Shell Environment:**

::

    $ export PIP_EXTRA_INDEX_URL=https://pypi.example.com/
    $ pip install foobar

**Requirements File:**

::

    $ printf '%s\n' '--extra-index-url https://pypi.example.com/' foobar > requirements.txt
    $ pip install -r requirements.txt

**Virtual Environment:**

::

    $