Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting

2014-06-07 Thread holger krekel
On Sat, Jun 07, 2014 at 09:46 +1000, Nick Coghlan wrote:
 On 7 Jun 2014 06:08, Donald Stufft don...@stufft.io wrote:
 
 
  On Jun 6, 2014, at 9:41 AM, holger krekel hol...@merlinux.eu wrote:
  
   Once you care for ACLs for indexes and releases you have a number
   of issues to consider, it's hardly related to PEP470/PEP438.
 
  It is related, because it means that the exact same mechanisms can be
 used,
  people don’t have to learn two different ways of specifying externally
 hosted
  projects. In fact it also teaches them how to specify mirrors and the
 like as well
  something that any devpi user is already going to have to learn how to do.
 
 This is the key benefit of PEP 470 from my perspective: some aspects of the
 Python packaging ecosystem suffer from a bad case of too many ways to do
 it, and if we're ever going to fix that, we need to be ruthless in culling
 redundant concepts.

 Specifying custom indexes is a feature with a lot of use cases - local
 mirrors and private indexes being two of the big ones. By contrast,
 external references from the simple API duplicate a small subset of the
 custom index functionality in a way that introduces a whole slew of new
 concepts that still need to be documented and learned, even if the advice
 is don't use that, use custom indexes instead.

Fair point from a UX design perspective -- trying to minimize the concepts
you have to learn.  However, IMO many Python users are far from needing to
know about configuring indexes with pip.  When they try to install a
project with an external reference they will nonetheless, with PEP 470, need
to know about indexes, the corresponding options, failure modes etc.  They
will also usually depend on crawling other index sites every time they
perform an install with these options.

And I think we all agreed at one point that client-side crawling is not
the greatest thing on earth.  Linux distros have an update phase
collecting info from the repos, and a separate install phase, so you
don't need to go to the remote sites to get index information at
install time.  With pip you do it at every install.

And, maybe most importantly, for the integrity of their install they
will depend on the operators of this external index.  A DNS takeover, MITM
attack or targeted server break-in will not only compromise the server hosting
the index but also compromise all users and companies using that index.
With a PyPI-managed checksummed release link the worst that can happen
is that the release file is not there.  We can leverage the integrity of
PyPI's usually more solid operations to help users avoid getting something
malicious in the future because they decided at one point to rely on an
external index that has since turned evil.

 As far as dev-pi goes, if it's only mirroring links rather than externally
 hosted files today, then in the future, it will still automatically mirror
 the external index URLs. Dependency update scanners could follow those
 links automatically, even if pip install doesn't check them by default.

Yes, but it's work to get that right.  Simply having checksummed links
from PyPI makes things a lot simpler.
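
For illustration, a minimal sketch (in Python, with a made-up link and hash)
of what an installer can do with such a checksummed link:

    import hashlib
    import urllib.request

    # Hypothetical "safe" link as it would appear on the PyPI simple page;
    # the fragment carries the hash registered by the maintainer.
    link = "https://downloads.example.com/foo-1.0.tar.gz#md5=0123456789abcdef0123456789abcdef"

    url, fragment = link.split("#", 1)
    algo, _, expected = fragment.partition("=")

    data = urllib.request.urlopen(url).read()
    if hashlib.new(algo, data).hexdigest() != expected:
        raise SystemExit("checksum mismatch -- refusing to install")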

best, need to shop for a barbecue now :)
holger

 One other nice consequence of PEP 470 should make it easier for
 organisations to flag and investigate cases where they're relying on an
 upstream source other than PyPI, regardless of whether they care about the
 details of their dependencies' hosting for speed, reliability or legal
 reasons.

 From a migration perspective, how hard would it be to automate generation
 of a custom index page on pythonhosted.org for projects currently relying
 on external references? That would still let us make the client changes
 without needing to special case PIL.
 
 Also, it occurred to me that while the latest/any split matters for new
 users, we still need to consider the impact on projects which have pinned
 dependencies on older versions of packages that were previously externally
 hosted, but have moved to PyPI for more recent releases. I still think
 dropping the external reference feature from the simple API in favour of
 improving the custom index support is the right to do, but a couple of
 *client side* examples of handling the migration could help clarify the
 consequences for the existing users that may be affected.
 
 For example, perhaps we should keep --allow-all-external, but have it
 mean that pip automatically adds new custom index URLs given for the
 requested packages. Even if it emitted a deprecation warning, clients using
 it would keep working in the face of the proposed changes to the simple API
 link handling.

 Regards,
 Nick.
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting

2014-06-07 Thread Donald Stufft

On Jun 7, 2014, at 4:06 PM, PJ Eby p...@telecommunity.com wrote:

 
 On Fri, Jun 6, 2014 at 10:25 AM, Donald Stufft don...@stufft.io wrote:
 I expected more people to move to safe external vs staying with the unsafe
 external.
 
 Is there a tool that makes this *easy*?  I'm not aware of one.
 
 (Ideally, something like a replacement for setup.py upload that generates the 
 download URLs and sends them off to PyPI, so that all one needs is a 
 setup_requires for the tool, a setup.cfg with the hosting prefix, and a run 
 of setup.py register bdist_whatever uplink to get the links set up.)
 


I know of:

https://warehouse.python.org/project/bitbucket-distutils/
https://warehouse.python.org/project/github-distutils/

But other than that, no. I assume most people who won’t upload to PyPI
are also unlikely to upload to github or bitbucket.

-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting

2014-06-07 Thread PJ Eby
On Fri, Jun 6, 2014 at 10:25 AM, Donald Stufft don...@stufft.io wrote:

 I expected more people to move to safe external vs staying with the unsafe
 external.


Is there a tool that makes this *easy*?  I'm not aware of one.

(Ideally, something like a replacement for setup.py upload that generates
the download URLs and sends them off to PyPI, so that all one needs is a
setup_requires for the tool, a setup.cfg with the hosting prefix, and a run
of setup.py register bdist_whatever uplink to get the links set up.)
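
As a rough illustration of the missing piece, here is a minimal sketch (not
an existing tool; the [uplink] section and hosting_prefix option are invented
for the example) that reads a hosting prefix from setup.cfg and prints the
checksummed download URLs one would then register with PyPI for the built
files in dist/:

    import configparser
    import hashlib
    import pathlib

    # Invented setup.cfg section for this sketch:
    #   [uplink]
    #   hosting_prefix = https://downloads.example.com/mypackage
    cfg = configparser.ConfigParser()
    cfg.read("setup.cfg")
    prefix = cfg.get("uplink", "hosting_prefix").rstrip("/")

    # One checksummed URL per built distribution in dist/.
    for dist_file in sorted(pathlib.Path("dist").iterdir()):
        digest = hashlib.md5(dist_file.read_bytes()).hexdigest()
        print("{0}/{1}#md5={2}".format(prefix, dist_file.name, digest))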
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting

2014-06-06 Thread holger krekel
Hi Donald,

1. You published numbers where 4K, or 300 discounting PIL, would be
   affected by PEP 470.  You also say that the main reason for deprecating
   PEP 438 is that it confused users.  Did it confuse other users than those few?

2. I don't see a valid, precise reasoning why PEP 438, just agreed on and
   implemented last year, needs deprecation.  It boosted everyone's install
   experience (independently from the CDN, which brought another boost) as
   usage of crawling dramatically dropped, and it thus brings us into the
   exact situation PEP 438 already hinted at:

   Deprecation of hosting modes to eventually only allow the
   pypi-explicit mode is NOT REGULATED by this PEP but is expected to
   become feasible some time after successful implementation of the
   transition phases described in this PEP. It is expected that
   deprecation requires a new process to deal with abandoned packages
   because of unreachable maintainers for still popular packages.

   We should follow through and discuss removing crawling and
   how to deal with abandoned packages.  On the PyPI side, what
   would remain are two kinds of links:

   - PyPI internally hosted links
   - registered safe external links to release files

   The resulting situation is:

   easy: users already have an existing option they can use to allow externals.

   safe: all links served from PyPI have checksums.  Project maintainers need
         to register hashed links to their new release files.

   clean: pip could eventually remove support for crawling and the related options.

   This is all easy to do, reduces user confusion and makes pip
   and PyPI simpler and less surprising.

   I don't see this approach discussed or seriously considered in the PEP,
   nor in its rejection reasons.

   By contrast, PEP 470 would require many users to learn about
   specifying other indexes and what that means.  For you and me
   and many here on the list it may be a no-brainer but, trust me,
   for many users (I've done ten trainings touching the topic now)
   this is not a natural concept at all.  pip install --allow-all-external
   is far easier to convey than specifying extra per-project indexes and
   explaining what it means if the install fails (wrong URL?  Index not
   reachable?  Release file not found?).

3. PEP 470 makes life a lot harder for devpi-server, currently used
   by many companies for serving their private indexes.  With PEP 438 and
   almost no external crawling left, devpi-server can rely on seeing
   changes through the PEP 381 API.  By contrast, with projects hosted on
   additional per-project external indexes, it requires polling (sketched
   below) to see changes, because releases may not be registered with PyPI
   anymore (and there is no way to enforce that IISIC).  IOW, PEP 470 is a
   serious regression here as it doesn't allow getting notified of new
   release files.
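
   For illustration, a minimal sketch (simplified, not devpi's actual code;
   the changelog event layout is assumed) of the kind of polling loop this
   would force on tools like devpi-server, using PyPI's XML-RPC changelog
   interface:

       import time
       import xmlrpc.client

       # Simplified sketch only, not devpi's implementation.
       client = xmlrpc.client.ServerProxy("https://pypi.python.org/pypi")
       serial = client.changelog_last_serial()

       while True:
           for event in client.changelog_since_serial(serial):
               # event assumed to be (name, version, timestamp, action, serial)
               print(event)
               serial = max(serial, event[-1])
           # releases that only appear on an external index never show up
           # here, so every external index would have to be polled as well
           time.sleep(60)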

best,
holger

On Thu, Jun 05, 2014 at 22:08 -0400, Donald Stufft wrote:
 Here's round 2 of PEP 470.
 
 You can see it online at https://python.org/dev/peps/pep-0470/ or below.
 
 Notable changes:
 
 - Ensure it's obvious this strictly deals with the installer API and does not
   affect a project's ability to register their project on PyPI for human
   consumptions.
 
 - Mention that the functional mechanisms that make it possible for an end user
   to specify the additional locations have existed for a long time across many
   versions of the installers.
 
 - Explicitly mention that the installer changes from PEP 438 should be
   deprecated and removed as part of this PEP.
 
 - Explicitly mention pythonhosted.org as a location that authors can use to
   host an index if they do not wish to purchase a TLS certificate or host
   additional infrastructure.
 
 - Include that a link to PyPI ToS should be included in the emails sent to
   authors to remind them of the PyPI ToS.
 
 - Special case PIL as it is an outlier in terms of impact.
 
 - Fill out the impact sections further to provide more detail
 
 
 Abstract
 
 
 This PEP proposes that the official means of having an installer locate and
 find package files which are hosted externally to PyPI become the use of
 multi index support instead of the practice of using external links on the
 simple installer API.
 
 It is important to remember that this is **not** about forcing anyone to host
 their files on PyPI. If someone does not wish to do so they will never be 
 under
 any obligation too. They can still list their project in PyPI as an index, and
 the tooling will still allow them to host it elsewhere.
 
 This PEP strictly is concerned with the Simple Installer API and how automated
 installers interact with PyPI, it has no bearing on the informational pages
 which are primarily for human consumption.
 
 
 Rationale
 =
 
 There is a long history documented in PEP 438 that explains why externally
 hosted files exist today in the state that they do on PyPI. For the sake of
 brevity I will not duplicate that and 

Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting

2014-06-06 Thread Donald Stufft

On Jun 6, 2014, at 4:13 AM, holger krekel hol...@merlinux.eu wrote:

 Hi Donald,
 
 1. you published numbers where 4K or 300 discounting PIL would be
   affected by PEP470.  You also say that the main reason for deprecating
   PEP438 is that it confused users.  Did it confuse other users than those 
 few?

It confused more people than the current numbers suggest, because at the onset
more projects relied on it than do now. Currently PIL is the primary instigator
of the confusion that I personally see.

 
 2. I don't see a valid precise reasoning why PEP438, just agreed on and 
   implemented last year, needs deprecation.  It boosted everyone
   everyone's install experiences (independently from the CDN which
   brought another boost) as usage of crawling dramatically dropped 
   and thus brings us into the exact situation PEP438 already hinted at:
 
   Deprecation of hosting modes to eventually only allow the
   pypi-explicit mode is NOT REGULATED by this PEP but is expected to
   become feasible some time after successful implementation of the
   transition phases described in this PEP. It is expected that
   deprecation requires a new process to deal with abandoned packages
   because of unreachable maintainers for still popular packages.
 
   We should follow through and discuss removing crawling and 
   how to deal with abandoned packages.  On the PyPI side, what 
   would remain are two kind of links:
 
   - pypi internally hosted
   - registered safe external links to release files
 
   The resulting situation is:
 
   easy: users have an already existing option to consider to allow externals.
 
   safe: All links served from pypi have checksums. Project maintainers need
 to register hashed links to their new release files.
 
   clean: Pip could eventually remove support for crawling/related options.
 
   This is all easy to do, reduces user confusion and makes pip
   and pypi simpler and less suprising.
 
   I don't see this approach discussed or seriously considered in the PEP,
   also not in its rejection reasons”.

The reasons are listed in the PEP, though I can make it more explicit that
they apply to this approach as well.

* People are generally surprised that PyPI allows linking to externally hosted
  files and doesn't require people to host on PyPI. In contrast, most of them
  are familiar with the concept of multiple software repositories, as used by
  many OSs.

* PyPI is fronted by a globally distributed CDN which has improved the
  reliability and speed for end users. It is unlikely that any particular
  external host has something comparable. This can lead to extremely bad
  performance for end users when the external host is located in different
  parts of the world or does not generally have good connectivity.

  As a data point, many users reported sub DSL speeds and latency when
  accessing PyPI from parts of Europe and Asia prior to the use of the CDN.

* PyPI has monitoring and an on-call rotation of sysadmins who can respond to
  downtime quickly. Again, it is unlikely that any particular external host
  will have this. This can lead to single packages in a dependency chain being
  uninstallable. This will often confuse users, who oftentimes have no idea
  that the package relies on an external host, and they cannot figure out why
  PyPI appears to be up but the installer cannot find the package.

* PyPI supports mirroring, both for private organizations and public mirrors.
  The legal terms of uploading to PyPI ensure that mirror operators, both
  public and private, have the right to distribute the software found on PyPI.
  However, software that is hosted externally carries no such guarantee, causing
  private organizations to need to investigate each package individually and
  manually to determine whether its license allows them to mirror it.

  For public mirrors this essentially means that these externally hosted
  packages *cannot* be reasonably mirrored. This is particularly troublesome
  in countries such as China where the bandwidth to outside of China is
  highly congested, making a mirror within China oftentimes a massively better
  experience.

* In the long run, global opt-in flags like ``--allow-all-external`` will
  become little annoyances that developers cargo cult around in order to make
  their installer work. When they run into a project that requires it they
  will most likely simply add it to their configuration file for that installer
  and continue on with whatever they were actually trying to do. This will
  continue until they try to install their requirements on another computer
  or attempt to deploy to a server where their install will fail again until
  they add the "make it work" flag to their configuration file.

Implied but not explicitly called out reason (I’ll add this):

* The URL classification only works for a certain subset of projects, however
  it does not allow for any project which needs additional 

Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting

2014-06-06 Thread Donald Stufft
I’ve updated the PEP:

http://hg.python.org/peps/rev/3128e9d38937


files:
 pep-0470.txt |  15 +++
 1 files changed, 15 insertions(+), 0 deletions(-)


diff --git a/pep-0470.txt b/pep-0470.txt
--- a/pep-0470.txt
+++ b/pep-0470.txt
@@ -389,6 +389,9 @@
  hosted.
* Default to disallowing safely externally hosted files with only a global
  flag to enable them, but disallow unsafely hosted.
+* Continue on the suggested path of PEP 438 and remove the option to unsafely
+  host externally but continue to allow the option to safely host externally.
+

These proposals are rejected because:

@@ -454,6 +457,18 @@
  or attempt to deploy to a server where their install will fail again until
  they add the make it work flag in their configuration file.

+* The URL classification only works for a certain subset of projects, however
+  it does not allow for any project which needs additional restrictions such
+  as Access Controls. This means that there would be two methods of doing the
+  same thing, linking to a file safely and hosting an index. Hosting an index
+  works in all situations and by relying on this we make for a more consistent
+  experience no matter the reason for external hosting.
+
+* The safe external hosting option hampers the ability of PyPI to upgrade it's
+  security infrastructure. For instance if MD5 becomes broken in the future
+  there will be no way for PyPI to upgrade the hashes of the projects which
+  rely on safe external hosting via MD5 while files that are hosted on PyPI
+  can simply be processed over with a new hash function.

Copyright
=

-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting

2014-06-06 Thread holger krekel
On Fri, Jun 06, 2014 at 07:55 -0400, Donald Stufft wrote:
 
 On Jun 6, 2014, at 4:13 AM, holger krekel hol...@merlinux.eu wrote:
 
  Hi Donald,
  
  1. you published numbers where 4K or 300 discounting PIL would be
affected by PEP470.  You also say that the main reason for deprecating
PEP438 is that it confused users.  Did it confuse other users than those 
  few?
 
 It confused more of than the current numbers because at the onset more
 projects relied on it than does now. Currently PIL is the primary
 instigator for people’s confusion that I personally see.

So currently we don't have many confused users anymore.  Doesn't
this take away a good part of the reasoning behind PEP470?

In the following I use "PEP 438f" to refer to a hypothetical
follow-up PEP as outlined in my previous mail.  I volunteer to write
it and present it as an alternative should we not reach some
form of conclusion together.

  2. I don't see a valid precise reasoning why PEP438, just agreed on and 
implemented last year, needs deprecation.  It boosted everyone
everyone's install experiences (independently from the CDN which
brought another boost) as usage of crawling dramatically dropped 
and thus brings us into the exact situation PEP438 already hinted at:
  
Deprecation of hosting modes to eventually only allow the
pypi-explicit mode is NOT REGULATED by this PEP but is expected to
become feasible some time after successful implementation of the
transition phases described in this PEP. It is expected that
deprecation requires a new process to deal with abandoned packages
because of unreachable maintainers for still popular packages.
  
We should follow through and discuss removing crawling and 
how to deal with abandoned packages.  On the PyPI side, what 
would remain are two kind of links:
  
- pypi internally hosted
- registered safe external links to release files
  
The resulting situation is:
  
easy: users have an already existing option to consider to allow 
  externals.
  
safe: All links served from pypi have checksums. Project maintainers need
  to register hashed links to their new release files.
  
clean: Pip could eventually remove support for crawling/related options.
  
This is all easy to do, reduces user confusion and makes pip
and pypi simpler and less suprising.
  
I don't see this approach discussed or seriously considered in the PEP,
also not in its rejection reasons”.
 
 The reasons are listed in the PEP, though I can make it more explicit that
 it is for this as well.
 
 * People are generally surprised that PyPI allows externally linking to files
   and doesn't require people to host on PyPI. In contrast most of them are
   familiar with the concept of multiple software repositories such as is in
   use by many OSs.

"People are generally surprised" is a rather subjective statement.
With respect to PEP 470 we might have at least 65 projects and many more users
being annoyed rather than just surprised at the sudden change in direction,
especially if there are no compelling arguments.

 * PyPI is fronted by a globally distributed CDN which has improved the
   reliability and speed for end users. It is unlikely that any particular
   external host has something comparable. This can lead to extremely bad
   performance for end users when the external host is located in different
   parts of the world or does not generally have good connectivity.
 
   As a data point, many users reported sub DSL speeds and latency when
   accessing PyPI from parts of Europe and Asia prior to the use of the CDN.

 * PyPI has monitoring and an on-call rotation of sysadmins whom can respond to
   downtime quickly, thus enabling a quicker response to downtime. Again it is
   unlikely that any particular external host will have this. This can lead
   to single packages in a dependency chain being un-installable. This will
   often confuse users, who often times have no idea that this package relies
   on an external host, and they cannot figure out why PyPI appears to be up
   but the installer cannot find a package.

Sorry, but both points don't have much to do with the discussion.  If
anything, they speak *against* PEP 470 because users would need to rely
on project-specific external index sites to even know which releases
exist.  With PEP 438 you know that a certain release file must exist and
the installer clearly says "I could not download release file X from
URL".  This works today.

Also, the external index could be temporarily broken and not serve the newest
files.  The integrity and reliability of external indexes would generally
not be covered by the CDN and PyPI's on-call admins, so instead of
speaking for PEP 470 these points speak against it.

 * PyPI supports mirroring, both for private organizations and public mirrors.
   The legal terms of uploading to PyPI ensure that mirror operators, both
   public and private, have the right to 

Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting

2014-06-06 Thread Donald Stufft

On Jun 6, 2014, at 9:41 AM, holger krekel hol...@merlinux.eu wrote:

 On Fri, Jun 06, 2014 at 07:55 -0400, Donald Stufft wrote:
 
 On Jun 6, 2014, at 4:13 AM, holger krekel hol...@merlinux.eu wrote:
 
 Hi Donald,
 
 1. you published numbers where 4K or 300 discounting PIL would be
  affected by PEP470.  You also say that the main reason for deprecating
  PEP438 is that it confused users.  Did it confuse other users than those 
 few?
 
 It confused more of than the current numbers because at the onset more
 projects relied on it than does now. Currently PIL is the primary
 instigator for people’s confusion that I personally see.
 
 So currently we don't have many confused users anymore.  Doesn't
 this take away a good part of the reasoning behind PEP470?

No.

 
 In the following i use PEP438f to speak about a hypothetical
 follow-up PEP as outlined in my previous mail.  I volunteer to write
 it and present it as an alternative should we not reach some 
 form of conclusion together.
 
 2. I don't see a valid precise reasoning why PEP438, just agreed on and 
  implemented last year, needs deprecation.  It boosted everyone
  everyone's install experiences (independently from the CDN which
  brought another boost) as usage of crawling dramatically dropped 
  and thus brings us into the exact situation PEP438 already hinted at:
 
  Deprecation of hosting modes to eventually only allow the
  pypi-explicit mode is NOT REGULATED by this PEP but is expected to
  become feasible some time after successful implementation of the
  transition phases described in this PEP. It is expected that
  deprecation requires a new process to deal with abandoned packages
  because of unreachable maintainers for still popular packages.
 
  We should follow through and discuss removing crawling and 
  how to deal with abandoned packages.  On the PyPI side, what 
  would remain are two kind of links:
 
  - pypi internally hosted
  - registered safe external links to release files
 
  The resulting situation is:
 
  easy: users have an already existing option to consider to allow externals.
 
  safe: All links served from pypi have checksums. Project maintainers need
to register hashed links to their new release files.
 
  clean: Pip could eventually remove support for crawling/related options.
 
  This is all easy to do, reduces user confusion and makes pip
  and pypi simpler and less suprising.
 
  I don't see this approach discussed or seriously considered in the PEP,
  also not in its rejection reasons”.
 
 The reasons are listed in the PEP, though I can make it more explicit that
 it is for this as well.
 
 * People are generally surprised that PyPI allows externally linking to files
  and doesn't require people to host on PyPI. In contrast most of them are
  familiar with the concept of multiple software repositories such as is in
  use by many OSs.
 
 People are generally surprised is a rather subjective statement.
 Wrt to PEP470 we might have at least 65 projects and many more users being 
 annoyed rather than just surprised at the sudden change in direction.
 Especially if there are no compelling arguments.
 
 * PyPI is fronted by a globally distributed CDN which has improved the
  reliability and speed for end users. It is unlikely that any particular
  external host has something comparable. This can lead to extremely bad
  performance for end users when the external host is located in different
  parts of the world or does not generally have good connectivity.
 
  As a data point, many users reported sub DSL speeds and latency when
  accessing PyPI from parts of Europe and Asia prior to the use of the CDN.
 
 * PyPI has monitoring and an on-call rotation of sysadmins whom can respond 
 to
  downtime quickly, thus enabling a quicker response to downtime. Again it is
  unlikely that any particular external host will have this. This can lead
  to single packages in a dependency chain being un-installable. This will
  often confuse users, who often times have no idea that this package relies
  on an external host, and they cannot figure out why PyPI appears to be up
  but the installer cannot find a package.
 
 Sorry but both points have not much to do with the discussion.  If
 anything, they speak *against* PEP470 because users would need to rely
 on project specific external index sites to even know which releases
 exist.  With PEP438 you know that a certain release file must exist and
 the installer clearly says i could not download release file X from
 URL.  Works today.
 
 Also the external index could be temporarily broken and serve not the newest
 files.  The integrity and reliability of external indexes would generally
 not be covered by the CDN and PyPI's on-rotation admins so instead of
 speaking for PEP470 they speak against it.

The point is, end users are *aware* they are relying on something external
and they are aware exactly what external items they are relying on. With PEP 470
people can correctly 

Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting

2014-06-06 Thread Nick Coghlan
On 7 Jun 2014 06:08, Donald Stufft don...@stufft.io wrote:


 On Jun 6, 2014, at 9:41 AM, holger krekel hol...@merlinux.eu wrote:
 
  Once you care for ACLs for indexes and releases you have a number
  of issues to consider, it's hardly related to PEP470/PEP438.

 It is related, because it means that the exact same mechanisms can be
used,
 people don’t have to learn two different ways of specifying externally
hosted
 projects. In fact it also teaches them how to specify mirrors and the
like as well
 something that any devpi user is already going to have to learn how to do.

This is the key benefit of PEP 470 from my perspective: some aspects of the
Python packaging ecosystem suffer from a bad case of too many ways to do
it, and if we're ever going to fix that, we need to be ruthless in culling
redundant concepts.

Specifying custom indexes is a feature with a lot of use cases - local
mirrors and private indexes being two of the big ones. By contrast,
external references from the simple API duplicate a small subset of the
custom index functionality in a way that introduces a whole slew of new
concepts that still need to be documented and learned, even if the advice
is don't use that, use custom indexes instead.

As far as dev-pi goes, if it's only mirroring links rather than externally
hosted files today, then in the future, it will still automatically mirror
the external index URLs. Dependency update scanners could follow those
links automatically, even if pip install doesn't check them by default.

One other nice consequence of PEP 470 is that it should make it easier for
organisations to flag and investigate cases where they're relying on an
upstream source other than PyPI, regardless of whether they care about the
details of their dependencies' hosting for speed, reliability or legal
reasons.

From a migration perspective, how hard would it be to automate generation
of a custom index page on pythonhosted.org for projects currently relying
on external references? That would still let us make the client changes
without needing to special case PIL.

Also, it occurred to me that while the latest/any split matters for new
users, we still need to consider the impact on projects which have pinned
dependencies on older versions of packages that were previously externally
hosted, but have moved to PyPI for more recent releases. I still think
dropping the external reference feature from the simple API in favour of
improving the custom index support is the right thing to do, but a couple of
*client side* examples of handling the migration could help clarify the
consequences for the existing users that may be affected.

For example, perhaps we should keep --allow-all-external, but have it
mean that pip automatically adds new custom index URLs given for the
requested packages. Even if it emitted a deprecation warning, clients using
it would keep working in the face of the proposed changes to the simple API
link handling.

Regards,
Nick.
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting

2014-06-06 Thread Donald Stufft

On Jun 6, 2014, at 7:46 PM, Nick Coghlan ncogh...@gmail.com wrote:

 
 On 7 Jun 2014 06:08, Donald Stufft don...@stufft.io wrote:
 
 
  On Jun 6, 2014, at 9:41 AM, holger krekel hol...@merlinux.eu wrote:
  
   Once you care for ACLs for indexes and releases you have a number
   of issues to consider, it's hardly related to PEP470/PEP438.
 
  It is related, because it means that the exact same mechanisms can be used,
  people don’t have to learn two different ways of specifying externally 
  hosted
  projects. In fact it also teaches them how to specify mirrors and the like 
  as well
  something that any devpi user is already going to have to learn how to do.
 
 This is the key benefit of PEP 470 from my perspective: some aspects of the 
 Python packaging ecosystem suffer from a bad case of too many ways to do 
 it, and if we're ever going to fix that, we need to be ruthless in culling 
 redundant concepts.
 
 Specifying custom indexes is a feature with a lot of use cases - local 
 mirrors and private indexes being two of the big ones. By contrast, external 
 references from the simple API duplicate a small subset of the custom index 
 functionality in a way that introduces a whole slew of new concepts that 
 still need to be documented and learned, even if the advice is don't use 
 that, use custom indexes instead.
 
 As far as dev-pi goes, if it's only mirroring links rather than externally 
 hosted files today, then in the future, it will still automatically mirror 
 the external index URLs. Dependency update scanners could follow those links 
 automatically, even if pip install doesn't check them by default.
 
 One other nice consequence of PEP 470 should make it easier for organisations 
 to flag and investigate cases where they're relying on an upstream source 
 other than PyPI, regardless of whether they care about the details of their 
 dependencies' hosting for speed, reliability or legal reasons.
 
 From a migration perspective, how hard would it be to automate generation of 
 a custom index page on pythonhosted.org for projects currently relying on 
 external references? That would still let us make the client changes without 
 needing to special case PIL.
 
 

Not very difficult. My current crawl script could generate a minimal one with
some minor modifications (it'd have to save the whole URL instead of just the
filename) and would take about 3 hours to process. This process would also weed
out links which have died and the like. The downside would be that these files
wouldn't be verified, so it would be an external + unsafe index, since we don't
have hash information to make them safe. Of course this would be the case for
PIL anyway, which easily makes up most of this traffic, so this could just end
up in the wash as far as how "safe" it is.
 Also, it occurred to me that while the latest/any split matters for new 
 users, we still need to consider the impact on projects which have pinned 
 dependencies on older versions of packages that were previously externally 
 hosted, but have moved to PyPI for more recent releases. I still think 
 dropping the external reference feature from the simple API in favour of 
 improving the custom index support is the right to do, but a couple of 
 *client side* examples of handling the migration could help clarify the 
 consequences for the existing users that may be affected.
 
 

Right, this was one of the reasons my old numbers had a split at 50%; part of
the idea was that a project with less than some percentage of its files hosted
on PyPI had a smaller "breakage" surface, even for old pinned versions. I can
get these numbers again if they'd be useful, though I'm not sure if they should
go in the PEP or not; it's already kind of heavy on the numbers, I think, and
I'm not sure additional numbers would be more or less confusing.

What do you mean by client side examples of handling the migration? I’m 
assuming you mean something other than the examples which show how to utilize 
the new indexes?
 For example, perhaps we should keep --allow-all-external, but have it mean 
 that pip automatically adds new custom index URLs given for the requested 
 packages. Even if it emitted a deprecation warning, clients using it would 
 keep working in the face of the proposed changes to the simple API link 
 handling.
 
Well it’d actually expand what —allow-all-external means, since it’d also allow 
those unsafely hosted files. I’m not sure it’d be a good idea to silently (or 
with a warning even) upgrade an option from a “do this to allow all safely 
hosted files” to a “do this to allow a whole bunch of legacy and unsafely 
hosted files”. The one upside to that is we’d direct link to files instead of 
relying on scraping so you’d have to actually rely on an unsafe file to be at 
risk, but it still makes me nervous.

It’s possible we could add a flag for this, but I’m not sure how useful it’d be 
since it’d only be in pip 1.6+ and unless people upgrade to that 

[Distutils] PEP 470 Round 2 - Using Multi Index Support for External to PyPI Package File Hosting

2014-06-05 Thread Donald Stufft
Here's round 2 of PEP 470.

You can see it online at https://python.org/dev/peps/pep-0470/ or below.

Notable changes:

- Ensure it's obvious this strictly deals with the installer API and does not
  affect a project's ability to register their project on PyPI for human
  consumption.

- Mention that the functional mechanisms that make it possible for an end user
  to specify the additional locations have existed for a long time across many
  versions of the installers.

- Explicitly mention that the installer changes from PEP 438 should be
  deprecated and removed as part of this PEP.

- Explicitly mention pythonhosted.org as a location that authors can use to
  host an index if they do not wish to purchase a TLS certificate or host
  additional infrastructure.

- Include that a link to PyPI ToS should be included in the emails sent to
  authors to remind them of the PyPI ToS.

- Special case PIL as it is an outlier in terms of impact.

- Fill out the impact sections further to provide more detail


Abstract
========

This PEP proposes that the official means of having an installer locate and
find package files which are hosted externally to PyPI become the use of
multi index support instead of the practice of using external links on the
simple installer API.

It is important to remember that this is **not** about forcing anyone to host
their files on PyPI. If someone does not wish to do so they will never be under
any obligation to. They can still list their project in PyPI as an index, and
the tooling will still allow them to host it elsewhere.

This PEP is strictly concerned with the Simple Installer API and how automated
installers interact with PyPI; it has no bearing on the informational pages,
which are primarily for human consumption.


Rationale
=========

There is a long history documented in PEP 438 that explains why externally
hosted files exist today in the state that they do on PyPI. For the sake of
brevity I will not duplicate that and instead urge readers to first take a look
at PEP 438 for background.

There are currently two primary ways for a project to make itself available
without directly hosting the package files on PyPI. They can either include
links to the package files in the simple installer API or they can publish
a custom package index which contains their project.


Custom Additional Index
-----------------------

Each installer which speaks to PyPI offers a mechanism for the user invoking
that installer to provide additional custom locations to search for files
during the dependency resolution phase. For pip these locations can be
configured per invocation, per shell environment, per requirements file, per
virtual environment, and per user. The mechanisms for specifying additional
locations have existed within pip and setuptools for many years; by comparison,
the mechanisms in PEP 438 and any other new mechanism will have existed for
only a short period of time (if they exist at all currently).

The use of additional indexes instead of external links on the simple
installer API provides a simple, clean interface which is consistent with the
way most Linux package systems work (apt-get, yum, etc.). More importantly, it
works the same even for projects which are commercial or otherwise have their
access restricted in some form (private networks, passwords, IP ACLs, etc.),
while the external links method only realistically works for projects which
do not have their access restricted.

Compared to the complex rules which a project must be aware of to avoid being
considered unsafely hosted, setting up an index is fairly trivial and in the
simplest case does not require anything more than a filesystem and a standard
web server such as Nginx or Twisted Web. Even when using simple static hosting
without autoindexing support, it is still straightforward to generate
appropriate index pages as static HTML.

Example Index with Twisted Web
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Create a root directory for your index, for the purposes of the example
   I'll assume you've chosen ``/var/www/index.example.com/``.
2. Inside of this root directory, create a directory for each project such
   as ``mkdir -p /var/www/index.example.com/{foo,bar,other}/``.
3. Place the package files for each project in their respective folder,
   creating paths like ``/var/www/index.example.com/foo/foo-1.0.tar.gz``.
4. Configure Twisted Web to serve the root directory, ideally with TLS.

::

$ twistd -n web --path /var/www/index.example.com/

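
If the chosen web server cannot produce directory listings automatically, the
per-project index pages for the layout above can be pre-generated as static
HTML; the following is a minimal illustrative sketch (paths as in the example
above):

::

    import html
    import pathlib

    # Layout from the example above: /var/www/index.example.com/<project>/<files>
    root = pathlib.Path("/var/www/index.example.com")

    for project in sorted(p for p in root.iterdir() if p.is_dir()):
        links = "\n".join(
            '<a href="{0}">{0}</a><br/>'.format(html.escape(f.name))
            for f in sorted(project.iterdir())
            if f.name != "index.html"
        )
        (project / "index.html").write_text(
            "<!DOCTYPE html>\n<html><body>\n{}\n</body></html>\n".format(links)
        )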

Examples of Additional indexes with pip
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Invocation:**

::

$ pip install --extra-index-url https://pypi.example.com/ foobar

**Shell Environment:**

::

$ export PIP_EXTRA_INDEX_URL=https://pypi.example.com/
$ pip install foobar

**Requirements File:**

::

$ echo "--extra-index-url https://pypi.example.com/\nfoobar" > requirements.txt
$ pip install -r requirements.txt

**Virtual Environment:**

::

$