On May 16, 2014, at 6:16 AM, holger krekel <[email protected]> wrote:
> Hi Donald, Nick, Richard, all,
>
> finally got around to read and think about the issues discussed in PEP470.
> First of all thanks for going through the effort of trying to
> advance the overall situation with a focus on making it easier
> for our wonderful and beloved "end users" :)
>
> However, I think PEP470 needs to achieve stronger backward compatibility for
> end-users because, as is typical for the 99%, they like to see change
> but hate to be forced to change themselves.
>
> Allow me to remind of how PEP438 worked in this regard: all
> end users always remained able to install all projects, including those
> with ancient tools and they all benefitted from the changes PEP438
> brought: 90% of the projects were automatically switched to
> "pypi-explicit" mode, speeding up and making more reliable installs for
> everyone across the board. Let me thank specifically and once
> again our grand tooler Donald here who implemented most of it.
>
> However, PEP470 does not achieve this level of backward compatibility yet.
> Let's look at its current procedure leading up to the final switch:
>
> "After that switch, an email will be sent to projects which rely on
> hosting external to PyPI. This email will warn these projects that
> externally hosted files have been deprecated on PyPI and that in 6
> months from the time of that email that all external links will be
> removed from the installer APIs. (...)
>
> Five months after the initial email, another email must be sent to
> any projects still relying on external hosting. (...)
>
> Finally a month later all projects will be switched to the pypa-only
> mode and PyPI will be modified to remove the externally linked files
> functionality."
>
> This process tries to trigger changes from those 2974 project maintainers
> who are today operating in pypi-crawl* modes.
> If we are left with a 1000
> stale project maintainers at final-switch time, and speculate about just 100
> downloads for each of their projects, it means this final switch may get
> us 100000 failing installation interactions the day after the final switch.
> Might be higher or lower, but i hope we agree that we'll very likely
> have a significant "stale project maintainer" problem affecting
> many end-users and existing CI installations etc.
>
> Even for those maintainers who switch to use an external index
> as currently advertised by the PEP, and with their release files also
> being downloaded a 100 times each, we'll have another 50000 interactions
> from end users which need to re-configure their tool usage to switch to
> use an external index. Granted, those using a new pip version would get
> a useful hint how to do that. Others, using older versions, would have
> to discover the project pypi website to hopefully understand how to
> make their stuff work again.
>
> In any case, we'd likely get a ton of end-user side installation issues
> and i think PEP470 needs to be modified to try minimize this number.
> It could take the ball where PEP438 dropped it:
>
> "Thus the hope is that eventually all projects on PyPI can be migrated to
> the pypi-explicit mode, while preserving the ability to install release
> files hosted externally via installer tools. Deprecation of hosting
> modes to eventually only allow the pypi-explicit mode is NOT REGULATED
> by this PEP but is expected to become feasible some time after
> successful implementation of the transition phases described in this
> PEP. It is expected that deprecation requires a new process to deal with
> abandoned packages because of unreachable maintainers for still popular
> packages."
>
> PEP470 could be this successor, cleaning up and simplifying the situation.
> But how to maintain full backward compat and get rid of crawling?
> here is a sketched process how we could get rid of pypi-crawl* modes:
>
> - sent a warning note to maintainers a month before their pypi-crawl*
> hosted projects are converted (informing about the process, see next points).
> Advertise a tool to convert pypi-crawl* hosting modes to pypi-explicit.
> This tool automates the crawling to register all found release files
> either as explicit references with MD5s, or upload them to become
> pypi-hosted files, at the option of the maintainer. It will also switch
> the hosting mode on the pypi site automatically.
>
> We'll also disallow pypi-crawl* modes on pypi at warning time for new
> projects or to switch to them from other modes.
>
> - a month later a pypi admin (guess who!) uses the same conversion tool,
> but with his admin superpowers, to convert any remaining
> pypi-crawl* hosting-mode projects automatically with one addition:
> all those admin-converted projects will get a "stale" flag
> because the maintainer did not react and perform the conversion himself.
> This "stale" status will be shown on the web page and new tool releases
> can maybe learn to read this flag from the simple page so that they can warn
> the end users they are installing a project with a known-to-be stale
> maintainer.
>
> The admin-driven conversion can be done incrementally in bunches,
> to make it even more unlikely that we are going to face storms
> of unhappy end users at any one point and to iron out issues as we go.
>
> The result of this process is that we have only one hosting mode:
> pypi-explicit which is already introduced and specified with PEP438.
> And pypi's simple pages will continue to present two kinds of links:
>
> - rel="internal": release files directly uploaded to pypi
>
> - other external links will be direct URLS with hash-checksums to external
> release files. Tools already can already recognize them and inform the user.
>
> sidenote: if people have a PIP_DOWNLOAD_CACHE they will
> only depend on reachability of pypi after they first installed
> an external dependency. So it's operationally a good situation given
> the fact that using "--allow-externals" provides exactly the same
> file installation integrity as pypi hosted files itself do.
>
> After we completed the automated admin-pypi transition there is no external
> scraping, no unverified links and tools could drop support for them over
> time. And there remain two ways how you can release files: upload them
> to pypi or register a checksummed link. In addition, we will have
> a clear list of a bunch of "stale" marked projects and can work
> with it further.
>
> Note that with this proposed process 93% of maintainers, most toolers
> and all end-users can remain ignorant of this PEP and will not be
> bothered: everything just continues to work unmodified. Some end users
> will experience a speed up because the client-side will not need
> to download/crawl additional external simple pages. There are no new
> things people need to learn except for the "crawl" maintainers to whom
> we nicely and empathically send a message: "switch or be switched" :)
>
> You'll note that the process proposed here does not require
> pypi.python.org to manage "external additional indexes" information or
> tools to learn to recognize them. At this point, I am not sure it's
> really needed for the cleanup and simplifiation issues PEP470 tries to
> address.
>
> backward-compat-is-a-thing'ly yours,
> holger

Backwards compatibility is a noble goal! It is not, however, the only
goal. I feel very strongly that PyPI should not make security-sensitive
claims about a project that it does not know to be true.

Here's the thing: we do not know if the files we discover are safe files
and we have no way to verify them. We don't even know that the original
author still owns the domain and that someone hasn't bought it up and
put malicious files on it.
Your proposal will change it so that PyPI makes security claims about a
project without actually being able to know that those claims are
accurate.

On top of that, it still fails to address:

* The reliability of the externally hosted files, especially for
  projects which are now "stale". How likely is it that an unmaintained
  project ends up having its external file links bitrot?
* The legality of mirroring. End users trying to mirror are still
  responsible for determining whether they are allowed to mirror each
  file. This is especially important in China or other
  bandwidth-constrained environments where good access (or access at
  all) to the Fastly CDN cannot be achieved.

Breaking backwards compatibility is always a hard choice; however, I
think it makes sense in this case. There is no way to actually move
forward on this issue without either breaking backwards compatibility
or making potentially false claims about the validity of a file.

Furthermore, the 7% of projects affected is an upper bound: it is the
most inclusive way of doing the tally. I did not want my own biases to
influence the statistics, so I tried to remove any editorializing from
them. That being said, a significant portion of that 7% has only a few
(sometimes only 1) old releases hosted externally. Often when I've
pointed this out to authors they didn't even realize it; they had just
forgotten to call ``setup.py upload``.

Finally, of the projects left, a lot are very old (I've found some that
were last released in 2003). A lot of them do not work with any modern
version of Python, and some of them do not even have a ``setup.py`` and
thus are not installable at all.

These are all issues that my processing didn't attempt to classify
because I wanted to remove my personal bias from the numbers, but the
simple fact is that while the maximum amount may be 7%, the actual
amount is going to be far, far less than that.
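
To make the point about checksums concrete, here is a minimal sketch. It
is illustrative only: ``register_link`` and ``verify_download`` are
hypothetical names, not part of PyPI or pip, and the byte strings stand
in for release files. It assumes the automated conversion records an MD5
of whatever it finds at the external URL, as the proposal above
describes.

```python
import hashlib


def register_link(content: bytes) -> str:
    """Hypothetical conversion step: record the MD5 digest of whatever
    bytes are found at the external URL at crawl time."""
    return hashlib.md5(content).hexdigest()


def verify_download(content: bytes, recorded_digest: str) -> bool:
    """Hypothetical installer check: does the download match the digest
    that was recorded at crawl time?"""
    return hashlib.md5(content).hexdigest() == recorded_digest


# Stand-ins for release files (assumptions for illustration).
author_release = b"print('release published by the original author')"
squatter_release = b"print('file served by whoever owns the domain now')"

# Case 1: the domain changed hands BEFORE the automated conversion ran.
# The recorded digest describes the replacement file, so the check passes.
digest = register_link(squatter_release)
print(verify_download(squatter_release, digest))  # True

# Case 2: the file was swapped AFTER the conversion ran.
# Here the checksum does its job and the mismatch is detected.
digest = register_link(author_release)
print(verify_download(squatter_release, digest))  # False
```

The checksum only says "this is the file that was there when we
crawled"; in the first case it cannot say whether that file ever came
from the author.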
-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
_______________________________________________
Distutils-SIG maillist - [email protected]
https://mail.python.org/mailman/listinfo/distutils-sig
