PyPI is now being served with a valid SSL certificate, and the tooling has begun to incorporate SSL verification of PyPI into the process. This is _excellent_, and the parties involved should all be thanked. However, there is still another massive area of insecurity within the packaging toolchain.
For those who don't know, when you attempt to install a particular package, a number of URLs are visited. The steps look roughly like this:

1. Visit http://pypi.python.org/simple/Package/ and collect any links that look installable (tarballs, #egg=, etc.). Note: /simple/Package/ contains the download_url, the home_page, and any link contained in the long_description.
2. Visit any link referenced as home_page and collect any links that look installable.
3. Visit any link referenced in dependency_links and collect any links that look installable.
4. Take all of the collected links, determine which one best matches the given requirement spec, and download it.
5. Rinse and repeat for every dependency in the requirement set.

I propose we deprecate the external links that PyPI publishes on the /simple/ indexes, which exist because of PyPI's history. Ideally, in some number of months (1? 2?) we would stop adding these links for new releases, leaving the existing ones intact, and then a few months later remove the existing links completely.

Reasoning:

1. It is difficult to secure the process of spidering external links for downloads.
   1a. The only way I can think of offhand is to require uploading a hash of the expected files to PyPI along with the download link, and to remove all URLs except for the download_url. This has the effect that only one file can be associated with a particular release.
2. External links decrease the expected uptime for a particular set of requirements. PyPI itself has become very stable; however, the same cannot be said for all of the linked hosts that the toolchain processes. Each new host is an additional single point of failure (SPOF). Ex: if I depend on PyPI plus 10 packages hosted elsewhere, and each service has 99% uptime, my expected uptime for being able to install all of my requirements would be ~89.5% (0.99 ** 11).
3. It breaks the ability of a CDN and/or mirroring infrastructure to provide increased uptime and better latency/throughput across the globe.
4. Privacy implications: as a user, it is not particularly obvious when I run `pip install Foo` which hosts I will be issuing requests against. It is obvious that I will be contacting PyPI, and I will have made the decision to trust PyPI; however, it is not obvious which other hosts will be able to gather information about me, including what packages I am installing. This becomes even harder to determine the deeper my dependency tree goes.
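To make the link-collection steps above concrete, here is a minimal sketch of how a client might scrape a /simple/ page for installable-looking links and then list which non-PyPI hosts it would end up contacting (the privacy concern in reason 4). It is not pip's actual code: the suffix list in ARCHIVE_SUFFIXES, the "egg=" fragment heuristic, and the helper names installable_links() and external_hosts() are all assumptions made for illustration, and it only parses HTML it is handed rather than fetching home_page or dependency_links pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical heuristic for "looks installable": archive suffixes or an #egg= fragment.
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg", ".whl")


class LinkCollector(HTMLParser):
    """Collect href targets from <a> tags, resolved against the page's URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Relative links (e.g. PyPI-hosted files) become absolute URLs.
                self.links.append(urljoin(self.base_url, href))


def looks_installable(url):
    parsed = urlparse(url)
    return parsed.path.endswith(ARCHIVE_SUFFIXES) or "egg=" in parsed.fragment


def installable_links(html, base_url):
    """Step 1 (and, applied to fetched pages, steps 2 and 3): keep installable-looking links."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [url for url in collector.links if looks_installable(url)]


def external_hosts(links, index_host="pypi.python.org"):
    """Every host other than the index that the installer would send requests to."""
    return sorted({urlparse(url).netloc for url in links} - {index_host})
```

Fed a /simple/Foo/ page containing one PyPI-hosted tarball, a download_url on another domain, and an #egg= link to a VCS host, installable_links() keeps all three files, and external_hosts() reports two extra hosts the user never explicitly chose to trust. Each of those hosts is also another dependency for uptime and another party that sees the request.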
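Point 1a can be illustrated with a small sketch of what verifying an externally hosted file against a hash stored on PyPI might look like. The ExternalRelease record and the verify_download() helper are hypothetical, and SHA-256 is an assumed choice of algorithm; the proposal above only says "a hash". Tying the hash to a single download_url is also what limits each release to one file.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalRelease:
    """Hypothetical record PyPI would store: one external URL plus its expected hash."""
    download_url: str
    sha256: str


def verify_download(release: ExternalRelease, data: bytes) -> bool:
    """Accept the downloaded bytes only if their SHA-256 matches the recorded hash."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest compares in constant time instead of stopping at the first mismatch.
    return hmac.compare_digest(actual, release.sha256.lower())
```

With this check, the external host only has to serve the right bytes. Trust comes from the hash fetched over a verified SSL connection to PyPI, not from the external host itself.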
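The uptime arithmetic in reason 2 assumes that every host must be reachable at once and that outages are independent. Under that assumption, the combined availability is simply the product of the per-host figures. The sketch below reproduces the 0.99 ** 11 figure. The function name combined_uptime() is illustrative, and math.prod requires Python 3.8 or later.

```python
import math


def combined_uptime(uptimes):
    """Probability that all hosts are reachable at once, assuming independent failures."""
    return math.prod(uptimes)


# PyPI plus 10 external hosts, each at 99% uptime
print(f"{combined_uptime([0.99] * 11):.1%}")  # → 89.5%
```

Because the result is a product, every additional external host in a deeper dependency tree pushes the combined figure lower. Mirrors or a CDN in front of a single index avoid that compounding.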
_______________________________________________ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig