On Mar 8, 2013, at 8:13 AM, Donald Stufft <don...@stufft.io> wrote:

> 
> On Mar 8, 2013, at 8:07 AM, Jesse Noller <jnol...@gmail.com> wrote:
> 
>> As long as external URLs eventually are completely removed I'm okay with 
>> caching things
> 
> So I have mixed feelings on caching the urls. I'm not completely against it; 
> however, it does present a problem: "How do we know if the url we are 
> fetching is the accurate url for that package?" Downloading and caching them 
> and presenting them the same as if someone had uploaded them directly to PyPI 
> loses the point of distinction between "PyPI can verify this is the package 
> that the author intended to release" and "This is something we think that the 
> author releases, maybe, probably?".

The distinction can be fixed with a rel="external" or rel="cached" or whatever. 
I believe all the tools will still find them as downloadable targets and can be 
adapted to print a warning if that's desired. We *might* be caching a package 
that has already been replaced by an attacker, but by caching and centralizing 
it we have a better way of removing it once it's found. The legal issues are 
something we'd probably need to ask VanL about.

So that's an Ok, Neutral, and Unknown for my 3 major complaints.
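
To make the rel idea a bit more concrete, here is a rough sketch of how a tool 
might separate directly uploaded files from cached or external ones and print 
a warning. This is only an illustration: the rel values "external" and 
"cached" are the ones floated above and are not an existing PyPI convention, 
and the class name, function name, sample page, and URLs are all made up.

```python
from html.parser import HTMLParser

# Hypothetical rel values; the thread only floats them as possibilities.
UNVERIFIED_RELS = {"external", "cached"}


class LinkCollector(HTMLParser):
    """Collect (href, rel tokens) for every <a> tag on an index page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, or be missing entirely.
        rels = set((attr_map.get("rel") or "").split())
        self.links.append((href, rels))


def classify_links(html):
    """Split links into (hosted, unverified) lists based on their rel value."""
    parser = LinkCollector()
    parser.feed(html)
    hosted, unverified = [], []
    for href, rels in parser.links:
        if rels & UNVERIFIED_RELS:
            unverified.append(href)
        else:
            hosted.append(href)
    return hosted, unverified


# Made-up index page: one uploaded file, one cached copy, one external link.
SAMPLE_PAGE = """
<a href="/packages/example-1.0.tar.gz">example-1.0.tar.gz</a>
<a rel="cached" href="/cache/example-0.9.tar.gz">example-0.9.tar.gz</a>
<a rel="external" href="http://example.invalid/example-0.8.tar.gz">example-0.8.tar.gz</a>
"""

hosted, unverified = classify_links(SAMPLE_PAGE)
for url in unverified:
    print("warning: %s was not uploaded directly to the index" % url)
```

Links without a rel attribute are treated as uploaded files here, so tools 
that ignore rel would keep finding every link as a downloadable target.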

> 
> It does solve the backwards compatibility issue of killing external urls 
> immediately so I'm not flat out against it, but there may be legal issues 
> involved too?
> 
>> 
>> On Mar 8, 2013, at 6:49 AM, "M.-A. Lemburg" <m...@egenix.com> wrote:
>> 
>>> On 08.03.2013 02:40, Donald Stufft wrote:
>>>> So I updated my script (had to remove eventlet) and I believe it's now 
>>>> accurate. The total time was ~54 hours so this is hardly scientific but it 
>>>> should give a good idea what sort of impact we are talking about.
>>>> 
>>>> This is a list of versions that pip's PackageFinder (what it uses to 
>>>> locate packages to install) could find that were not available on PyPI.
>>>> 
>>>> The results and script are available at: 
>>>> https://gist.github.com/dstufft/5088915
>>>> 
>>>> Some statistics:
>>>> 
>>>>  Projects affected (with dev): 2269
>>>>  Versions affected (with dev): 8006
>>>> 
>>>>  Projects affected (without dev): 1880
>>>>  Versions affected (without dev): 7586
>>>> 
>>>> These numbers are if all external urls were immediately removed from PyPI, 
>>>> so this would be the total affected. This does not test if the actual 
>>>> package is installable, just if pip is able to locate an url that it 
>>>> thinks represents a version for that project.
>>> 
>>> Thanks for running the test.
>>> 
>>> About 10% of all packages. The numbers are already impressive,
>>> but if you factor in the popularity of some of those
>>> packages, the situation becomes worse.
>>> 
>>> I'm beginning to wonder whether caching the external link content
>>> on the PyPI CDN wouldn't be a better idea.
>>> 
>>> We'd have to make that legally waterproof and also have an opt-out
>>> mechanism, but it would get us from here to there a lot faster.
>>> 
>>> Together with the added hash tag on the download file URLs (*),
>>> this would solve the availability and the security aspects.
>>> Instead of deprecating external links altogether, we could then
>>> deprecate non-compliant download links and get an overall
>>> very flexible system for Python package distribution.
>>> 
>>> (*) Yes, I know, I still have to deliver the updated proposal -
>>> been working on getting our indexes ready to serve as example :-)
>>> 
>>> -- 
>>> Marc-Andre Lemburg
>>> eGenix.com
>>> 
>>> _______________________________________________
>>> Catalog-SIG mailing list
>>> Catalog-SIG@python.org
>>> http://mail.python.org/mailman/listinfo/catalog-sig
> 
> 
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> 


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

