Re: [Distutils] PEP 470 discussion, part 3

2014-07-25 Thread Daniel Holth
On Fri, Jul 25, 2014 at 9:49 AM, Donald Stufft  wrote:
> On July 25, 2014 at 9:43:28 AM, Nick Coghlan (ncogh...@gmail.com) wrote:
>
> On 25 July 2014 23:34, Donald Stufft  wrote:
>> On July 25, 2014 at 9:29:14 AM, Richard Jones (r1chardj0...@gmail.com)
>> wrote:
>>
>> On 25 July 2014 15:21, Nick Coghlan  wrote:
>>>
>>> On 25 July 2014 23:13, Richard Jones  wrote:
>>> > A variation on the above two ideas is to just record the *link* to the
>>> > externally-hosted file from PyPI, rather than that file's content. It
>>> > is
>>> > more error-prone, but avoids issues of file ownership.
>>>
>>> This is essentially what PEP 470 proposes, except that the link says
>>> "this project is hosted on this external index, check there for the
>>> relevant details" rather than having individual links for every
>>> externally hosted version.
>>
>>
>> Well, not quite. The PEP proposes a link to a page for an index with
>> arbitrary contents. The above would link only to packages for the /simple/
>> name in question. A very small amount of protection against accidents but
>> some protection nonetheless. Also, an installer does not need to go to
>> that
>> external index to find anything - everything is listed in the one place.
>>
>> This is still a second mechanism that users have to know and be aware of.
>> The multi index support isn’t going away and it is the primary way to
>> support things not hosted on PyPI in every situation *except* the “well I
>> have a publicly available thing, but I don’t want to upload it to PyPI for
>> whatever reason” case. As evidenced by the numbers I really don’t think
>> that
>> use case is a big enough use case to warrant its own special mechanisms.
>> Especially given the fact that it forces some bad architecture on the
>> installers.
>
> The Linux distros have honed paranoia to a fine art, and even we don't
> think maintaining explicit package<->repo maps is worth the hassle,
> especially when end-to-end package signing is a better solution to
> handling provenance concerns.
>
> If people are especially worried about it (especially given we don't
> have generally usable end-to-end signing yet), then a simpler solution
> than package <-> repo maps is to have repo (or index, in PyPI
> terminology) priorities, such that packages from lower priority repos
> can never take precedence over those from higher priority repos.
>
> With yum-plugin-priorities, repos get a default priority of 99, and
> you can specify an explicit priority in each repo config. This can be
> used to have a company internal repo take precedence over the Red Hat
> or community repos, for example.
>
>
> Not that this solves it generically, but I’ve toyed with the idea of a
> requirements 2.0 file format that included constructions that said things
> like “for this dependency, mandate it come from a particular index” or the
> like.

There was a similar idea in the wheel signatures scheme, where you
would include the public keys of the allowed signers alongside the
dependency name. The system would have allowed you to install
particular built dependencies but only if they were signed by one of a
set of signers, preventing you from accidentally installing the wrong
thing. At the time I'd proposed extending the array access syntax
packagename[keyidentifier=xyzabc...] used for extras.
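That proposed syntax might be parsed along these lines (a rough sketch; the `keyidentifier` bracket form is the hypothetical extension described above and is not supported by any real installer, and the name/key grammar here is a simplification):

```python
import re

# Hypothetical syntax from the wheel-signatures idea:
#   packagename[keyidentifier=xyzabc...]
# Split a requirement into the package name and the set of allowed
# signer key identifiers, if any were given. The character classes
# are deliberately loose; a real grammar would be stricter.
_REQ = re.compile(r"^(?P<name>[A-Za-z0-9._-]+)"
                  r"(?:\[keyidentifier=(?P<keys>[A-Za-z0-9,]+)\])?$")

def parse_signed_requirement(req):
    match = _REQ.match(req)
    if match is None:
        raise ValueError("not a valid requirement: %r" % req)
    keys = match.group("keys")
    allowed = set(keys.split(",")) if keys else None
    return match.group("name"), allowed

name, keys = parse_signed_requirement("Django[keyidentifier=xyzabc]")
assert name == "Django"
assert keys == {"xyzabc"}
```

An installer could then refuse any wheel whose signature does not verify against one of the `allowed` keys; `None` would mean "any signer", preserving today's behaviour.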
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PEP 470 discussion, part 3

2014-07-25 Thread Donald Stufft
On July 25, 2014 at 9:43:28 AM, Nick Coghlan (ncogh...@gmail.com) wrote:
On 25 July 2014 23:34, Donald Stufft  wrote:  
> On July 25, 2014 at 9:29:14 AM, Richard Jones (r1chardj0...@gmail.com)  
> wrote:  
>  
> On 25 July 2014 15:21, Nick Coghlan  wrote:  
>>  
>> On 25 July 2014 23:13, Richard Jones  wrote:  
>> > A variation on the above two ideas is to just record the *link* to the  
>> > externally-hosted file from PyPI, rather than that file's content. It is  
>> > more error-prone, but avoids issues of file ownership.  
>>  
>> This is essentially what PEP 470 proposes, except that the link says  
>> "this project is hosted on this external index, check there for the  
>> relevant details" rather than having individual links for every  
>> externally hosted version.  
>  
>  
> Well, not quite. The PEP proposes a link to a page for an index with  
> arbitrary contents. The above would link only to packages for the /simple/  
> name in question. A very small amount of protection against accidents but  
> some protection nonetheless. Also, an installer does not need to go to that  
> external index to find anything - everything is listed in the one place.  
>  
> This is still a second mechanism that users have to know and be aware of.  
> The multi index support isn’t going away and it is the primary way to  
> support things not hosted on PyPI in every situation *except* the “well I  
> have a publicly available thing, but I don’t want to upload it to PyPI for  
> whatever reason” case. As evidenced by the numbers I really don’t think that  
> use case is a big enough use case to warrant its own special mechanisms.
> Especially given the fact that it forces some bad architecture on the  
> installers.  

The Linux distros have honed paranoia to a fine art, and even we don't  
think maintaining explicit package<->repo maps is worth the hassle,  
especially when end-to-end package signing is a better solution to  
handling provenance concerns.  

If people are especially worried about it (especially given we don't  
have generally usable end-to-end signing yet), then a simpler solution  
than package <-> repo maps is to have repo (or index, in PyPI  
terminology) priorities, such that packages from lower priority repos  
can never take precedence over those from higher priority repos.  

With yum-plugin-priorities, repos get a default priority of 99, and  
you can specify an explicit priority in each repo config. This can be  
used to have a company internal repo take precedence over the Red Hat  
or community repos, for example.  


Not that this solves it generically, but I’ve toyed with the idea of a 
requirements 2.0 file format that included constructions that said things like 
“for this dependency, mandate it come from a particular index” or the like.
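Interpreted concretely, such a file might map each dependency to a mandatory index, checked before any candidate is accepted. The sketch below is purely illustrative of that idea (the file format, the `internal-lib` name, and the URLs are all made up; nothing like this exists in pip today):

```python
# Hypothetical "requirements 2.0" data: each dependency may pin the
# index it must come from. None means any configured index is fine.
REQUIREMENTS = {
    "internal-lib": "https://pypi.example.internal/simple/",
    "requests": None,
}

def candidate_allowed(name, found_on_index):
    """Reject a candidate that came from an index other than the one
    mandated for this dependency."""
    mandated = REQUIREMENTS.get(name)
    return mandated is None or found_on_index == mandated

assert candidate_allowed("requests", "https://pypi.org/simple/")
assert not candidate_allowed("internal-lib", "https://pypi.org/simple/")
```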

-- 
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


Re: [Distutils] PEP 470 discussion, part 3

2014-07-25 Thread Nick Coghlan
On 25 July 2014 23:34, Donald Stufft  wrote:
> On July 25, 2014 at 9:29:14 AM, Richard Jones (r1chardj0...@gmail.com)
> wrote:
>
> On 25 July 2014 15:21, Nick Coghlan  wrote:
>>
>> On 25 July 2014 23:13, Richard Jones  wrote:
>> > A variation on the above two ideas is to just record the *link* to the
>> > externally-hosted file from PyPI, rather than that file's content. It is
>> > more error-prone, but avoids issues of file ownership.
>>
>> This is essentially what PEP 470 proposes, except that the link says
>> "this project is hosted on this external index, check there for the
>> relevant details" rather than having individual links for every
>> externally hosted version.
>
>
> Well, not quite. The PEP proposes a link to a page for an index with
> arbitrary contents. The above would link only to packages for the /simple/
> name in question.  A very small amount of protection against accidents but
> some protection nonetheless. Also, an installer does not need to go to that
> external index to find anything - everything is listed in the one place.
>
> This is still a second mechanism that users have to know and be aware of.
> The multi index support isn’t going away and it is the primary way to
> support things not hosted on PyPI in every situation *except* the “well I
> have a publicly available thing, but I don’t want to upload it to PyPI for
> whatever reason” case. As evidenced by the numbers I really don’t think that
> use case is a big enough use case to warrant its own special mechanisms.
> Especially given the fact that it forces some bad architecture on the
> installers.

The Linux distros have honed paranoia to a fine art, and even we don't
think maintaining explicit package<->repo maps is worth the hassle,
especially when end-to-end package signing is a better solution to
handling provenance concerns.

If people are especially worried about it (especially given we don't
have generally usable end-to-end signing yet), then a simpler solution
than package <-> repo maps is to have repo (or index, in PyPI
terminology) priorities, such that packages from lower priority repos
can never take precedence over those from higher priority repos.

With yum-plugin-priorities, repos get a default priority of 99, and
you can specify an explicit priority in each repo config. This can be
used to have a company internal repo take precedence over the Red Hat
or community repos, for example.
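Translated into installer terms, that behaviour might look like the following (a sketch only; the priority numbers follow yum's convention of lower number = higher priority, with 99 as the default, and no such feature exists in pip):

```python
# Each candidate is (project, version, index_priority). Only candidates
# from the highest-priority index may compete; a newer version on a
# lower-priority index is never chosen, mirroring yum-plugin-priorities.
def pick(candidates):
    best_priority = min(p for _, _, p in candidates)
    eligible = [c for c in candidates if c[2] == best_priority]
    # Among eligible candidates, the newest version wins as usual.
    return max(eligible, key=lambda c: c[1])

candidates = [
    ("mypkg", (1, 0), 10),   # internal index, explicit priority 10
    ("mypkg", (2, 0), 99),   # community index, default priority 99
]
# The internal 1.0 wins even though the community index has 2.0.
assert pick(candidates) == ("mypkg", (1, 0), 10)
```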

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] PEP 470 discussion, part 3

2014-07-25 Thread Donald Stufft
On July 25, 2014 at 9:29:14 AM, Richard Jones (r1chardj0...@gmail.com) wrote:
On 25 July 2014 15:21, Nick Coghlan  wrote:
On 25 July 2014 23:13, Richard Jones  wrote:
> A variation on the above two ideas is to just record the *link* to the
> externally-hosted file from PyPI, rather than that file's content. It is
> more error-prone, but avoids issues of file ownership.

This is essentially what PEP 470 proposes, except that the link says
"this project is hosted on this external index, check there for the
relevant details" rather than having individual links for every
externally hosted version.

Well, not quite. The PEP proposes a link to a page for an index with arbitrary 
contents. The above would link only to packages for the /simple/ name in 
question.  A very small amount of protection against accidents but some 
protection nonetheless. Also, an installer does not need to go to that external 
index to find anything - everything is listed in the one place.


     Richard


This is still a second mechanism that users have to know and be aware of. The 
multi index support isn’t going away and it is the primary way to support 
things not hosted on PyPI in every situation *except* the “well I have a 
publicly available thing, but I don’t want to upload it to PyPI for whatever 
reason” case. As evidenced by the numbers I really don’t think that use case is 
a big enough use case to warrant its own special mechanisms. Especially given 
the fact that it forces some bad architecture on the installers.

-- 
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


Re: [Distutils] PEP 470 discussion, part 3

2014-07-25 Thread Richard Jones
On 25 July 2014 15:21, Nick Coghlan  wrote:

> On 25 July 2014 23:13, Richard Jones  wrote:
> > A variation on the above two ideas is to just record the *link* to the
> > externally-hosted file from PyPI, rather than that file's content. It is
> > more error-prone, but avoids issues of file ownership.
>
> This is essentially what PEP 470 proposes, except that the link says
> "this project is hosted on this external index, check there for the
> relevant details" rather than having individual links for every
> externally hosted version.
>

Well, not quite. The PEP proposes a link to a page for an index with
arbitrary contents. The above would link only to packages for the /simple/
name in question.  A very small amount of protection against accidents but
some protection nonetheless. Also, an installer does not need to go to that
external index to find anything - everything is listed in the one place.


 Richard


Re: [Distutils] PEP 470 discussion, part 3

2014-07-25 Thread Daniel Holth
Maybe we should get on the namespaces bandwagon and allow
organizations to register a prefix. Then you would be able to know
that dependencies called "company/mysupersecretprogram" would never
accidentally exist on PyPI.
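A prefix registration of that kind could be enforced with a check at upload time, something like this (entirely hypothetical; PyPI has no such feature, and all the prefixes and account names below are made up):

```python
# Hypothetical registry of organization-owned name prefixes on PyPI.
REGISTERED_PREFIXES = {
    "company/": "company-org",
    "dstufft.": "dstufft",
}

def upload_allowed(project_name, uploader):
    """A new upload may not squat on a prefix owned by someone else."""
    for prefix, owner in REGISTERED_PREFIXES.items():
        if project_name.startswith(prefix):
            return uploader == owner
    # Names outside any registered prefix behave as they do today.
    return True

assert upload_allowed("company/mysupersecretprogram", "company-org")
assert not upload_allowed("company/mysupersecretprogram", "someone-else")
assert upload_allowed("requests", "anyone")
```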

On Fri, Jul 25, 2014 at 9:21 AM, Nick Coghlan  wrote:
> On 25 July 2014 23:13, Richard Jones  wrote:
>>> Yes, those are two solutions, another solution is for PyPI to allow
>>> registering a namespace, like dstufft.* and companies simply name all their
>>> packages that. This isn’t a unique problem to this PEP though. This problem
>>> exists anytime a company has an internal package that they do not want on
>>> PyPI. It’s unlikely that any of those companies are using the external link
>>> feature if that package is internal.
>>
>> As I mentioned, using devpi solves this issue for companies hosting internal
>> indexes. Requiring companies to register names on a public index to avoid
>> collision has been raised a few times along the lines of "I hope we don't
>> have to register names on the public index to avoid this." :)
>
> Restricting packages to come from particular indexes is (or should be)
> independent of the PEP 470 design. pip has multiple index support
> today, and if you enable it, any enabled index can currently provide
> any package.
>
> If that's a significant concern for anyone, changing it is just a pip
> RFE rather than needing to be part of a PEP.
>
>>> > There still remains the usability issue of unsophisticated users running
>>> > into external indexes and needing to cope with that in one of a myriad of
>>> > ways as evidenced by the PEP. One solution proposed and refined at the
>>> > EuroPython gathering today has PyPI caching packages from external indexes
>>> > *for packages registered with PyPI*. That is: a requirement of registering
>>> > your package (and external index URL) with PyPI is that you grant PyPI
>>> > permission to cache packages from your index in the central index - a
>>> > scenario that is ideal for users. Organisations not wishing to do that
>>> > understand that they're the ones causing the pain for users.
>>>
>>> We can’t cache the packages which aren’t currently hosted on PyPI. Not in
>>> an automatic fashion anyways. We’d need to ensure that their license allows
>>> us to do so. The PyPI ToS ensures this when they upload but if they never
>>> upload then they’ve never agreed to the ToS for that artifact.
>>
>> I didn't state it clearly: this would be opt-in with the project granting
>> PyPI permission to perform this caching. Their option is to not do so and
>> simply not have a listing on PyPI.
>
> This is exactly the "packages not hosted on PyPI are second class
> citizens" scenario we're trying to *avoid*. We can't ask a global
> community to comply with US export laws just to be listed on the main
> community index.
>
>>> > An extension of this proposal is quite elegant; to reduce the pain of
>>> > migration from the current approach to the new, we implement that caching
>>> > right now, using the current simple index scraping. This ensures the
>>> > packages are available to all clients throughout the transition period.
>>>
>>> As said above, we can’t legally do this automatically, we’d need to ensure
>>> that there is a license that grants us distribution rights.
>>
>> A variation on the above two ideas is to just record the *link* to the
>> externally-hosted file from PyPI, rather than that file's content. It is
>> more error-prone, but avoids issues of file ownership.
>
> This is essentially what PEP 470 proposes, except that the link says
> "this project is hosted on this external index, check there for the
> relevant details" rather than having individual links for every
> externally hosted version.
>
> Cheers,
> Nick.

and there would be great rejoicing. IIUC conda's binstar does
something like this...


Re: [Distutils] PEP 470 discussion, part 3

2014-07-25 Thread Nick Coghlan
On 25 July 2014 23:13, Richard Jones  wrote:
>> Yes, those are two solutions, another solution is for PyPI to allow
>> registering a namespace, like dstufft.* and companies simply name all their
>> packages that. This isn’t a unique problem to this PEP though. This problem
>> exists anytime a company has an internal package that they do not want on
>> PyPI. It’s unlikely that any of those companies are using the external link
>> feature if that package is internal.
>
> As I mentioned, using devpi solves this issue for companies hosting internal
> indexes. Requiring companies to register names on a public index to avoid
> collision has been raised a few times along the lines of "I hope we don't
> have to register names on the public index to avoid this." :)

Restricting packages to come from particular indexes is (or should be)
independent of the PEP 470 design. pip has multiple index support
today, and if you enable it, any enabled index can currently provide
any package.

If that's a significant concern for anyone, changing it is just a pip
RFE rather than needing to be part of a PEP.
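The current behaviour — with multiple indexes enabled, every index is an equal source for every name, and the best version wins regardless of origin — can be modelled roughly like this (a simplified sketch of the idea, not pip's actual finder; the index URLs and `mypkg` are illustrative):

```python
# With multiple indexes enabled, candidates for a name are pooled from
# every index and the highest version wins, wherever it came from.
def find_best(name, indexes):
    """indexes: mapping of index URL -> {project name: latest version}."""
    candidates = [
        (versions[name], url)
        for url, versions in indexes.items()
        if name in versions
    ]
    return max(candidates) if candidates else None

indexes = {
    "https://pypi.org/simple/": {"mypkg": (2, 0)},
    "https://internal.example/simple/": {"mypkg": (1, 0)},
}
# The public index "wins" simply by carrying a newer version.
assert find_best("mypkg", indexes) == ((2, 0), "https://pypi.org/simple/")
```

This is exactly why an accidental (or malicious) upload of an internal name to the public index can shadow the internal package.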

>> > There still remains the usability issue of unsophisticated users running
>> > into external indexes and needing to cope with that in one of a myriad of
>> > ways as evidenced by the PEP. One solution proposed and refined at the
>> > EuroPython gathering today has PyPI caching packages from external indexes
>> > *for packages registered with PyPI*. That is: a requirement of registering
>> > your package (and external index URL) with PyPI is that you grant PyPI
>> > permission to cache packages from your index in the central index - a
>> > scenario that is ideal for users. Organisations not wishing to do that
>> > understand that they're the ones causing the pain for users.
>>
>> We can’t cache the packages which aren’t currently hosted on PyPI. Not in
>> an automatic fashion anyways. We’d need to ensure that their license allows
>> us to do so. The PyPI ToS ensures this when they upload but if they never
>> upload then they’ve never agreed to the ToS for that artifact.
>
> I didn't state it clearly: this would be opt-in with the project granting
> PyPI permission to perform this caching. Their option is to not do so and
> simply not have a listing on PyPI.

This is exactly the "packages not hosted on PyPI are second class
citizens" scenario we're trying to *avoid*. We can't ask a global
community to comply with US export laws just to be listed on the main
community index.

>> > An extension of this proposal is quite elegant; to reduce the pain of
>> > migration from the current approach to the new, we implement that caching
>> > right now, using the current simple index scraping. This ensures the
>> > packages are available to all clients throughout the transition period.
>>
>> As said above, we can’t legally do this automatically, we’d need to ensure
>> that there is a license that grants us distribution rights.
>
> A variation on the above two ideas is to just record the *link* to the
> externally-hosted file from PyPI, rather than that file's content. It is
> more error-prone, but avoids issues of file ownership.

This is essentially what PEP 470 proposes, except that the link says
"this project is hosted on this external index, check there for the
relevant details" rather than having individual links for every
externally hosted version.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] PEP 470 discussion, part 3

2014-07-25 Thread Richard Jones
[apologies for the terrible quoting, gmail's magic failed today]

On 24 July 2014 17:41, Donald Stufft  wrote:
> On July 24, 2014 at 7:26:11 AM, Richard Jones (r1chardj0...@gmail.com)
wrote:
>
> > This PEP proposes a potentially confusing break for both users and
packagers. In particular, during the transition there will be packages
which just disappear as far as users are concerned. In those cases users
will indeed need to learn that there is a /simple/ page and they will need
to view it in order to find the URL to add to their installation invocation
in some manner. Even once install tools start supporting the new mechanism,
users who lag (which as we all know are the vast majority) will run into
this.
>
> So we lengthen the transition time, gate it on an installer that has the
automatic hinting becoming the dominant version. We can pretty easily see
exactly what version of the tooling is being used to install stuff from
PyPI.

I would like to see the PEP have detail added around this transition and
how we will avoid packages vanishing. Perhaps we could have a versioned
/simple/ to allow transition to go more smoothly with monitoring activity
on the two versions? /simple-2/? /simpler/? :)

Additionally, it's been pointed out to me that I've been running on
assumptions about how multi-index support works. The algorithm that must be
implemented by installer tools needs to be spelled out in the PEP.


> Even ignoring the malicious possibility there is probably a greater
chance of accidental mistakes:
>
> - company sets up internal index using pip's multi-index support and
hosts various modules
> - someone quite innocently uploads something with the same name, newer
version, to pypi
> - company installs now use that unknown code
>
> devpi avoids this (I would recommend it over multi-index for companies
anyway) by having a white list system for packages that might be pulled
from upstream that would clash with internal packages.
>
> As Nick's mentioned, a signing infrastructure - tied to the index
registration of a name - could solve this problem.
>
> Yes, those are two solutions, another solution is for PyPI to allow
registering a namespace, like dstufft.* and companies simply name all their
packages that. This isn’t a unique problem to this PEP though. This problem
exists anytime a company has an internal package that they do not want on
PyPI. It’s unlikely that any of those companies are using the external link
feature if that package is internal.

As I mentioned, using devpi solves this issue for companies hosting
internal indexes. Requiring companies to register names on a public index
to avoid collision has been raised a few times along the lines of "I hope
we don't have to register names on the public index to avoid this." :)


> > There still remains the usability issue of unsophisticated users
running into external indexes and needing to cope with that in one of a
myriad of ways as evidenced by the PEP. One solution proposed and refined
at the EuroPython gathering today has PyPI caching packages from external
indexes *for packages registered with PyPI*. That is: a requirement of
registering your package (and external index URL) with PyPI is that you
grant PyPI permission to cache packages from your index in the central
index - a scenario that is ideal for users. Organisations not wishing to do
that understand that they're the ones causing the pain for users.
>
> We can’t cache the packages which aren’t currently hosted on PyPI. Not in
an automatic fashion anyways. We’d need to ensure that their license allows
us to do so. The PyPI ToS ensures this when they upload but if they never
upload then they’ve never agreed to the ToS for that artifact.

I didn't state it clearly: this would be opt-in with the project granting
PyPI permission to perform this caching. Their option is to not do so and
simply not have a listing on PyPI.

> > An extension of this proposal is quite elegant; to reduce the pain of
migration from the current approach to the new, we implement that caching
right now, using the current simple index scraping. This ensures the
packages are available to all clients throughout the transition period.
>
> As said above, we can’t legally do this automatically, we’d need to
ensure that there is a license that grants us distribution rights.

A variation on the above two ideas is to just record the *link* to the
externally-hosted file from PyPI, rather than that file's content. It is
more error-prone, but avoids issues of file ownership.


  Richard


Re: [Distutils] PEP 470 discussion, part 3

2014-07-24 Thread Nick Coghlan
On 25 Jul 2014 02:05, "Donald Stufft"  wrote:
>
> Sorry, I think the provides functionality is outside of the scope of what
we would use TUF for. It is *only* respected if you have that project
installed. In other words if there is a package “FakeDjango” which provides
“Django”, then ``pip install Django`` will *never* install “FakeDjango”.
However if you’ve already done ``pip install FakeDjango`` then later on you
do ``pip install Django`` it will see that it is already installed (because
“FakeDjango” provides it).

For the record, from a system integrator perspective, this is considered a
feature rather than a bug: it's designed so it's possible to swap in an
alternative to the real package as a temporary measure until the real one
catches up. For example, right now, getting systemd to work right inside a
Docker container is a bit tricky, but you don't really need it if you're
just running one or two services per container. The workaround is a
substitute package called "fakesystemd" - it lets the package installation
proceed, even though the systemd integration won't work. The folks actually
working with systemd inside Docker then swap the fake one out for the real
one.

> IOW it only matters once you’ve already chosen to trust that package and
have installed it. This is to prevent any sort of spoofing attacks and to
simplify the interface. This doesn’t prevent a project which you’ve elected
to trust by installing it from spoofing itself, but it’s impossible to
prevent them from doing that anyways without hobbling our package formats
so much that they are useless. For instance any ability to execute code
(such as setup.py!) means that FakeDjango could, once installed, spoof
Django just by dropping the relevant metadata files to say it is already
installed.

Yep. While it may sound self-serving (because it is - this is ultimately
one of the services that gets me paid), a commercial relationship that
helps assure them "this won't eat your machine" is one of the reasons
companies pay open source redistributors and other service providers for
software that is freely available directly from upstream. They're not
really paying for the software directly - they're outsourcing the task of
due diligence in checking whether the software is safe enough to allow it
to be installed on their systems. Even the core repos of the community
Linux distros provide a higher level of assurance than the "anything goes,
use at your own risk" style services like PyPI, Ubuntu PPAs and Fedora's
COPR.

That doesn't make the latter services bad, it just means they occupy a
niche in the ecosystem that makes using them directly inherently
inadvisable for users with a low tolerance for risk.

Cheers,
Nick.
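The "only respected if already installed" rule quoted above can be sketched as a small model (illustrative only; `FakeDjango` is the example from the thread, and real pip metadata handling is considerably more involved):

```python
# A provides declaration is honoured only for projects that are
# already installed: ``install Django`` never picks FakeDjango from
# an index, but an installed FakeDjango does satisfy "Django".
def satisfied_by_installed(requested, installed):
    """installed: mapping of project name -> set of names it provides."""
    for project, provides in installed.items():
        if requested == project or requested in provides:
            return project
    return None

installed = {"FakeDjango": {"Django"}}
# Already installed, so the requirement is considered satisfied...
assert satisfied_by_installed("Django", installed) == "FakeDjango"
# ...but a name nothing provides still has to come from the index.
assert satisfied_by_installed("Flask", installed) is None
```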



Re: [Distutils] PEP 470 discussion, part 3

2014-07-24 Thread Justin Cappos
FYI: PEP 458 provides a way to address most of the security issues with
this as well.   (We call these "provides-everything" attacks in some of our
prior work: https://isis.poly.edu/~jcappos/papers/cappos_pmsec_tr08-02.pdf)

One way of handling this is that whomever registers the name can choose
what other packages can be registered that meet that dependency.   Another
is that PyPI could automatically manage the metadata for this.   Clearly
someone has to be responsible for making sure that this is 'off-by-default'
so that a malicious party cannot claim to provide a popular package and get
their software installed instead.   What do you think makes the most sense?

Even if only "the right" projects can create trusted packages for a
dependency, there are also security issues with respect to which package
should be trusted.   Suppose you have projects zap and bar, either of
which could be chosen to meet a dependency.   Which should be used?

With TUF we currently support them choosing a fixed project (zap or bar),
but supporting the most recent upload is also possible.   We had an
explicit tag and type of delegation in Stork for this case (the timestamp
tag), but I think we can get equivalent functionality with threshold
signatures in TUF.

Once we understand more about how people would like to use it, we can make
sure PEP 458 explains how this is supported in a clean way while minimizing
the security impact.

Thanks,
Justin


On Thu, Jul 24, 2014 at 11:41 AM, Donald Stufft  wrote:

> On July 24, 2014 at 7:26:11 AM, Richard Jones (r1chardj0...@gmail.com)
> wrote:
>
> Even ignoring the malicious possibility there is probably a greater chance
> of accidental mistakes:
>
> - company sets up internal index using pip's multi-index support and hosts
> various modules
> - someone quite innocently uploads something with the same name, newer
> version, to pypi
> - company installs now use that unknown code
>
> devpi avoids this (I would recommend it over multi-index for companies
> anyway) by having a white list system for packages that might be pulled
> from upstream that would clash with internal packages.
>
> As Nick's mentioned, a signing infrastructure - tied to the index
> registration of a name - could solve this problem.
>
> Yes, those are two solutions, another solution is for PyPI to allow
> registering a namespace, like dstufft.* and companies simply name all their
> packages that. This isn’t a unique problem to this PEP though. This problem
> exists anytime a company has an internal package that they do not want on
> PyPI. It’s unlikely that any of those companies are using the external link
> feature if that package is internal.
>
>
>
> There still remains the usability issue of unsophisticated users running
> into external indexes and needing to cope with that in one of a myriad of
> ways as evidenced by the PEP. One solution proposed and refined at the
> EuroPython gathering today has PyPI caching packages from external indexes
> *for packages registered with PyPI*. That is: a requirement of registering
> your package (and external index URL) with PyPI is that you grant PyPI
> permission to cache packages from your index in the central index - a
> scenario that is ideal for users. Organisations not wishing to do that
> understand that they're the ones causing the pain for users.
>
> We can’t cache the packages which aren’t currently hosted on PyPI. Not in
> an automatic fashion anyways. We’d need to ensure that their license allows
> us to do so. The PyPI ToS ensures this when they upload but if they never
> upload then they’ve never agreed to the ToS for that artifact.
>
>
>
> An extension of this proposal is quite elegant; to reduce the pain of
> migration from the current approach to the new, we implement that caching
> right now, using the current simple index scraping. This ensures the
> packages are available to all clients throughout the transition period.
>
> As said above, we can’t legally do this automatically, we’d need to ensure
> that there is a license that grants us distribution rights.
>
>
>
> The transition issue was enough for those at the meeting today to urge me
> to reject the PEP.
>
> To be clear, there are really three issues at play:
>
> 1) Should we continue to support scraping external urls *at all*. This is
> a cause of a lot of problems in pip and it infects our architecture with
> things that cause confusing error messages that we cannot really get away
> from. It’s also super slow and grossly insecure.
>
> 2) Should we continue to support direct links from a project’s /simple/
> page to a downloadable file which isn’t hosted on PyPI.
>
> 3) If we allow direct links to a downloadable file from a project’s
> /simple/ page, do we mandate that they include a hash (and thus are safe)
> or do we also allow ones without a checksum (and thus are unsafe).
>
> For me, 1 is absolutely not. It is terrible and it is the cause of
> horrible UX issues as well as performance issues. However 1 is a

Re: [Distutils] PEP 470 discussion, part 3

2014-07-24 Thread Donald Stufft
On July 24, 2014 at 11:58:01 AM, Justin Cappos (jcap...@nyu.edu) wrote:
FYI: PEP 458 provides a way to address most of the security issues with this as 
well.   (We call these "provides-everything" attacks in some of our prior work: 
https://isis.poly.edu/~jcappos/papers/cappos_pmsec_tr08-02.pdf)

One way of handling this is that whoever registers the name can choose what 
other packages can be registered that meet that dependency.   Another is that 
PyPI could automatically manage the metadata for this.   Clearly someone has to 
be responsible for making sure that this is 'off-by-default' so that a 
malicious party cannot claim to provide a popular package and get their 
software installed instead.   What do you think makes the most sense?

Even if only "the right" projects can create trusted packages for a dependency, 
there are security issues also with respect to which package should be trusted. 
  Suppose you have projects zap and bar, either of which could be chosen to meet a 
dependency.   Which should be used?

With TUF we currently support them choosing a fixed project (zap or bar), but 
supporting the most recent upload is also possible.   We had an explicit tag 
and type of delegation in Stork for this case (the timestamp tag), but I think 
we can get equivalent functionality with threshold signatures in TUF.

Once we understand more about how people would like to use it, we can make sure 
PEP 458 explains how this is supported in a clean way while minimizing the 
security impact.

Thanks,
Justin


Sorry, I think the provides functionality is outside of the scope of what we 
would use TUF for. It is *only* respected if you have that project installed. 
In other words if there is a package “FakeDjango” which provides “Django”, then 
``pip install Django`` will *never* install “FakeDjango”. However if you’ve 
already done ``pip install FakeDjango`` then later on you do ``pip install 
Django`` it will see that it is already installed (because “FakeDjango” 
provides it).

IOW it only matters once you’ve already chosen to trust that package and have 
installed it. This is to prevent any sort of spoofing attacks and to simplify 
the interface. This doesn’t prevent a project which you’ve elected to trust by 
installing it from spoofing itself, but it’s impossible to prevent them from 
doing that anyways without hobbling our package formats so much that they are 
useless. For instance any ability to execute code (such as setup.py!) means 
that FakeDjango could, once installed, spoof Django just by dropping the 
relevant metadata files to say it is already installed.

-- 
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PEP 470 discussion, part 3

2014-07-24 Thread Donald Stufft
On July 24, 2014 at 8:23:59 AM, Stefan Krah (ste...@bytereef.org) wrote:
Richard Jones  wrote: 
> There still remains the usability issue of unsophisticated users running into 
> external indexes and needing to cope with that in one of a myriad of ways as 
> evidenced by the PEP. One solution proposed and refined at the EuroPython 
> gathering today has PyPI caching packages from external indexes *for packages 
> registered with PyPI*. That is: a requirement of registering your package 
> (and 
> external index URL) with PyPI is that you grant PyPI permission to cache 
> packages from your index in the central index - a scenario that is ideal for 
> users. 

-1. That is unlikely to solve the draconian-terms-and-conditions problem 
and one reason to host externally is to get your own download statistics. 
The ToS is not draconian, it is a minimal ToS which allows PyPI to function.

If people want/need additional stats we can add them to PyPI. This is on the 
TODO list anyways.




> Organisations not wishing to do that understand that they're the ones 
> causing the pain for users. 

No. First, checksummed external packages could be downloaded without asking 
at all. Second, if international authors are required to study US export law 
before uploading, I wonder who is causing the pain. 
With PEP 470 you are not required to study anything, nor to upload to PyPI. If you 
wish to host outside of PyPI you simply host an external index, which is as 
simple as a plain HTML file with links to the downloadable files.
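For concreteness, here is a hedged sketch (project name, file names, and URLs are all invented for illustration) of generating the kind of plain-HTML external index page described above:

```python
# Sketch of the "plain html file with links" external index the PEP
# describes. All names and URLs here are hypothetical examples.

def build_simple_index(project, files):
    """Render a minimal /simple/-style index page for one project.

    files: list of (filename, download_url) pairs.
    """
    links = "\n".join(
        '    <a href="{url}">{name}</a><br/>'.format(url=url, name=name)
        for name, url in files
    )
    return (
        "<!DOCTYPE html>\n"
        "<html><head><title>Links for {p}</title></head>\n"
        "<body>\n<h1>Links for {p}</h1>\n{links}\n</body></html>\n"
    ).format(p=project, links=links)

page = build_simple_index(
    "example-pkg",
    [("example-pkg-1.0.tar.gz",
      "https://downloads.example.com/example-pkg-1.0.tar.gz")],
)
print(page)
```

An installer pointed at a page like this with an extra-index option can discover and download the listed files; nothing more elaborate is required of the host.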



Finally, how can an author cause pain for users? Without him, the work 
would not be there in the first place. 


I’m not quite sure how to answer this. It’s quite obvious that an author’s 
choices can cause pain for a user. For example, the author could have an option 
where if specified it silently deleted the entire filesystem of the user. This 
would be incredibly painful for the end user (assuming they didn’t want that of 
course).

Now a project is owned by the author, so they are allowed to choose to do 
things which cause pain for end users, and end users get to make a choice about 
whether it’s worth using that project even with the pain incurred from the 
author’s choices. The reason we don’t download checksummed external packages by 
default any more is because they *do* represent a choice that causes pain for 
end users and thus users should be aware they are making that choice.


Stefan Krah 




-- 
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


Re: [Distutils] PEP 470 discussion, part 3

2014-07-24 Thread Donald Stufft
On July 24, 2014 at 7:26:11 AM, Richard Jones (r1chardj0...@gmail.com) wrote:
Even ignoring the malicious possibility there is probably a greater chance of 
accidental mistakes:

- company sets up internal index using pip's multi-index support and hosts 
various modules
- someone quite innocently uploads something with the same name, newer version, 
to pypi
- company installs now use that unknown code

devpi avoids this (I would recommend it over multi-index for companies anyway) 
by having a white list system for packages that might be pulled from upstream 
that would clash with internal packages.
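A rough sketch of the whitelist safeguard described above (this is not devpi's actual implementation; the index contents, versions, and policy here are invented): a name that exists internally is only ever served from upstream once it has been explicitly whitelisted, so the accidental-clash scenario cannot silently pull in unknown code.

```python
# Hypothetical internal and upstream indexes with one clashing name.
INTERNAL = {"shared-utils": "1.0"}
UPSTREAM = {"shared-utils": "99.0", "requests": "2.3"}
WHITELIST = set()  # nothing cleared for upstream yet

def pick_release(name):
    """Choose (source, version); upstream may shadow an internal name
    only when that name has been whitelisted."""
    if name in INTERNAL and name not in WHITELIST:
        return ("internal", INTERNAL[name])  # clash blocked by policy
    candidates = []
    if name in INTERNAL:
        candidates.append(("internal", INTERNAL[name]))
    if name in UPSTREAM:
        candidates.append(("upstream", UPSTREAM[name]))
    if not candidates:
        raise LookupError(name)
    # Once upstream may compete, prefer the highest version.
    return max(candidates, key=lambda c: tuple(int(x) for x in c[1].split(".")))

blocked = pick_release("shared-utils")   # internal wins despite upstream 99.0
WHITELIST.add("shared-utils")
allowed = pick_release("shared-utils")   # now upstream's newer release wins
print(blocked, allowed, pick_release("requests"))
```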

As Nick's mentioned, a signing infrastructure - tied to the index registration 
of a name - could solve this problem.
Yes, those are two solutions, another solution is for PyPI to allow registering 
a namespace, like dstufft.* and companies simply name all their packages that. 
This isn’t a unique problem to this PEP though. This problem exists anytime a 
company has an internal package that they do not want on PyPI. It’s unlikely 
that any of those companies are using the external link feature if that package 
is internal.



There still remains the usability issue of unsophisticated users running into 
external indexes and needing to cope with that in one of a myriad of ways as 
evidenced by the PEP. One solution proposed and refined at the EuroPython 
gathering today has PyPI caching packages from external indexes *for packages 
registered with PyPI*. That is: a requirement of registering your package (and 
external index URL) with PyPI is that you grant PyPI permission to cache 
packages from your index in the central index - a scenario that is ideal for 
users. Organisations not wishing to do that understand that they're the ones 
causing the pain for users.
We can’t cache the packages which aren’t currently hosted on PyPI. Not in an 
automatic fashion anyways. We’d need to ensure that their license allows us to 
do so. The PyPI ToS ensures this when they upload but if they never upload then 
they’ve never agreed to the ToS for that artifact.



An extension of this proposal is quite elegant; to reduce the pain of migration 
from the current approach to the new, we implement that caching right now, 
using the current simple index scraping. This ensures the packages are 
available to all clients throughout the transition period.
As said above, we can’t legally do this automatically, we’d need to ensure that 
there is a license that grants us distribution rights.



The transition issue was enough for those at the meeting today to urge me to 
reject the PEP.
To be clear, there are really three issues at play:

1) Should we continue to support scraping external URLs *at all*? This is a 
cause of a lot of problems in pip, and it infects our architecture with things 
that cause confusing error messages that we cannot really get away from. It’s 
also super slow and grossly insecure. 

2) Should we continue to support direct links from a project’s /simple/ page to 
a downloadable file which isn’t hosted on PyPI? 

3) If we allow direct links to a downloadable file from a project’s /simple/ 
page, do we mandate that they include a hash (and thus are safe), or do we also 
allow ones without a checksum (and thus are unsafe)?
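To illustrate the distinction in point 3, here is a small hedged sketch (the URL and archive contents are invented) of why a hash fragment on a direct link lets an installer verify what it downloaded, while a bare link gives it nothing to check against:

```python
import hashlib

# Pretend release artifact; in reality this would be the tarball bytes.
archive_bytes = b"pretend this is a release tarball"
digest = hashlib.md5(archive_bytes).hexdigest()

# A link with a hash fragment vs. a bare (unverifiable) link.
verified_link = (
    "https://downloads.example.com/example-pkg-1.0.tar.gz#md5=" + digest
)
unverified_link = "https://downloads.example.com/example-pkg-1.0.tar.gz"

def check_download(url, data):
    """Return True only when the URL carries a hash matching the data."""
    if "#md5=" not in url:
        return False  # unverified: nothing to check against
    expected = url.split("#md5=", 1)[1]
    return hashlib.md5(data).hexdigest() == expected

print(check_download(verified_link, archive_bytes))    # matches
print(check_download(unverified_link, archive_bytes))  # cannot be verified
print(check_download(verified_link, b"tampered!"))     # tampering detected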

For me, 1 is absolutely not. It is terrible and it is the cause of horrible UX 
issues as well as performance issues. However, 1 is also the one with the 
largest impact: eliminating it eliminates PIL, and that is more than 90% of the 
/simple/ traffic among the projects this change will affect at all.

For me 2 is a question of: is the relatively small use case (in both traffic 
and number of packages) worth the extra cognitive overhead of users having to 
understand that there are *two* ways for something to be installed from outside 
PyPI? Additionally, is it worth removing the ability for people to legally 
mirror the actual *files* without manually whitelisting the ones they’ve vetted 
and found to have a license that allows it (and even then, a project could 
later switch to a license which doesn’t allow that)? For me this is again no, 
it’s not worth it. Additional concepts to learn, with their own quirks, plus 
pain for people wanting to mirror their installs, are not worth keeping things 
working for a tiny fraction of projects.

For me 3 is no simply because 2 is no, but even assuming 2 were “yes”, I still 
think 3 is no because the external vs. unverified split is confusing to users. 
Additionally, the impact of this one, if I recall correctly, is almost zero.




      Richard


On 24 July 2014 12:40, Vladimir Diaz  wrote:
In metadata 2.0 even with package signing you end up where I can have you 
install “django-foobar” which depends on “FakeDjango”, which provides “Django”, 
and then for all intents and purposes you have a “Django” package installed.

Can you go into more detail?  Particularly, the part where "FakeDjango" 
provides Django.

Richard Jones mentions the case where an external index provides an "updat

Re: [Distutils] PEP 470 discussion, part 3

2014-07-24 Thread Donald Stufft
On July 24, 2014 at 6:40:47 AM, Vladimir Diaz (vladimir.v.d...@gmail.com) wrote:
In metadata 2.0 even with package signing you end up where I can have you 
install “django-foobar” which depends on “FakeDjango”, which provides “Django”, 
and then for all intents and purposes you have a “Django” package installed.

Can you go into more detail?  Particularly, the part where "FakeDjango" 
provides Django.

Richard Jones mentions the case where an external index provides an "updated 
release" and tricks the updater into installing a compromised "Django."  Is 
this the same thing?


No, it’s not the same thing. Metadata 2.0 provides mechanisms for one package to 
claim to be another package. This only takes effect once that package has been 
installed, though. This functionality allows things like a package to provide a 
compatible shim that uses different internal guts, or for one package to 
obsolete another or even for multiple packages to “provide” the same thing and 
allow the user to select which one they want to use at install time.
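A minimal sketch of the semantics described here (this is not pip's real resolver; the names and return strings are invented): a "provides" claim is only consulted for packages that are already installed, so asking to install "Django" never downloads "FakeDjango".

```python
# Toy model: a provides claim matters only once its package is installed.

def resolve_install(requested, installed, index):
    """Return what a hypothetical installer would do for `requested`.

    installed: dict of name -> set of names that package provides
    index: set of names available on the index
    """
    for name, provides in installed.items():
        if requested == name or requested in provides:
            return "already satisfied by " + name
    if requested in index:
        return "download " + requested  # provides claims on the index ignored
    raise LookupError(requested)

index = {"Django", "FakeDjango"}

# Nothing installed: FakeDjango's provides claim is never consulted.
print(resolve_install("Django", {}, index))

# FakeDjango already installed and claiming to provide Django:
installed = {"FakeDjango": {"Django"}}
print(resolve_install("Django", installed, index))
```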

-- 
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


Re: [Distutils] PEP 470 discussion, part 3

2014-07-24 Thread Donald Stufft
On July 24, 2014 at 4:55:42 AM, Richard Jones (r1chardj0...@gmail.com) wrote:
Thanks for responding, even from your sick bed.

This message about users having to view and understand /simple/ indexes is 
repeated many times. I didn't have to do that in the case of PIL. The tool told 
me "use --allow-external PIL to allow" and then when that failed it told me 
"use --allow-unverified PIL to allow". There was no needing to understand why, 
nor any reading of /simple/ indexes.
Currently most users (I'm thinking of people who install PIL once or twice) 
don't need to edit configuration files, and with a modification we could make 
the above process interactive. Those ~3000 packages that have internal and 
external packages would be slow, yes.
They need to do it to understand if a link is internal, external, or 
unverified. The feedback *I’ve* gotten is complete confusion about the 
difference between them. Even making that process interactive still means that 
pip cannot hard fail on a failure to retrieve a URL, and thus must present 
confusing error messages when a URL is temporarily down.



This PEP proposes a potentially confusing break for both users and packagers. 
In particular, during the transition there will be packages which just 
disappear as far as users are concerned. In those cases users will indeed need 
to learn that there is a /simple/ page and they will need to view it in order 
to find the URL to add to their installation invocation in some manner. Even 
once install tools start supporting the new mechanism, users who lag (which as 
we all know are the vast majority) will run into this.
So we lengthen the transition time and gate it on the installer versions with 
automatic hinting becoming the dominant ones. We can pretty easily see exactly 
what version of the tooling is being used to install stuff from PyPI.



On the devpi front: indeed it doesn't use the mirroring protocol because it is 
not a mirror. It is a caching proxy that uses the same protocols as the install 
tools to obtain, and then cache the files for install. Those files are then 
presented in a single index for the user to use. There is no need for 
multi-index support, even in the case of having multiple staging indexes. There 
is a need for devpi to be able to behave just like an installer without needing 
intervention, which I believe will be possible in this proposal as it can 
automatically add external indexes as it needs to.
Yes, devpi should be able to update itself to add the external indexes.



I talked to a number of people last night and I believe the package spoofing 
concept is also a vulnerability in the Linux multi-index model (where an 
external index provides an "updated release" of some core package like libssl 
on Linux, or perhaps requests in Python land). As I understand it, there is no 
protection against this. Happy to be told why I'm wrong, of course :)
It’s not really a “vulnerability”; it’s something that can be done regardless, 
and thus package authors are not part of the threat model. If I’m installing a 
package from a malicious author I’m executing arbitrary Python from them. They 
can drop a .egg-info into site-packages and spoof a package that way. It is 
completely impossible to prevent the author of a package that someone else is 
installing from spoofing another package. The spoofing problem is a red 
herring; it’s like saying that your browser vendor could get your bank password 
because you’re typing it into the browser. Well yes, they could, but that trust 
is mandatory. If you’re installing a package I wrote, you must extend trust to 
me.




      Richard




-- 
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


Re: [Distutils] PEP 470 discussion, part 3

2014-07-24 Thread Stefan Krah
Richard Jones  wrote:
> There still remains the usability issue of unsophisticated users running into
> external indexes and needing to cope with that in one of a myriad of ways as
> evidenced by the PEP. One solution proposed and refined at the EuroPython
> gathering today has PyPI caching packages from external indexes *for packages
> registered with PyPI*. That is: a requirement of registering your package (and
> external index URL) with PyPI is that you grant PyPI permission to cache
> packages from your index in the central index - a scenario that is ideal for
> users.

-1. That is unlikely to solve the draconian-terms-and-conditions problem
and one reason to host externally is to get your own download statistics.


> Organisations not wishing to do that understand that they're the ones
> causing the pain for users.

No. First, checksummed external packages could be downloaded without asking
at all.  Second, if international authors are required to study US export law
before uploading, I wonder who is causing the pain.

Finally, how can an author cause pain for users? Without him, the work
would not be there in the first place.


Stefan Krah




Re: [Distutils] PEP 470 discussion, part 3

2014-07-24 Thread Richard Jones
Even ignoring the malicious possibility there is probably a greater chance
of accidental mistakes:

- company sets up internal index using pip's multi-index support and hosts
various modules
- someone quite innocently uploads something with the same name, newer
version, to pypi
- company installs now use that unknown code

devpi avoids this (I would recommend it over multi-index for companies
anyway) by having a white list system for packages that might be pulled
from upstream that would clash with internal packages.

As Nick's mentioned, a signing infrastructure - tied to the index
registration of a name - could solve this problem.

There still remains the usability issue of unsophisticated users running
into external indexes and needing to cope with that in one of a myriad of
ways as evidenced by the PEP. One solution proposed and refined at the
EuroPython gathering today has PyPI caching packages from external indexes
*for packages registered with PyPI*. That is: a requirement of registering
your package (and external index URL) with PyPI is that you grant PyPI
permission to cache packages from your index in the central index - a
scenario that is ideal for users. Organisations not wishing to do that
understand that they're the ones causing the pain for users.

An extension of this proposal is quite elegant; to reduce the pain of
migration from the current approach to the new, we implement that caching
right now, using the current simple index scraping. This ensures the
packages are available to all clients throughout the transition period.

The transition issue was enough for those at the meeting today to urge me
to reject the PEP.


  Richard


On 24 July 2014 12:40, Vladimir Diaz  wrote:

> In metadata 2.0 even with package signing you end up where I can have you
> install “django-foobar” which depends on “FakeDjango”, which provides
> “Django”, and then for all intents and purposes you have a “Django” package
> installed.
>
> Can you go into more detail?  Particularly, the part where "FakeDjango"
> provides Django.
>
> Richard Jones mentions the case where an external index provides an
> "updated release" and tricks the updater into installing a compromised
> "Django."  Is this the same thing?
>
>
> On Thu, Jul 24, 2014 at 4:55 AM, Richard Jones 
> wrote:
>
>> Thanks for responding, even from your sick bed.
>>
>> This message about users having to view and understand /simple/ indexes
>> is repeated many times. I didn't have to do that in the case of PIL. The
>> tool told me "use --allow-external PIL to allow" and then when that failed
>> it told me "use --allow-unverified PIL to allow". There was no needing to
>> understand why, nor any reading of /simple/ indexes.
>> Currently most users (I'm thinking of people who install PIL once or
>> twice) don't need to edit configuration files, and with a modification we
>> could make the above process interactive. Those ~3000 packages that have
>> internal and external packages would be slow, yes.
>>
>> This PEP proposes a potentially confusing break for both users and
>> packagers. In particular, during the transition there will be packages
>> which just disappear as far as users are concerned. In those cases users
>> will indeed need to learn that there is a /simple/ page and they will need
>> to view it in order to find the URL to add to their installation invocation
>> in some manner. Even once install tools start supporting the new mechanism,
>> users who lag (which as we all know are the vast majority) will run into
>> this.
>>
>> On the devpi front: indeed it doesn't use the mirroring protocol because
>> it is not a mirror. It is a caching proxy that uses the same protocols as
>> the install tools to obtain, and then cache the files for install. Those
>> files are then presented in a single index for the user to use. There is no
>> need for multi-index support, even in the case of having multiple staging
>> indexes. There is a need for devpi to be able to behave just like an
>> installer without needing intervention, which I believe will be possible in
>> this proposal as it can automatically add external indexes as it needs to.
>>
>> I talked to a number of people last night and I believe the package
>> spoofing concept is also a vulnerability in the Linux multi-index model
>> (where an external index provides an "updated release" of some core package
>> like libssl on Linux, or perhaps requests in Python land). As I understand
>> it, there is no protection against this. Happy to be told why I'm wrong, of
>> course :)
>>
>>
>>   Richard
>>
>>
>>
>


Re: [Distutils] PEP 470 discussion, part 3

2014-07-24 Thread Vladimir Diaz
In metadata 2.0 even with package signing you end up where I can have you
install “django-foobar” which depends on “FakeDjango”, which provides
“Django”, and then for all intents and purposes you have a “Django” package
installed.

Can you go into more detail?  Particularly, the part where "FakeDjango"
provides Django.

Richard Jones mentions the case where an external index provides an
"updated release" and tricks the updater into installing a compromised
"Django."  Is this the same thing?


On Thu, Jul 24, 2014 at 4:55 AM, Richard Jones 
wrote:

> Thanks for responding, even from your sick bed.
>
> This message about users having to view and understand /simple/ indexes is
> repeated many times. I didn't have to do that in the case of PIL. The tool
> told me "use --allow-external PIL to allow" and then when that failed it
> told me "use --allow-unverified PIL to allow". There was no needing to
> understand why, nor any reading of /simple/ indexes.
> Currently most users (I'm thinking of people who install PIL once or
> twice) don't need to edit configuration files, and with a modification we
> could make the above process interactive. Those ~3000 packages that have
> internal and external packages would be slow, yes.
>
> This PEP proposes a potentially confusing break for both users and
> packagers. In particular, during the transition there will be packages
> which just disappear as far as users are concerned. In those cases users
> will indeed need to learn that there is a /simple/ page and they will need
> to view it in order to find the URL to add to their installation invocation
> in some manner. Even once install tools start supporting the new mechanism,
> users who lag (which as we all know are the vast majority) will run into
> this.
>
> On the devpi front: indeed it doesn't use the mirroring protocol because
> it is not a mirror. It is a caching proxy that uses the same protocols as
> the install tools to obtain, and then cache the files for install. Those
> files are then presented in a single index for the user to use. There is no
> need for multi-index support, even in the case of having multiple staging
> indexes. There is a need for devpi to be able to behave just like an
> installer without needing intervention, which I believe will be possible in
> this proposal as it can automatically add external indexes as it needs to.
>
> I talked to a number of people last night and I believe the package
> spoofing concept is also a vulnerability in the Linux multi-index model
> (where an external index provides an "updated release" of some core package
> like libssl on Linux, or perhaps requests in Python land). As I understand
> it, there is no protection against this. Happy to be told why I'm wrong, of
> course :)
>
>
>   Richard
>
>
>


Re: [Distutils] PEP 470 discussion, part 3

2014-07-24 Thread Richard Jones
Thanks for responding, even from your sick bed.

This message about users having to view and understand /simple/ indexes is
repeated many times. I didn't have to do that in the case of PIL. The tool
told me "use --allow-external PIL to allow" and then when that failed it
told me "use --allow-unverified PIL to allow". There was no needing to
understand why, nor any reading of /simple/ indexes.
Currently most users (I'm thinking of people who install PIL once or twice)
don't need to edit configuration files, and with a modification we could
make the above process interactive. Those ~3000 packages that have internal
and external packages would be slow, yes.

This PEP proposes a potentially confusing break for both users and
packagers. In particular, during the transition there will be packages
which just disappear as far as users are concerned. In those cases users
will indeed need to learn that there is a /simple/ page and they will need
to view it in order to find the URL to add to their installation invocation
in some manner. Even once install tools start supporting the new mechanism,
users who lag (which as we all know are the vast majority) will run into
this.

On the devpi front: indeed it doesn't use the mirroring protocol because it
is not a mirror. It is a caching proxy that uses the same protocols as the
install tools to obtain, and then cache the files for install. Those files
are then presented in a single index for the user to use. There is no need
for multi-index support, even in the case of having multiple staging
indexes. There is a need for devpi to be able to behave just like an
installer without needing intervention, which I believe will be possible in
this proposal as it can automatically add external indexes as it needs to.

I talked to a number of people last night and I believe the package
spoofing concept is also a vulnerability in the Linux multi-index model
(where an external index provides an "updated release" of some core package
like libssl on Linux, or perhaps requests in Python land). As I understand
it, there is no protection against this. Happy to be told why I'm wrong, of
course :)


  Richard


Re: [Distutils] PEP 470 discussion, part 3

2014-07-23 Thread Donald Stufft
On July 23, 2014 at 6:27:31 PM, Nick Coghlan (ncogh...@gmail.com) wrote:
a) For private indexes, being able to override upstream is a feature, not a bug
b) Categorically preventing spoofing is what end-to-end signing is for

I forgot to mention that you basically need to trust the maintainers of the 
packages you choose to install anyway. Even if we don’t use multi-index 
support, it’s trivial for a package to masquerade as another one. In metadata 
2.0, even with 
package signing you end up where I can have you install “django-foobar” which 
depends on “FakeDjango”, which provides “Django”, and then for all intents and 
purposes you have a “Django” package installed.

The point being we can’t rely on the index ACLs to protect a user who has 
elected to install something that does something bad. The authors of a package 
that the user has opted to install *are not* in the threat model.

-- 
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


Re: [Distutils] PEP 470 discussion, part 3

2014-07-23 Thread Nick Coghlan
On 24 Jul 2014 03:09, "Richard Jones"  wrote:
>
> I believe the current PEP addresses the significant usability issues
around this by swapping them for other usability issues. In fact, I believe
it will make matters worse with potential confusion about which index hosts
what, potential masking of release files or even, in the worst scenario,
potential spoofing of release files by indexes out of the control of
project owners.

Donald covered most points I would have made in his reply, but I do have a
couple of additions specifically on this point:

a) For private indexes, being able to override upstream is a feature, not a
bug
b) Categorically preventing spoofing is what end-to-end signing is for

pip's own existing multiple index support is what makes devpi and its
concept not only of private indexes, but also separate dev, staging and
production indexes, possible.

PEP 470 proposes to make some small enhancements to the multiple index
support in order to allow subsequent deprecation and removal of the
complicated and largely redundant link spidering system.

From a usability perspective, Ubuntu PPAs (Personal Package Archives, where
users can easily host custom repos on Launchpad) have proved enormously
popular, and Fedora has now adopted a similar model with its COPR RPM
building and yum repo hosting service. conda also uses channel selection as
a way of determining what packages are available.

Cheers,
Nick.


Re: [Distutils] PEP 470 discussion, part 3

2014-07-23 Thread Donald Stufft
On July 23, 2014 at 1:09:00 PM, Richard Jones (r1chardj0...@gmail.com) wrote:
I have been mulling over PEP 470 for some time without having the time to truly 
dedicate to addressing it. I believe I'm up to date with its contents and the 
(quite significant, and detailed) discussion around it.

To summarise my understanding, PEP 470 proposes to remove the current link 
spidering (pypi-scrape, pypi-scrape-crawl) while retaining explicit hosting 
(pypi-explicit). I believe it retains the explicit links to external hosting 
provided by pypi-explicit.
No, it removes pypi-explicit as well, leaving only files hosted on PyPI. On top 
of that it adds new functionality where project authors can indicate that 
their files are hosted on a non-PyPI index. This allows tooling to indicate to 
users that they need to add additional indexes to their install commands in 
order to install something, as well as allowing PyPI to still act as a central 
authority for naming without forcing people to upload to PyPI.
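(For the archives, here is a minimal sketch of what "adding an additional index" looks like with pip's existing configuration; the index URL below is purely hypothetical:)

```ini
; Hypothetical pip.conf (e.g. ~/.config/pip/pip.conf) sketch: in addition to
; PyPI, consult a project author's external index, as PEP 470 envisions.
[global]
extra-index-url = https://downloads.example.com/simple/
```

The same thing can be done per-invocation with `pip install --extra-index-url https://downloads.example.com/simple/ somepackage`.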



The reason given for this change is the current bad user experience around the 
--allow-external and --allow-unverified options to pip install. That is, that 
users currently attempt to install a non-pypi-explicit package and the result 
is an obscure error message.
That’s part of the bad UX; the other part is that users are not particularly 
aware of the difference between an external and an unverified link (in fact, many 
people involved in packaging were not aware until I explained it to them; 
the difference is subtle). Part of the problem is that while it’s easy for 
*tooling* to determine the difference between external and unverified, a 
human has to inspect the actual HTML of the /simple/ page.
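(To illustrate the tooling-side classification described above, here is a rough sketch of how an installer might sort /simple/ page links into the PEP 438 buckets. The sample HTML and the exact rules are simplified assumptions for illustration, not PyPI's or pip's real logic:)

```python
from html.parser import HTMLParser

class SimpleLinkClassifier(HTMLParser):
    """Illustrative classifier for links found on a /simple/ project page."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (url, category) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        url = attrs.get("href", "")
        if attrs.get("rel") == "internal":
            category = "internal"   # hosted on PyPI itself
        elif "#md5=" in url or "#sha256=" in url:
            category = "external"   # off-PyPI, but hash-verifiable
        else:
            category = "unsafe"     # no hash: cannot be verified up front
        self.links.append((url, category))

SAMPLE = """
<a rel="internal" href="../../packages/source/e/ex/ex-1.0.tar.gz#md5=aa">ex-1.0.tar.gz</a>
<a href="https://example.com/dl/ex-1.1.tar.gz#md5=bb">ex-1.1.tar.gz</a>
<a href="https://example.com/downloads/">ex download page</a>
"""

parser = SimpleLinkClassifier()
parser.feed(SAMPLE)
for url, category in parser.links:
    print(category, url)
```

The point of the sketch is that the distinction lives in markup attributes and URL fragments, which is exactly why it is invisible to a human glancing at the rendered page.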



I believe the current PEP addresses the significant usability issues around 
this by swapping them for other usability issues. In fact, I believe it will 
make matters worse with potential confusion about which index hosts what, 
potential masking of release files or even, in the worst scenario, potential 
spoofing of release files by indexes out of the control of project owners.
So yes, that’s a potential problem with any multi-index scheme. However, I do not 
believe these are serious problems. It is a model in use by every Linux 
vendor, and anyone who has ever used Linux (or most of the various BSDs) 
is already familiar with it. On top of that, it is something that end users 
would need to be aware of anyway if they want to use a private index, install 
commercial software from a restricted index, or any number of other 
situations. In other words, multiple indexes don’t go away; they will always be 
there. The effect of PEP 438 is that users need to be aware of *two* different 
ways of installing things not hosted on PyPI instead of just one. 

These two concepts instead of one are another part of the bad UX inflicted by PEP 
438. The Zen of Python states that there should be one obvious way to do 
something, and I think that is a good thing to strive for. 



I would like us to consider instead working on the usability of the existing 
workflow: rather than throwing an error, we start a dialog with the user:

$ pip install PIL
Downloading/unpacking PIL
  PIL is hosted externally to PyPI. Do you still wish to download it? [Y/n] y
  PIL has no checksum. Are you sure you wish to download it? [Y/n] y
Downloading/unpacking PIL
  Downloading PIL-1.1.7.tar.gz (506kB): 506kB downloaded
...

Obviously this would require scraping the site, but given that this interaction 
would only happen for a very small fraction of projects (those for which no 
download is located), the overall performance hit is negligible. The PEP 
currently states that this would be a "massive performance hit" for reasons I 
don't understand.
It’s a big performance hit because we can’t just assume that if there is a 
download located on PyPI there is not a better download hosted externally. 
So in order to do this accurately we must scan every URL we locate 
in order to build up a complete list of all the potential files, and only then 
ask whether the person wants to download one.

For a rough indication of the difference: I can scan all of PyPI looking for 
potential release files in about 20 minutes if I restrict myself to only things 
hosted directly on PyPI. If I include the additional scanning, that time 
jumps to 3-4 hours, roughly 9-12x slower. And that’s with an incredibly 
aggressive timeout and a blacklist so that bad hosts are only tried once.



The two prompts could be made automatic "y" responses for tools using the 
existing --allow-external and --allow-unverified flags.

I also note that PEP 470 says "PEP 438 proposed a system of classifying file 
links as either internal, external, or unsafe", whereas PEP 438 makes no mention 
of "unsafe". As a result, "unsafe" is never actually defined anywhere that I 
can see.
I can define them in the PEP, but basically:

* internal - Things hosted by PyPI itself.


[Distutils] PEP 470 discussion, part 3

2014-07-23 Thread Richard Jones
I have been mulling over PEP 470 for some time without having the time to
truly dedicate to addressing it. I believe I'm up to date with its contents
and the (quite significant, and detailed) discussion around it.

To summarise my understanding, PEP 470 proposes to remove the current link
spidering (pypi-scrape, pypi-scrape-crawl) while retaining explicit hosting
(pypi-explicit). I believe it retains the explicit links to external
hosting provided by pypi-explicit.

The reason given for this change is the current bad user experience around
the --allow-external and --allow-unverified options to pip install. That
is, that users currently attempt to install a non-pypi-explicit package and
the result is an obscure error message.

I believe the current PEP addresses the significant usability issues around
this by swapping them for other usability issues. In fact, I believe it
will make matters worse with potential confusion about which index hosts
what, potential masking of release files or even, in the worst scenario,
potential spoofing of release files by indexes out of the control of
project owners.

I would like us to consider instead working on the usability of the
existing workflow: rather than throwing an error, we start a dialog with
the user:

$ pip install PIL
Downloading/unpacking PIL
  PIL is hosted externally to PyPI. Do you still wish to download it? [Y/n]
y
  PIL has no checksum. Are you sure you wish to download it? [Y/n] y
Downloading/unpacking PIL
  Downloading PIL-1.1.7.tar.gz (506kB): 506kB downloaded
...

Obviously this would require scraping the site, but given that this
interaction would only happen for a very small fraction of projects (those
for which no download is located), the overall performance hit is
negligible. The PEP currently states that this would be a "massive
performance hit" for reasons I don't understand.

The two prompts could be made automatic "y" responses for tools using the
existing --allow-external and --allow-unverified flags.

I also note that PEP 470 says "PEP 438 proposed a system of classifying
file links as either internal, external, or unsafe", whereas PEP 438 makes no
mention of "unsafe". As a result, "unsafe" is never actually defined
anywhere that I can see.

Finally, the Rejected Proposals section of the PEP appears to have a couple
of justifications for rejection which have nothing whatsoever to do with
the Rationale ("PyPI is fronted by a globally distributed CDN...", "PyPI
supports mirroring..."). As Holger has already indicated, that second one is
going to have a heck of a time dealing with the PEP 470 changes, at least in
the devpi case.

 "PyPI has monitoring and an on-call rotation of sysadmins..." would be
solved through improving the failure message reported to the user as
discussed above.


 Richard