> On Jan 2, 2015, at 3:21 AM, Nick Coghlan <ncogh...@gmail.com> wrote:
> 
> On 2 January 2015 at 16:38, Donald Stufft <don...@stufft.io> wrote:
> 
>> On Jan 2, 2015, at 1:33 AM, Nick Coghlan <ncogh...@gmail.com> wrote:
>> 
>> That's the part I meant - the signing of developer keys to delegate trust to 
>> them without needing to trust the integrity of the online PyPI service.
>> 
>> Hence the idea of instead keeping PyPI as an entirely online service 
>> (without any offline delegation of authority), and suggesting that 
>> developers keep their *own* separately signed metadata, which can then be 
>> compared against the PyPI published metadata (both by the developers 
>> themselves and by third parties). Discrepancies becoming a trigger for 
>> further investigation, which may include suspending the PyPI service if the 
>> discrepancy is reported by an individual or organisation that the PyPI 
>> administrators trust.
> 
> I’m confused what you mean by “without needing to trust the integrity of 
> the online PyPI service”.
> 
> Developer keys get signed by offline keys controlled by I’m guessing either 
> myself or Richard or both. The only time we’re depending on the integrity of 
> the machine that runs PyPI and not on an offline key possessed by someone is 
> during the window of time when a new project has been created (the project 
> itself, not a release of a project) and the next time the delegations get 
> signed by the offline keys.
> 
> Yes, as I said, that's the part I mean. To avoid trusting the integrity of 
> the online PyPI service, while still using PyPI as a trust root for the 
> purpose of software installation, we would need to define a system whereby:
> 
> 1. The PyPI administrators have a set of offline keys
> 2. Developers are able to supply keys to the PyPI administrators for trust 
> delegation
> 3. This system has sufficiently low barriers to entry that developers are 
> actually willing to use it
> 4. This system is compatible with a PyPI run build service
> 
> We already have a hard security problem to solve (running PyPI), so adding a 
> *second* hard security problem (running what would in effect be a CA) doesn't 
> seem like a good approach to risk mitigation to me.
> 
> My proposal is that we instead avoid the hard problem of running a CA 
> entirely by advising *developers* to monitor PyPI's integrity by ensuring 
> that what PyPI is publishing matches what they released. That is, we split 
> the end-to-end data integrity validation problem in two and solve each part 
> separately:
> 
> * use PEP 458 to cover the PyPI -> end user link, with the end users treating 
> PyPI as a trusted authority. End users will be able to detect tampering with 
> the link between them and PyPI, but if the online PyPI service gets 
> compromised, *end users won't detect it*.
> * use a separate metadata validation process to check that PyPI is publishing 
> the right thing, covering both the developer -> PyPI link *and* the integrity 
> of the PyPI service itself.
> 
> The metadata validation potentially wouldn't even need to use TUF - 
> developers could simply upload the expected hash of the artifacts they 
> published, and the metadata validation service would check that the signed 
> artifacts from PyPI match those hashes. The core of the idea is simply that 
> there be a separate service (or services) which PyPI can't update, but 
> developers uploading packages *can*.
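(For concreteness, a minimal sketch of the kind of check a developer or a 
third-party validation service could run under this scheme, assuming the 
developer has published an expected SHA-256 digest somewhere PyPI can't 
modify; the URL and the way the digest is obtained are placeholders, not part 
of the proposal.)

    import hashlib
    import urllib.request

    def artifact_matches(pypi_url, expected_sha256):
        # Stream the artifact PyPI is currently serving and hash it locally.
        digest = hashlib.sha256()
        with urllib.request.urlopen(pypi_url) as response:
            for chunk in iter(lambda: response.read(8192), b""):
                digest.update(chunk)
        # Compare against the digest the developer published out-of-band.
        return digest.hexdigest() == expected_sha256.lower()

    # Placeholder values, purely illustrative:
    # artifact_matches(
    #     "https://pypi.example/packages/source/f/foo/foo-1.0.tar.gz",
    #     "<sha256 the developer published alongside the release>",
    # )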
> 
> By focusing on detection & recovery, rather than prevention, we can 
> drastically reduce the complexity of the problem to be solved, while still 
> mitigating the major risks we care about. The potential attacks that worry me 
> are the ones that result in silent substitution of artifacts - when it comes 
> to denial of service attacks, there's little reason to mess about with 
> inducing metadata validation failures when there are already far simpler 
> options available.
> 
> Redistributors may decide to take advantage of the developer metadata 
> validation support to do our own verification of source downloads, but I 
> don't believe upstream needs to worry about that too much - if developers 
> have the means to verify automatically that what PyPI is currently publishing 
> matches what they released, then the redistributor side of things should take 
> care of itself.
> 
> Another way of viewing the problem is that instead of thinking of the scope 
> of PEP 480 as PyPI delegating trust to developers, we can instead think of it 
> as developers delegating trust to PyPI. PyPI then becomes a choke point in a 
> network graph, rather than the root of a tree. My core idea stays the same 
> regardless of how you look at it though: we *don't* try to solve the problem 
> of letting end users establish in a single step that what they downloaded 
> matches what the developer published. Instead, we aim to provide answers to 
> the questions:
> 
> * Did I just download what PyPI is currently publishing?
> * Is PyPI currently publishing what the developer of <project> released?
> 
> There's no fundamental requirement that those two questions be answered by 
> the *same* security system - we have the option of splitting them, and I'm 
> starting to think that the overall UX will be better if we do.
> 
> Regards,
> Nick.
> 
> -- 
> Nick Coghlan   |   ncogh...@gmail.com   |   
> Brisbane, Australia


Oh, I see. I was just misreading what you meant by “without trusting the 
integrity of the online PyPI service”. I thought you meant it in a post-PEP 480 
world, but you meant it in a pre- (or without) PEP 480 world.

So onto the actual thing that you’ve proposed!

I have concerns about the actual feasibility of doing such a thing, some of 
which are similar to my concerns with doing non-mandatory PEP 480.

* If uploading to a verifier service is optional then a significant portion of 
authors simply won’t do it, and if you’re installing 100 things where 99 of 
them are verified and 1 of them is not, that one unverified project is an 
attack vector I can use to compromise you undetected (since the author didn’t 
upload their verification anywhere else).
* It’s not actually less work in general; it just pushes the work from the PyPI 
administrators to the community. This can work well if the community is willing 
to step up! However, we originally tried to solve PyPI’s availability/speed 
problems by pushing the work to the community via the original mirror system, 
and (not to downplay the people who did step up) the response was not 
particularly great: we got a few mirrors at first, and as the shiny factor wore 
off people’s mirrors shut down, stopped working, or what have you.
* A number of the attacks that TUF protects against do not rely on the attacker 
creating malicious software packages, e.g. serving only known-insecure versions 
of a project so that people can then be attacked through a known exploit. It’s 
not *wrong* to not protect against these (most systems don’t), but we’d want to 
explicitly decide that we’re not going to.
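
(As an illustration of that last point: detecting a freeze/rollback of the kind 
TUF guards against needs some notion of “the newest release we previously saw” 
to compare against. The sketch below is purely illustrative, not part of either 
proposal, and assumes the packaging library is available.)

    from packaging.version import Version

    def looks_rolled_back(advertised_versions, last_seen_latest):
        # True if the index no longer advertises anything at least as new as
        # the newest release we recorded the last time we checked this project.
        if not advertised_versions:
            return True  # the project vanished entirely
        newest = max(Version(v) for v in advertised_versions)
        return newest < Version(last_seen_latest)

    # e.g. if we previously saw 1.4.2 but the index now only lists older,
    # known-vulnerable releases:
    # looks_rolled_back(["1.3.0", "1.4.0"], "1.4.2")  -> True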

I’d note that PEP 480 and your proposal aren’t really mutually exclusive, so 
there’s not much harm in *trying* yours and, if it fails, falling back to 
something like PEP 480, other than end-user confusion if that service gets shut 
down and the cost of actually developing/setting up that solution.

Overall I’m +1 on things that enable better detection of a compromise, but I’m 
probably -0.5 or so on your specific proposal, as I think that expecting 
developers to upload verification data to “verification servers” is just 
pushing work onto other people so that we don’t have to do it.

I also think your two questions are not exactly right, because answering them 
only makes it harder to attack *everyone* via a PyPI compromise; it’s still 
trivial to attack specific people if you’ve compromised PyPI or the CDN, since 
you can selectively serve maliciously signed packages depending on who is 
requesting them. To this end, I don’t think a solution that pip doesn’t 
implement is actually going to prevent anything but very dumb attacks by an 
attacker who has already compromised the PyPI machines.

I think another issue here is that we’re effectively doing something similar to 
TLS, except instead of domain names we have project names, and although there 
are *a lot* of people who really hate the CA system, nobody has yet come up 
with an effective means of actually replacing it without regressing into worse 
security. The saving grace here is that we operate at a much smaller scale (one 
“DNS” root, one trust root, ~53k unique names vs… more than I feel like 
counting), so it’s possible that solutions which don’t scale at TLS scale might 
scale at PyPI scale.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
