On 2 January 2015 at 16:38, Donald Stufft <don...@stufft.io> wrote:

>
> On Jan 2, 2015, at 1:33 AM, Nick Coghlan <ncogh...@gmail.com> wrote:
>
> That's the part I meant - the signing of developer keys to delegate trust
> to them without needing to trust the integrity of the online PyPI service.
>
> Hence the idea of instead keeping PyPI as an entirely online service
> (without any offline delegation of authority), and suggesting that
> developers keep their *own* separately signed metadata, which can then be
> compared against the PyPI published metadata (both by the developers
> themselves and by third parties). Discrepancies becoming a trigger for
> further investigation, which may include suspending the PyPI service if the
> the discrepancy is reported by an individual or organisation that the PyPI
> administrators trust.
>
>
> I’m confused what you mean by “without needing to trust the integrity
> of the online PyPI service”.
>
> Developer keys get signed by offline keys controlled by, I’m guessing,
> either myself or Richard or both. The only time we’re depending on the
> integrity of the machine that runs PyPI, rather than on an offline key
> possessed by someone, is during the window between when a new project
> has been created (the project itself, not a release of the project) and
> the next time the delegations get signed by the offline keys.
>

Yes, as I said, that's the part I mean. To avoid trusting the integrity of
the online PyPI service, while still using PyPI as a trust root for the
purpose of software installation, we would need to define a system whereby:

1. The PyPI administrators have a set of offline keys
2. Developers are able to supply keys to the PyPI administrators for trust
delegation
3. This system has sufficiently low barriers to entry that developers are
actually willing to use it
4. This system is compatible with a PyPI-run build service

We already have a hard security problem to solve (running PyPI), so adding
a *second* hard security problem (running what would in effect be a CA)
doesn't seem like a good approach to risk mitigation to me.

My proposal is that we instead avoid the hard problem of running a CA
entirely by advising *developers* to monitor PyPI's integrity by ensuring
that what PyPI is publishing matches what they released. That is, we split
the end-to-end data integrity validation problem in two and solve each part
separately:

* use PEP 458 to cover the PyPI -> end user link, with the end users
treating PyPI as a trusted authority. End users will be able to detect
tampering with the link between them and PyPI, but if the online PyPI
service gets compromised, *end users won't detect it*.
* use a separate metadata validation process to check that PyPI is
publishing the right thing, covering both the developer -> PyPI link *and*
the integrity of the PyPI service itself.

The metadata validation potentially wouldn't even need to use TUF -
developers could simply upload the expected hashes of the artifacts they
published, and the metadata validation service would check that the signed
artifacts from PyPI match those hashes. The core of the idea is simply that
there be a separate service (or services) which PyPI can't update, but
developers uploading packages *can*.
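
To make that concrete, here's a rough sketch of the developer-side half
(the "expected-hashes.json" name, the JSON layout and the choice of
SHA256 are all just illustrative - nothing in PEP 458 or PEP 480
mandates any of it). After cutting a release, the developer hashes the
artifacts they actually uploaded and publishes the result somewhere PyPI
itself can't touch - their own site, a git repo, or whatever the
eventual validation service ends up accepting:

    import hashlib
    import json
    import sys
    from pathlib import Path

    def sha256_of(path):
        """Return the hex SHA256 digest of a file, read in chunks."""
        digest = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def main(dist_dir="dist"):
        # Hash every sdist/wheel the build left in dist/
        expected = {
            artifact.name: sha256_of(artifact)
            for artifact in sorted(Path(dist_dir).iterdir())
            if artifact.suffix in {".whl", ".gz", ".zip"}
        }
        # This is the file the developer publishes out-of-band;
        # PyPI never gets the chance to modify it.
        with open("expected-hashes.json", "w") as out:
            json.dump(expected, out, indent=2, sort_keys=True)
        print("Wrote expected-hashes.json for %d artifacts" % len(expected))

    if __name__ == "__main__":
        main(*sys.argv[1:])

The important property isn't the particular format, it's that the file
lives somewhere a compromise of the online PyPI service can't silently
rewrite.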

By focusing on detection & recovery, rather than prevention, we can
drastically reduce the complexity of the problem to be solved, while still
mitigating the major risks we care about. The potential attacks that worry
me are the ones that result in silent substitution of artifacts - when it
comes to denial of service attacks, there's little reason to mess about
with inducing metadata validation failures when there are already far
simpler options available.

Redistributors may decide to take advantage of the developer metadata
validation support to do their own verification of source downloads, but I
don't believe upstream needs to worry about that too much - if developers
have the means to verify automatically that what PyPI is currently
publishing matches what they released, then the redistributor side of
things should take care of itself.

Another way of viewing the problem is that instead of thinking of the scope
of PEP 480 as PyPI delegating trust to developers, we can think of
it as developers delegating trust to PyPI. PyPI then becomes a choke point
in a network graph, rather than the root of a tree. My core idea stays the
same regardless of how you look at it, though: we *don't* try to solve the
problem of letting end users establish in a single step that what they
downloaded matches what the developer published. Instead, we aim to provide
answers to the questions:

* Did I just download what PyPI is currently publishing?
* Is PyPI currently publishing what the developer of <project> released?
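
To sketch how the second question might be answered (the first one is
exactly what the PEP 458 PyPI -> end user link covers), a monitor could
periodically compare what PyPI is serving against the developer-published
hashes from the earlier sketch. The JSON endpoint and digests used here
are just stand-ins for whatever signed metadata the validation service
would actually consume:

    import json
    import sys
    from urllib.request import urlopen

    def pypi_digests(project, version):
        """Map each filename PyPI serves for a release to its SHA256."""
        url = "https://pypi.org/pypi/%s/%s/json" % (project, version)
        with urlopen(url) as response:
            data = json.loads(response.read().decode("utf-8"))
        return dict((f["filename"], f["digests"]["sha256"])
                    for f in data["urls"])

    def check(project, version, expected_file):
        # filename -> sha256, as published by the developer out-of-band
        with open(expected_file) as f:
            expected = json.load(f)
        published = pypi_digests(project, version)
        ok = True
        for filename, dev_hash in sorted(expected.items()):
            pypi_hash = published.get(filename)
            if pypi_hash != dev_hash:
                # A mismatch (or a file PyPI isn't serving at all) is the
                # trigger for human investigation, not an automatic response.
                print("MISMATCH %s: developer=%s pypi=%s"
                      % (filename, dev_hash, pypi_hash))
                ok = False
        return ok

    if __name__ == "__main__":
        sys.exit(0 if check(*sys.argv[1:4]) else 1)

Whether that comparison is run by the developer, by a third party, or by
several independent parties is exactly the kind of flexibility that
splitting the two questions buys us.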

There's no fundamental requirement that those two questions be answered by
the *same* security system - we have the option of splitting them, and I'm
starting to think that the overall UX will be better if we do.

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
