when do things really need to be revoked? who decides?

Mike Shaver Mon, 20 May 2024 17:46:01 -0700

DELAYED REVOCATION IS TOO COMMON

This is long enough, so I’ll spare readers dozens of links to
delayed-revocation incidents collected in Bugzilla; we all know that pretty
much any other incident that involves misissuance will come with a
delayed-revocation chaser these days.

In *many* of those delrev (?) incidents, we see a phrase like “we requested
that our subscribers revoke and reissue”. They are not informing their
subscribers of a fixed revocation timeline, but rather simply asking
those subscribers if they would please do the revocation process when
they’re able. In one case, I heard of a revocation request from a major CA
that didn’t even have a timeline *suggested*. Of course, the subscriber
gets no value out of replacing their certs: it’s pure overhead, and if
WebPKI were operated perfectly, it would never be necessary. This is an
externality of, most often, a CA’s failure to sufficiently invest in
understanding, implementing, and verifying the processes that they use to
twirl the keys to the entire web’s security.

Indeed in a number of cases the CAs didn’t even stop issuing once they
realized that they were misissuing certs! Intentionally issuing certs that
are known to be bad, what a world.

While CAs generally claim that they would be able to handle a mass
revocation incident (such as due to leaked key material), the evidence we
have for CAs aggressively revoking as called for by the BRs and the root
programs is…scant. We’ve seen “it was a long weekend” as a reason for
delaying revocation for certs—including some used by a different part of
the CA’s company! One CA has proposed a “global fire drill” to stress-test
revocation procedures, but we’re seeing revocation timelines reaching
multiple months right now, so…a lot of stuff would end up burning in that
fire.

CAs also tell us that they advocate for, and recommend that their subscribers
implement, automation for cert management, but we never see any concrete
targets or success criteria for those efforts, so they certainly seem to me
to just be more “asking nicely”. (I’m not sure that all of the CAs claiming
to be pushing for subscriber automation actually have robust ACME or
similar support yet, in fact.)

(Some of the CAs made explicit promises years ago to not delay revocation,
some of them issued even though they knew that zlint showed an error—there
are lots of additional twists on simply “issuing bad certs and not cleaning
them up as agreed”.)

Now, in the wake of these *many* delrev incidents, over years of history,
the root programs have responded with pretty much no consequences
whatsoever as far as I can tell. There’s one case open about Entrust’s
overall behaviour, who are certainly over-achieving when it comes to ways
to get location fields wrong, but they are definitely not the only ones who
treat the BRs’ 1/5-day revocation instruction as instead meaning “when it’s
convenient for the customer”.

THE QUESTION

So: what should be done to make revocations of misissued
certificates—sometimes *intentionally* misissued certificates—as prompt as
the BRs require?

The cost equation for CAs is obviously skewed against the health of the web
PKI, if we are to believe that the BRs are important. Once a CA has
violated the BRs and misissued, it is *in their commercial interest* to not
revoke promptly: it causes embarrassment and subscriber frustration, or
even disruption to subscriber services. At the limit it might even lead a
subscriber to change CAs if the reissuance events are frequent and
disruptive enough.

On the other hand, the more bad certs there are floating around, even if
it’s “only” a matter of a case mismatch, the less interoperable the web PKI
is, and the harder it is for a relying party to make effective use of
WebPKI’s guarantees. Let’s please not end up with a “quirks mode” for TLS
certificate handling!

SOME OPTIONS

One option: decide that there really are some BR violations that “don’t
matter”, such that revocation can happen on a more relaxed, accommodating
timeline—or perhaps not at all, just letting them expire as has been seen
in some delrev incidents already. This would mean that we would still see
incident reports that in theory help other CAs learn to put the postal code
in the right field or similar, but subscribers and CAs and root programs
would have to do less work.

Another option: have affected certificates added to OneCRL after 72 hours.
It would benefit from some automation, but it’s probably feasible to make
relatively smooth. It is sometimes the case, worryingly, that it takes CAs
a fair bit of time and multiple attempts to find all the affected
certificates, so this might require some linter running off CT logs or
similar as a watchdog.
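
To make the 72-hour idea a little more concrete, here is a minimal sketch of
the check such a watchdog might run. Everything in it is an assumption for
illustration: the AffectedCert record, the overdue() helper, and the idea that
the clock starts at the incident report are all hypothetical, and nothing here
talks to OneCRL or to CT logs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Assumption: the 72-hour grace period is the one proposed above;
# it is not an existing OneCRL rule.
GRACE = timedelta(hours=72)


@dataclass
class AffectedCert:
    serial: str                            # hypothetical identifier from the incident report
    reported_at: datetime                  # when the misissuance was reported
    revoked_at: Optional[datetime] = None  # None while the cert is still unrevoked


def overdue(certs: List[AffectedCert], now: datetime) -> List[str]:
    """Return serials that are still unrevoked once the grace period has elapsed."""
    return [
        c.serial
        for c in certs
        if c.revoked_at is None and now - c.reported_at >= GRACE
    ]


if __name__ == "__main__":
    report = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    certs = [
        AffectedCert("01", report),
        AffectedCert("02", report, revoked_at=report + timedelta(hours=20)),
    ]
    print(overdue(certs, report + timedelta(hours=71)))  # []
    print(overdue(certs, report + timedelta(hours=73)))  # ['01']
```

The hard part in practice is the input list, not the comparison: as noted
above, CAs sometimes need several attempts to enumerate the affected certs.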

Another another option: forbid CAs from selling WebPKI certificates into
environments where a) revocation within a 5-day limit is operationally
infeasible, and b) disruption of the related services would cause risk to
human health and safety or similar. There are apparently many organizations
out there which are critical to national economies or whatever, but need
literal Earth months to replace a certificate. These are clearly cases
where the requirements of WebPKI are incompatible with the operational
constraints of the subscriber, so it’s not a good idea to mix them. (I’m
sure some CAs could offer help with private PKI systems, probably with
compelling margins.)

Yet another, this time somewhat more preventative: if a CA repeatedly
demonstrates that they are unable or (always the case?) unwilling to honour
their commitments to the BRs, impose validity length restrictions on certs
that they issue. At least in that case future misissued certificates would
be in the wild for less time, and it would also show nicely that CAs’ advocacy
for certificate automation was fruitful. Ignoring Entrust’s diatribe
against 90-day validity periods in that weird blog post, I don’t think that
any CA has made a credible case that their customers would not be able to
handle rotating certificates every 90 days, even if they have to carve the
new fingerprint into a mountain using a toothbrush or whatever. They’d even
know it’s coming.
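
As a rough illustration of why shorter validity helps, the sketch below
computes how long a misissued cert stays valid if it is detected but never
revoked. The days_left_in_wild() helper and the example dates are made up for
illustration; the 398-day figure is my understanding of the BR maximum for
subscriber certs at the time of writing, and 90 days is the period discussed
above.

```python
from datetime import date, timedelta


def days_left_in_wild(issued: date, validity_days: int, detected: date) -> int:
    """Days an unrevoked cert remains valid after the misissuance is detected."""
    expires = issued + timedelta(days=validity_days)
    return max((expires - detected).days, 0)


if __name__ == "__main__":
    issued = date(2024, 1, 1)    # hypothetical issuance date
    detected = date(2024, 2, 1)  # hypothetical detection date, 31 days later
    print(days_left_in_wild(issued, 398, detected))  # 367
    print(days_left_in_wild(issued, 90, detected))   # 59
```

With the same detection delay, the shorter-lived cert simply expires about
ten months sooner, whether or not anyone gets around to revoking it.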

One more: make delayed revocation incidents, specifically, more visible to
subscribers and potential subscribers, and see if business pressure does
what merely “agreeing legally to follow the BRs” (and optionally making
empty “it’ll never happen again” promises) has been unable to do in too
many cases.

THANKS FOR READING

I think the WebPKI is being poorly served by the *realities* of certificate
integrity and misissuance responses. If nothing else, it’s causing a ton of
delrev incidents for Ben to have to shepherd, without even module peers to
assist him.

Something needs to change.
