DELAYED REVOCATION IS TOO COMMON

This is long enough, so I’ll spare readers dozens of links to delayed-revocation incidents collected in Bugzilla; we all know that pretty much any other incident that involves misissuance will come with a delayed-revocation chaser these days.

In *many* of those delrev (?) incidents, we see a phrase like “we requested that our subscribers revoke and reissue”. They are not informing their subscribers of a fixed revocation timeline, but rather simply asking those subscribers if they would please do the revocation process when they’re able. In one case, I heard of a revocation request from a major CA that didn’t even have a timeline *suggested*. Of course, the subscriber gets no value out of replacing their certs: it’s pure overhead, and if WebPKI were operated perfectly, it would never be necessary. This is an externality of, most often, a CA’s failure to sufficiently invest in understanding, implementing, and verifying the processes that they use to twirl the keys to the entire web’s security. Indeed, in a number of cases the CAs didn’t even stop issuing once they realized that they were misissuing certs! Intentionally issuing certs that are known to be bad, what a world.

While CAs generally claim that they would be able to handle a mass revocation incident (such as due to leaked key material), the evidence we have for CAs aggressively revoking as called for by the BRs and the root programs is…scant. We’ve seen “it was a long weekend” as a reason for delaying revocation for certs, including some used by a different part of the CA’s company! One CA has proposed a “global fire drill” to stress-test revocation procedures, but we’re seeing revocation timelines reaching multiple months right now, so…a lot of stuff would end up burning in that fire.

CAs also tell us that they recommend that their subscribers implement automation for cert management, but we never see any concrete targets or success criteria for those efforts, so they certainly seem to me to just be more “asking nicely”. (I’m not sure that all of the CAs claiming to be pushing for subscriber automation actually have robust ACME or similar support yet, in fact.)

(Some of the CAs made explicit promises years ago to not delay revocation, some of them issued even though they knew that zlint showed an error; there are lots of additional twists on simply “issuing bad certs and not cleaning them up as agreed”.)

Now, in the wake of these *many* delrev incidents, over years of history, the root programs have responded with pretty much no consequences whatsoever as far as I can tell. There’s one case open about the overall behaviour of Entrust, who are certainly over-achieving when it comes to ways to get location fields wrong, but they are definitely not the only ones who treat the BRs’ 1/5-day revocation instruction as instead meaning “when it’s convenient for the customer”.

THE QUESTION

So: what should be done to make revocations of misissued certificates (sometimes *intentionally* misissued certificates) as prompt as the BRs require?

The cost equation for CAs is obviously skewed against the health of the web PKI, if we are to believe that the BRs are important. Once a CA has violated the BRs and misissued, it is *in their commercial interest* to not revoke promptly: it causes embarrassment and subscriber frustration, or even disruption to subscriber services. At the limit it might even lead a subscriber to change CAs if the reissuance events are frequent and disruptive enough.

On the other hand, the more bad certs there are floating around, even if it’s “only” a matter of a case mismatch, the less interoperable the web PKI is, and the harder it is for a relying party to make effective use of WebPKI’s guarantees. Let’s please not end up with a “quirks mode” for TLS certificate handling!

SOME OPTIONS

One option: decide that there really are some BR violations that “don’t matter”, such that revocation can happen on a more relaxed, accommodating timeline, or perhaps not at all, just letting them expire as has been seen in some delrev incidents already.
This would mean that we would still see incident reports that in theory help other CAs learn to put the postal code in the right field or similar, but subscribers and CAs and root programs would have to do less work.

Another option: have affected certificates added to OneCRL after 72 hours. It would benefit from some automation, but it’s probably feasible to make relatively smooth. It is sometimes the case, worryingly, that it takes CAs a fair bit of time and multiple attempts to find all the affected certificates, so this might require some linter running off CT logs or similar as a watchdog.

Another another option: forbid CAs from selling WebPKI certificates into environments where a) revocation within a 5-day limit is operationally infeasible, and b) disruption of the related services would cause risk to human health and safety or similar. There are apparently many organizations out there which are critical to national economies or whatever, but need literal Earth months to replace a certificate. These are clearly cases where the requirements of WebPKI are incompatible with the operational constraints of the subscriber, so it’s not a good idea to mix them. (I’m sure some CAs could offer help with private PKI systems, probably with compelling margins.)

Yet another, this time somewhat more preventative: if a CA repeatedly demonstrates that they are unable or (always the case?) unwilling to honour their commitments to the BRs, impose validity length restrictions on certs that they issue. At least in that case future misissued certificates would not be in the wild for as long, and it would also show nicely that CAs’ advocacy for certificate automation was fruitful.
Ignoring Entrust’s diatribe against 90-day validity periods in that weird blog post, I don’t think that any CA has made a credible case that their customers would not be able to handle rotating certificates every 90 days, even if they have to carve the new fingerprint into a mountain using a toothbrush or whatever. They’d even know it’s coming.

One more: make delayed revocation incidents, specifically, more visible to subscribers and potential subscribers, and see if business pressure does what merely “agreeing legally to follow the BRs” (and optionally making empty “it’ll never happen again” promises) has been unable to do in too many cases.

THANKS FOR READING

I think the WebPKI is being poorly served by the *realities* of certificate integrity and misissuance responses. If nothing else, it’s causing a ton of delrev incidents for Ben to have to shepherd, without even module peers to assist him. Something needs to change.