Re: when do things really need to be revoked? who decides?

'Amir Omidi (aaomidi)' via dev-security-policy@mozilla.org Thu, 30 May 2024 09:14:58 -0700

In my experience (and through what I've heard from others), at least in 
large enterprises, the work for automating cert issuance and replacement is 
simply *not important*.


I've asked a few folks who would be in a position to do that automation work 
and in nearly all cases they tell me they know what they need to do, they 
know the task is not necessarily a fast one to complete, but it will 
forever be de-prioritized because it *just doesn't matter*. If something 
happens, they can ask their CA of choice to delay revocation - they seem 
to believe that certain CAs would be fine delaying revocation even in the 
case of key compromise.

This really brings it back down to incentives and priorities. Changing how 
you do certificates in an enterprise is definitely not easy. This is 
exacerbated by the fact that DNS-01 does not allow DNS validation to be 
delegated to more than one entity (a name can hold only one CNAME record, 
so you can't have _acme-challenge.example.com pointing at two different 
DNS names, something I'm trying to solve through 
https://datatracker.ietf.org/doc/draft-ietf-acme-scoped-dns-challenges/). 
To be clear, I don't think that the barrier for adopting automation is 
genuinely on the technology level, but rather the lack of enforcement by 
the RPs and the lack of "care" for the rules by the CAs.
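
To illustrate the CNAME constraint mentioned above, here is a minimal sketch 
in Python. It models a toy zone that enforces the one-CNAME-per-name rule and 
shows why a second delegation at _acme-challenge.example.com cannot be added, 
while per-account names can coexist. Everything in it is an assumption made 
for illustration: the Zone class, the provider and account names, and in 
particular scoped_label(), which is a hypothetical stand-in and not the 
label construction defined in the draft.

```python
import hashlib


class Zone:
    """Toy in-memory zone that enforces one CNAME per owner name."""

    def __init__(self):
        self._cnames = {}

    def add_cname(self, owner, target):
        owner = owner.lower().rstrip(".")
        if owner in self._cnames:
            # A name may hold at most one CNAME record.
            raise ValueError(f"{owner} already delegates to {self._cnames[owner]}")
        self._cnames[owner] = target.lower().rstrip(".")

    def cname(self, owner):
        return self._cnames.get(owner.lower().rstrip("."))


def scoped_label(account_id):
    # Hypothetical per-account label; NOT the draft's exact construction.
    digest = hashlib.sha256(account_id.encode("utf-8")).hexdigest()[:16]
    return f"_{digest}._acme-challenge"


zone = Zone()

# Classic DNS-01: a single well-known name, so only one delegation fits.
zone.add_cname("_acme-challenge.example.com", "validation.provider-a.example")
try:
    zone.add_cname("_acme-challenge.example.com", "validation.provider-b.example")
    second_delegation_ok = True
except ValueError:
    second_delegation_ok = False

# Scoped names: each account gets its own owner name, so both can coexist.
delegations = {
    "https://ca-a.example/acme/acct/1": "validation.provider-a.example",
    "https://ca-b.example/acme/acct/2": "validation.provider-b.example",
}
scoped_names = []
for account_id, target in delegations.items():
    name = f"{scoped_label(account_id)}.example.com"
    zone.add_cname(name, target)
    scoped_names.append(name)

print(second_delegation_ok)
print(len(scoped_names))
```

The second add_cname() call on the shared name is rejected, which is the 
situation an enterprise hits when two teams or two CAs each want their own 
delegation. Deriving the owner name from something account-specific removes 
the collision without changing how CNAMEs work.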

Ryan Sleevi also pointed this out a while back, that if one CA is allowed 
to delay revocations, then that will effectively cause a regression to the 
mean: 
https://lists.cabforum.org/pipermail/servercert-wg/2018-September/000170.html

Quote:
*However, let's further expand this thought experiment, and think about 
what the consequences are of having the CA make such a determination 
(Availability > Competent and Correct operation). A CA is naturally 
incentivized to optimize for Availability - the CA who never revokes their 
Subscriber certs is the CA to pick, the same as the CA that always picks 
the maximum validity period rather than the minimum. This is something 
we've seen time and time again - the worse a CA is, for the ecosystem, the 
more popular it becomes relative to its competitors. But equally, a CA that 
issues such invalid certificates, but encourages non-revocation, equally 
encourages the rest of the ecosystem to do so, such that it becomes the 
norm - a de facto standard of incompatibility.*

*If the client does not aggressively police this - at verification time - 
then the ecosystem falters, because the existing players accept such certs, 
and thus any new Relying Parties must also accept such certs.*

I really do think that a lot of these "difficulties" are baseless excuses. 
They stem from avoiding what seems like unnecessary work (we can 
just ask the CA to delay revocation).

The state that WebPKI is in right now is not a healthy one. We have a list 
of rules that we're treating mostly as "guidelines". This means that when 
a CA feels like it can ignore one part of the BRs, and the RPs simply don't 
care when a CA does that, then CAs are going to feel a lot more adventurous 
about ignoring other parts of the BRs. If left alone, we're just going to see 
more and more CAs that don't even know what the BRs are.

So if we're talking about the options laid out in this post, I'd like to 
add one more option: If the RPs are finding the existing rules impossible 
to enforce, then trash the existing rules and write new ones that are 
watered down enough that the RPs can actually enforce them. 
Don't treat these rules as guidelines.

On Thursday, May 30, 2024 at 12:07:57 PM UTC-4 Mike Shaver wrote:

> OK, then I guess that’s how that industry and subscriber want things to 
> go, given that they prefer this to the alternatives of hot-spare 
> certificates or non-public PKI or such.
>
> It may be that the subscribers in these industries need to lobby for a 
> better understanding of WebPKI operations and principles on the part of 
> their regulatory body, but that’s outside the scope and concerns of the BRs 
> and root programs.
>
> Thank you for your help understanding these issues!
>
> Mike
>
> On Thu, May 30, 2024 at 11:40 AM Jeremy Rowley <jeremy...@digicert.com> 
> wrote:
>
>> Yes – sites have gone down while waiting for approval. We have caused 
>> outages by revoking if we didn’t think the revocation would meet the bar 
>> required for a delayed revocation report or if there is a key compromise 
>> (requiring 24 hour revocation).
>>
>>  
>>
>> *From:* dev-secur...@mozilla.org <dev-secur...@mozilla.org> *On Behalf 
>> Of *Mike Shaver
>> *Sent:* Thursday, May 30, 2024 9:36 AM
>> *To:* Jeremy Rowley <jeremy...@digicert.com>
>> *Cc:* Wayne <rdau...@gmail.com>; dev-secur...@mozilla.org
>> *Subject:* Re: when do things really need to be revoked? who decides?
>>
>>  
>>
>> “The certificate is being replaced because it will be revoked by the CA, 
>> as we agreed to legally when we were issued the certificate” seems like it 
>> would need to be sufficient, unless the regulatory body also has some way 
>> to compel the CA to forestall revocation itself, in violation of its root 
>> program agreements. Historically, have the regulatory bodies preferred that 
>> these services go offline while waiting for approval? (Or do they ask “why 
>> don’t you have a fallback option here?”, perhaps?)
>>
>>  
>>
>> But this would require the CA to actually set and credibly enforce a 
>> revocation timeline within the parameters of the BRs, and it seems that 
>> some CAs are, worryingly, unwilling to do so.
>>
>>  
>>
>> Mike
>>
>>  
>>
>> On Thu, May 30, 2024 at 11:28 AM Jeremy Rowley <jeremy...@digicert.com> 
>> wrote:
>>
>> Here’s one example provided during our revocation: Hong Kong Monetary 
>> Authority - Technology Risk Management (hkma.gov.hk) 
>> <https://www.hkma.gov.hk/eng/regulatory-resources/regulatory-guides/by-subject-current/technology-risk-management/>
>>
>>  
>>
>> I haven’t read it all, but according to several of the Chinese banks, 
>> there’s monetary damages if they replace a certificate without government 
>> permission. We had the same issue in South America, but those subscribers 
>> haven’t sent over the corresponding regulations yet. 
>>
>>  
>>
>> I don’t think it’s a bright line rule. What I’ve consistently heard is 
>> they need government approval, which is easily obtained in security 
>> incidents but hard to obtain when the regulator does not understand why the 
>> certificate is being replaced. One question I always ask is “What would you 
>> do in key compromise?”. The answer I get back is that key compromise would 
>> be better because the regulators understand that. They don’t understand 
>> why a capitalization issue is requiring a cert rotation. The worse the 
>> issue, the faster and easier to get permission. 
>>
>>  
>>
>> *From:* Mike Shaver <mike....@gmail.com> 
>> *Sent:* Thursday, May 30, 2024 9:23 AM
>> *To:* Jeremy Rowley <jeremy...@digicert.com>
>> *Cc:* Wayne <rdau...@gmail.com>; dev-secur...@mozilla.org
>> *Subject:* Re: when do things really need to be revoked? who decides?
>>
>>  
>>
>> Have we actually seen any evidence that this is the barrier, and not just 
>> the Subscribers’ historic processes? It seems like the sort of thing that 
>> one would expect to be outlined in the per-subscriber detail expected by 
>> the BRs, but those tend to be extremely sparse on actually *why* the 
>> Subscriber cannot rotate more promptly. In my casual survey of delayed 
>> revocation incidents, virtually every case is “the Subscriber’s internal 
>> processes and tooling don’t permit it”, which the CA assures us they will 
>> advocate to remedy, and not “there is a regulatory barrier for this 
>> Subscriber which will always exist”. If there were a piece of 
>> bright-line regulation that prevented rotation within 120 hours, I would 
>> have expected it to have been brought up by now in one of the many delayed 
>> revocation incidents.
>>
>>  
>>
>> Could you point the community to an example of such a regulation that 
>> would prevent a certificate from being validated and deployed within the 
>> timelines, given sufficiently motivated Subscribers? They might have to pay 
>> overtime or rush fees, of course, but choosing not to do so is not the same 
>> as being unable.
>>
>>  
>>
>> But then, I wonder, what are these companies expected to do if there is a 
>> key compromise? (If they had a validated backup certificate from a 
>> different vendor—like they probably do for all their other essential 
>> operating elements—then they could just drop it in without having to run a 
>> validation process in the moment. This seems like such an obviously 
>> necessary SPOF to address that I’m amazed that it’s not the norm in 
>> regulated industries like the ones you describe.)
>>
>>  
>>
>> It also seems unusual to me that so many of these regulated-industry 
>> Subscribers would have made a legally binding commitment to 9.6.3(8) of the 
>> BRs, which requires them to “accept” immediate revocation, if they had 
>> these opposing regulatory constraints in place.
>>
>>  
>>
>> Mike
>>
>>  
>>
>> On Thu, May 30, 2024 at 10:52 AM 'Jeremy Rowley' via 
>> dev-secur...@mozilla.org <dev-secur...@mozilla.org> wrote:
>>
>> From my perspective, it’s the third-party approval process some of these 
>> companies are required to go through to replace certs. Failure to go 
>> through that process can result in government fines. Financial and medical 
>> companies operating outside of the US seem especially handicapped by policy 
>> restrictions when replacing certificates.
>> ------------------------------
>>
>> *From:* dev-secur...@mozilla.org <dev-secur...@mozilla.org> on behalf of 
>> Wayne <rdau...@gmail.com>
>> *Sent:* Thursday, May 30, 2024 8:34:16 AM
>> *To:* dev-secur...@mozilla.org <dev-secur...@mozilla.org>
>> *Subject:* Re: when do things really need to be revoked? who decides? 
>>
>>  
>>
>> In the delayed revocation incidents recently, the main barrier for 
>> replacing a certificate has been deployment. I've not heard of validation 
>> being an issue as-of-yet, but it may just not have been mentioned.
>>
>> On Thursday, May 30, 2024 at 6:49:04 AM UTC+1 Suchan Seo wrote:
>>
>> I wonder what makes certificate replacement slow and unwanted: is it 
>> the validation step, or deploying the new certificate everywhere the old 
>> certificate was?
>>
>> OV/EV-related validations are valid for 398 days per 3.2.2.14.3, so most 
>> of the revalidation should be about validating domains.
>>
>>  
>>
>> To simplify the latter part, there could be an OCSP extension that points 
>> to another certificate (one issued for the same SKID/public key) and says 
>> that, while this certificate itself is revoked, there is a replacement that 
>> is likely to be valid. In effect this skips the certificate deployment 
>> process and reduces replacement to a single email to the webmaster to 
>> authorize the replacement certificate.
>>
>> On Tuesday, May 21, 2024 at 9:46:00 AM UTC+9 Mike Shaver wrote:
>>
>> DELAYED REVOCATION IS TOO COMMON
>>
>>  
>>
>> This is long enough, so I’ll spare readers dozens of links to 
>> delayed-revocation incidents collected in Bugzilla; we all know that pretty 
>> much any other incident that involves misissuance will come with a 
>> delayed-revocation chaser these days. 
>>
>>  
>>
>> In *many* of those delrev (?) incidents, we see a phrase like “we 
>> requested that our subscribers revoke and reissue”. They are not informing 
>> their subscribers as to a fixed revocation timeline, but rather simply 
>> asking those subscribers if they would please do the revocation process 
>> when they’re able. In one case, I heard of a revocation request from a 
>> major CA that didn’t even have a timeline *suggested*. Of course, the 
>> subscriber gets no value out of replacing their certs: it’s pure overhead, 
>> and if WebPKI were operated perfectly, it would never be necessary. This is 
>> an externality of, most often, a CA’s failure to sufficiently invest in 
>> understanding, implementing, and verifying the processes that they use to 
>> twirl the keys to the entire web’s security.
>>
>>  
>>
>> Indeed in a number of cases the CAs didn’t even stop issuing once they 
>> realized that they were misissuing certs! Intentionally issuing certs that 
>> are known to be bad, what a world.
>>
>>  
>>
>> While CAs generally claim that they would be able to handle a mass 
>> revocation incident (such as due to leaked key material), the evidence we 
>> have for CAs aggressively revoking as called for by the BRs and the root 
>> programs is…scant. We’ve seen “it was a long weekend” as a reason for 
>> delaying revocation for certs—including some used by a different part of 
>> the CA’s company! One CA has proposed a “global fire drill” to stress-test 
>> revocation procedures, but we’re seeing revocation timelines reaching 
>> multiple months right now, so…a lot of stuff would end up burning in that 
>> fire.
>>
>>  
>>
>> CAs also tell us that they advocate and recommend for their subscribers 
>> to implement automation for cert management, but we never see any concrete 
>> targets or success criteria for those efforts, so they certainly seem to me 
>> to just be more “asking nicely”. (I’m not sure that all of the CAs claiming 
>> to be pushing for subscriber automation actually have robust ACME or 
>> similar support yet, in fact.)
>>
>>  
>>
>> (Some of the CAs made explicit promises years ago to not delay 
>> revocation, some of them issued even though they knew that zlint showed an 
>> error—there are lots of additional twists on simply “issuing bad certs and 
>> not cleaning them up as agreed”.)
>>
>>  
>>
>> Now, in the wake of these *many* delrev incidents, over years of history, 
>> the root programs have responded with pretty much no consequences 
>> whatsoever as far as I can tell. There’s one case open about Entrust’s 
>> overall behaviour, who are certainly over-achieving when it comes to ways 
>> to get location fields wrong, but they are definitely not the only ones who 
>> treat the BRs’ 1/5-day revocation instruction as instead meaning “when it’s 
>> convenient for the customer”.
>>
>>  
>>
>> THE QUESTION
>>
>>  
>>
>> So: what should be done to make revocations of misissued 
>> certificates—sometimes *intentionally* misissued certificates—as prompt as 
>> the BRs require?
>>
>>  
>>
>> The cost equation for CAs is obviously skewed against the health of the web 
>> PKI, if we are to believe that the BRs are important. Once a CA has 
>> violated the BRs and misissued, it is *in their commercial interest* to not 
>> revoke promptly: it causes embarrassment and subscriber frustration, or 
>> even disruption to subscriber services. At the limit it might even lead a 
>> subscriber to change CAs if the reissuance events are frequent and 
>> disruptive enough.
>>
>>  
>>
>> On the other hand, the more bad certs there are floating around, even if 
>> it’s “only” a matter of a case mismatch, the less interoperable the web PKI 
>> is, and the harder it is for a relying party to make effective use of 
>> WebPKI’s guarantees. Let’s please not end up with a “quirks mode” for TLS 
>> certificate handling!
>>
>>  
>>
>> SOME OPTIONS
>>
>>  
>>
>> One option: decide that there really are some BR violations that “don’t 
>> matter”, such that revocation can happen on a more relaxed, accommodating 
>> timeline—or perhaps not at all, just letting them expire as has been seen 
>> in some delrev incidents already. This would mean that we would still see 
>> incident reports that in theory help other CAs learn to put the postal code 
>> in the right field or similar, but subscribers and CAs and root programs 
>> would have to do less work.
>>
>>  
>>
>> Another option: have affected certificates added to OneCRL after 72 
>> hours. It would benefit from some automation, but it’s probably feasible to 
>> make relatively smooth. It is sometimes the case, worryingly, that it takes 
>> CAs a fair bit of time and multiple attempts to find all the affected 
>> certificates, so this might require some linter running off CT logs or 
>> similar as a watchdog.
>>
>>  
>>
>> Another another option: forbid CAs from selling WebPKI certificates into 
>> environments where a) revocation within a 5-day limit is operationally 
>> infeasible, and b) disruption of the related services would cause risk to 
>> human health and safety or similar. There are apparently many organizations 
>> out there which are critical to national economies or whatever, but need 
>> literal Earth months to replace a certificate. These are clearly cases 
>> where the requirements of WebPKI are incompatible with the operational 
>> constraints of the subscriber, so it’s not a good idea to mix them. (I’m 
>> sure some CAs could offer help with private PKI systems, probably with 
>> compelling margins.)
>>
>>  
>>
>> Yet another, this time somewhat more preventative: if a CA repeatedly 
>> demonstrates that they are unable or (always the case?) unwilling to honour 
>> their commitments to the BRs, impose validity length restrictions on certs 
>> that they issue. At least in that case future misissued certificates would 
>> not be in the wild for as long, and it would also show nicely that CAs’ advocacy 
>> for certificate automation was fruitful. Ignoring Entrust’s diatribe 
>> against 90-day validity periods in that weird blog post, I don’t think that 
>> any CA has made a credible case that their customers would not be able to 
>> handle rotating certificates every 90 days, even if they have to carve the 
>> new fingerprint into a mountain using a toothbrush or whatever. They’d even 
>> know it’s coming.
>>
>>  
>>
>> One more: make delayed revocation incidents, specifically, more visible 
>> to subscribers and potential subscribers, and see if business pressure does 
>> what merely “agreeing legally to follow the BRs” (and optionally making 
>> empty “it’ll never happen again” promises) has been unable to do in too 
>> many cases.
>>
>>  
>>
>> THANKS FOR READING
>>
>>  
>>
>> I think the WebPKI is being poorly served by the *realities* of 
>> certificate integrity and misissuance responses. If nothing else, it’s 
>> causing a ton of delrev incidents for Ben to have to shepherd, without even 
>> module peers to assist him.
>>
>>  
>>
>> Something needs to change.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "dev-secur...@mozilla.org" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to dev-security-po...@mozilla.org.
>> To view this discussion on the web visit 
>> https://groups.google.com/a/mozilla.org/d/msgid/dev-security-policy/79c8a805-c043-45d4-8a06-8946425a3cb5n%40mozilla.org
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"dev-security-policy@mozilla.org" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dev-security-policy+unsubscr...@mozilla.org.
To view this discussion on the web visit 
https://groups.google.com/a/mozilla.org/d/msgid/dev-security-policy/33580472-76c4-4a86-be7f-0c46a4a3f23dn%40mozilla.org.
