Based on what we observe in recent and current delrev incidents, it defies 
belief that any strategy involving categorizing certificates into those 
that warrant immunity from the revocation rules will be accurate, fair, or 
in the best interest of the WebPKI.  Over the past four months we have 
watched more than a dozen public CAs choose to delay revocation of 
unambiguously misissued certificates for long periods, in some cases 
months.  Many of these incidents involved delayed revocation 
for the majority of the affected certificates.  The CAs offer the flimsiest 
of excuses, or make no attempt at excuses at all.  They drag out the same 
tired comments about lengthy approval processes and prohibitive 
regulations.  They use holidays, weekends, vacations, and end of quarter as 
excuses.  They say these systems are critical until it turns out that would 
be against the CPS in effect, at which point conveniently these systems are 
NOT critical anymore.

Any parent quickly learns to detect when they’re being handed a line, and 
Bugzilla is being handed lines left and right.  Most of these CAs don’t 
even display the creativity to make up their own bad fabrications and 
instead simply crib bad fabrications from those who have come before.  The 
poster child here is the obligatory misrepresentation of Mozilla’s delayed 
revocation policy.  This policy emphasizes that CAs are expected to follow 
the BR revocation deadlines every time, but CA after CA conveniently omits 
that part of the policy as they hand-wave about why this is not 
blatant disregard for the rules, as if the rest of us somehow lack the 
ability to look up the original policy to read and understand what it says.

It’s a sad state for the set of companies that supposedly are the guardians 
of public identity and the security of the WebPKI.

Regretfully, I for one have come to the conclusion that we cannot rely on 
Subscribers and their CAs to fairly categorize certificates into those 
qualifying for some kind of extended revocation timeline and those that do 
not.  If we are to take reporting CAs at their word, then we know 
Subscribers have a propensity to actively lie to CAs or omit the facts if 
they see it as being to their advantage in gaining a revocation delay.  Or, 
more disturbingly, the CAs themselves are omitting, or lying, 
or coaching their Subscribers on how to omit and lie so that they can 
reliably delay revocation of certificates.

The fact is that Subscribers actually are able to replace these 
certificates on time.  They simply don’t want to.  It’s a hassle.  It takes 
time away from other projects.  It messes with their evenings and 
weekends.  Sometimes it costs extra budget.  Hell, if they can delay it 
long enough, a nontrivial number of certs will expire on their own and 
won’t require revocation at all. So, when given the opportunity to 
represent their processes and systems as incapable of agile certificate 
replacement, Subscribers sing that tune.  When given the opportunity to 
explain the results of forced on-time revocation as disastrous, they sing 
that tune also.

CAs, likewise, are motivated the wrong way. They can be sticklers and make 
their paying customers angry at them, or they can be lenient and become 
heroes in the customers’ eyes.  This can be a powerful temptation, and we 
have seen CAs succumb over and over again.

Perhaps if we could create some kind of objective, perfect, fair, 
consistent, and externally measurable criteria for certificate use cases 
and circumstances warranting a revocation delay, then those rules could be 
enacted for all CAs to follow equally.  But I don’t see any credible 
candidates for these criteria, have never heard a legitimate proposal for 
such a thing, and do not believe it is possible.

Making CA opinion the basis for judging which certificates deserve a 
deadline extension is also unworkable.  We have the above-mentioned problems 
with Subscriber and CA credibility. Additionally, there is simply no way a 
CA has the visibility and detailed operational knowledge needed to 
genuinely evaluate a Subscriber’s ability to swap out certificates in a 
given timeframe and the consequences of failure to do so.

There is, however, an organization that is intimately familiar with the 
Subscriber’s processes and abilities and the consequences of missing 
certificate replacement on time.  This organization is capable of making 
risk/reward tradeoffs for certificate agility versus other initiatives and 
can enact real-time resource and process adjustments to deal with 
unforeseen revocations.  That organization is, of course, the Subscriber 
whose certs are up for revocation.

Subscribers who learn their certificates will be disappearing at a specific 
time roughly 100 hours from now always seem to have the new certificates 
installed before the old ones die.  Always.  We know this at Sectigo 
because for the past few years we have not entertained delayed revocation 
requests by any Subscriber in any environment for any reason.  We simply 
let them know when their certificates will stop working and focus on 
helping them obtain and install replacements.  And I personally believe, 
unless it becomes codified as an exception policy in the relevant 
regulations, that Sectigo will never entertain the idea of purposefully 
delaying revocation again.

I firmly believe this to be the only viable path forward.  We need to 
abolish the deliberate delay of mandated revocations. 

Removal of deliberate delrevs serves the WebPKI in many ways.

- It maintains a clean and compliant certificate base for Relying Parties 
to trust.

- It increases motivation for CAs to strive for error-free operations.

- It encourages automation and certificate agility among Subscribers, who 
know they won’t be able to talk their way out of a revocation event.

- It is consistent, fair, simple to understand, and easily measured.

- It teaches CAs and anyone else watching the WebPKI that the rules matter 
and must be followed.

- It eliminates counterproductive motivators influencing CAs today.

- It offers a clear path to success that every CA has the technical and 
procedural ability to execute.

Of course, a reading of the Baseline Requirements and the major root store 
program guidelines will reveal that this requirement already exists today.  
However, we are missing meaningful, reliable consequences for failure to 
comply.  In each 
of these cases the CA believed the negatives of transparent disobedience to 
the BRs to be less than the negatives of completing the 
revocation.  Otherwise they would not have delayed. 

And with few exceptions they were probably right.  Most of the CAs with 
willful delrev incidents from the past four months, or the past four years, 
will not face distrust as a direct result.  And in an ecosystem with no 
other penalty for noncompliance, this means most CAs are pragmatically 
motivated to appease their paying customers – or their bosses in the larger 
organizations that own them – even at the expense of the WebPKI.  Right 
now, the penalty is that you have to write up a Bugzilla incident and 
answer uncomfortable questions from a few nosy jerks for a couple of months 
until everyone gives up in frustration and lets you close the bug.  In many 
cases, the CA judges this to be considerably less painful than angering one 
or more of the Subscribers that keep the CA operational.

We need root programs to enforce these rules with enough power to tip the 
decision-making scales.  We need CAs to dread the consequences of delayed 
revocation more than they dread Subscriber displeasure.  That has to mean 
either 1) that the likelihood of root distrust goes up dramatically among 
one or more major root programs or 2) that the WebPKI comes up with some 
kind of alternative consequence.  This consequence would have to be painful 
enough to seriously demotivate intentional delay but not so severe that 
browsers are unwilling to use it.

On Thursday, July 25, 2024 at 11:29:33 PM UTC-4 Suchan Seo wrote:

> Would we want something between a full revoke and leaving it as is? For 
> example, would stripping the EV/OV data from a malformed certificate make 
> sense: not fully revoking it, but treating it as DV (not showing the other 
> data in the certificate, and warning the user when a client downloads or 
> views the parsed certificate)?
> Is there anything that actually reads the OV/EV data? Or could OCSP 
> information amend the certificate (changed fields and a new signature over 
> the modified fields)?
>
> On Friday, July 26, 2024 at 5:53:18 AM UTC+9 Ben Wilson wrote:
>
>> Thanks, Amir,
>>
>> I appreciate your challenge to the assumption that the 5-day rule is the 
>> primary reason why we are where we are. We recognize that this is a 
>> difficult area, and many CAs have fallen short. Let's explore whether to 
>> create a meta-incident, but the more important part is that we come up with 
>> a meta-solution. As part of the effort, we should continue looking at the 
>> root causes to gain a more comprehensive understanding of the issues at 
>> play. 
>>
>> Here are my thoughts on the observations you’ve provided:
>>
>> That this problem is not affecting every CA equally suggests that there 
>> are other variables at play. We ought to look at the culture of compliance, 
>> the nature of their customer bases, and the CAs' relationships with their 
>> customers, among other variables, to understand these differences, which 
>> will then help us to identify better solutions.
>>
>> We agree that OV and EV certificates are more affected, in part because 
>> the primary causes of misissuance are mistakes in the additional fields 
>> that such certificates contain, and perhaps because of the customers who 
>> use them. 
>>
>> We agree that CAs should not perceive that they can bypass the rules 
>> without consequence. We need to ensure that there are clear and consistent 
>> consequences for non-compliance, and the rules to achieve and maintain 
>> compliance need to be clearer.
>>
>> I agree that the vast majority of the recent delayed revocation incidents 
>> would have still been delayed revocation incidents even if the period was 
>> extended to 20 days. However, I am hoping that a 20-day timeframe, along 
>> with an effort to phase out most of the excuses (by requiring quicker and 
>> more specific disclosures from CAs and their subscribers about reasons for 
>> delay), will reduce the scale of this issue. This effort will need 
>> collaboration by CAs and root stores. We should explore treating a failure 
>> to pre-disclose the required information, publicly, as a key focus of 
>> delayed revocation Bugzilla filings.
>>
>> Finally, Mozilla also believes that automation, both in issuance and in 
>> replacement and revocation, is a path forward, but we need to move more in 
>> that direction first in the short term before it can become a long-term 
>> solution.
>>
>> Also, regarding your final question, this discussion does not pause the 
>> BR requirement, but in dealing with CAs we shouldn't disregard the 
>> complexities of the issues presented.
>>
>> Thanks again,
>>
>> Ben
>>
>>
>>
>> On Thu, Jul 25, 2024 at 12:49 PM Amir Omidi wrote:
>>
>>> Hi Ben,
>>>
>>> Thank you for outlining your view on the current problems. I think we're 
>>> probably all in agreement that the current 5-day revocation period is not 
>>> working effectively. To understand what's going on, we may need to treat 
>>> this as a meta-incident. For example, what are the timelines involved here, 
>>> what was the situation like in the past, and what is the root cause of 
>>> these problems?
>>>
>>> I fear that we're proposing action items without really understanding 
>>> what has gone wrong. I do want to challenge the conclusion that the reason 
>>> this is not working is definitely because of the 5-day revocation rule. The 
>>> vast majority of the recent delayed revocation incidents would have still 
>>> been delayed revocation incidents even if the period was extended to 20 
>>> days.
>>>
>>> Here are a couple of observations I've had that may help with the 
>>> analysis here:
>>>
>>>    - This problem is not affecting every CA equally, and there does not 
>>>    seem to be a correlation between percentage of total issuance and 
>>>    delayed revocation incidents.
>>>    - This problem seems to mainly be impacting OV and EV certificates. 
>>>    CAs that primarily issue DV certs have a much easier time getting 
>>>    revocations done on time.
>>>    - Up until the recent distrust of Entrust by the Chrome Root 
>>>    Program, there has been no incentive for CAs to actually follow the 
>>>    rules. If I were a CA, I'd personally have a hard time justifying 
>>>    following the BRs when I could just tell root programs my customers 
>>>    are special.
>>>
>>>
>>> Am I also to understand that, as we're in the process of figuring out 
>>> what to do here, the 5-day revocation rule is effectively on pause from the 
>>> Mozilla Root Program perspective?
>>>
>>>
>>> On Wed, Jul 24, 2024 at 6:11 PM Ben Wilson wrote:
>>>
>>>> Mike and Amir,
>>>>
>>>> Here are some of the goals that come to my mind from the perspective of 
>>>> the Mozilla Root Program, followed by my short response concerning what to 
>>>> do with the current framework.
>>>>
>>>>    1. Security and Privacy of Users: Our foremost goal, from Principle 
>>>>    #4 of the Mozilla Manifesto 
>>>>    <https://www.mozilla.org/en-US/about/manifesto>, is to ensure the 
>>>>    security and privacy of our users. This includes promoting the 
>>>>    advancement and proper use of TLS technology to provide privacy and 
>>>>    security.
>>>>    2. Operational Stability: Another critical goal is to maintain the 
>>>>    stability of the internet, ensuring that our actions do not 
>>>>    inadvertently cause widespread disruptions.
>>>>    3. Secure CA Operations: Ensuring that Certification Authorities 
>>>>    (CAs) operate securely is paramount. Our goal is to work 
>>>>    collaboratively with them as partners in securing the internet.
>>>>    4. CA Compliance with Continuous Improvement: We strive for a 
>>>>    smooth-running CA program, focusing on proper remediation of CA 
>>>>    compliance issues, so it’s not just about closing compliance bugs in 
>>>>    Bugzilla. Improving CA transparency through better incident 
>>>>    reporting processes is key to this goal. We also aim to improve the 
>>>>    incident reporting process continually, encouraging disclosure and 
>>>>    remediation in a way that benefits the entire community.
>>>>
>>>> Currently, the 5-day revocation period is not working effectively, as 
>>>> evidenced by ongoing issues documented in Bugzilla. As I said before, I’d 
>>>> like to reach a consensus determination on what is best for the ecosystem. 
>>>> While I understand the argument for stricter revocation timelines, I 
>>>> believe there are broader considerations based on how this valuable TLS 
>>>> technology is currently being used to support healthcare, airlines, 
>>>> banking, etc. 
>>>>
>>>> Contemporaneously with this discussion here, I plan to turn my 
>>>> attention to GitHub Issue #276 
>>>> <https://github.com/mozilla/pkipolicy/issues/276> and start addressing 
>>>> the issue with better guidance in the wiki 
>>>> <https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation> 
>>>> about reporting expectations and with new language (TBD) to be added to 
>>>> the Mozilla Root Store Policy. I also plan to be more proactive in 
>>>> commenting on CA compliance reports.
>>>>
>>>> In summary, Mozilla's goals align closely with those of other root 
>>>> programs--maintaining control over CAs and minimizing their non-compliance 
>>>> while ensuring secure and effective CA operations.
>>>>
>>>> Thanks, and keep the conversation going so that we can come to some 
>>>> consensus.
>>>>
>>>> Ben
>>>>
>>>>
>>>> On Wed, Jul 24, 2024 at 3:10 PM Mike Shaver wrote:
>>>>
>>>>> On Wed, Jul 24, 2024 at 5:06 PM Amir Omidi wrote:
>>>>>
>>>>>> What are the issues you see from the perspective of a root program 
>>>>>> with the current framework?
>>>>>>
>>>>>
>>>>> Yes, it would be good to understand what the goals of the framework 
>>>>> are, how the current rules work against those goals, and how different 
>>>>> approaches (another deadline extension, a “bad cert, pls ignore” 
>>>>> attribute, random audit through revocation, etc.) would better reach 
>>>>> them.
>>>>>
>>>>> Without that it is hard to really figure out what might be helpful, 
>>>>> since we may well have different goals in mind!
>>>>>
>>>>> Mike
>>>>>
>>>>>
