Apologies, Ryan, for the delayed reply.

It’s valuable to note first that a CPS frequently states what the 
Certificate Authority will do under appropriate circumstances, rather than 
exclusively stating what it must do under all circumstances.  This is 
easily demonstrable by the fact that multiple acceptable methods exist for 
certain required tasks, such as DCV.  A CA may list multiple possible DCV 
methods in its CPS and make them all available for domain validation 
without taking on the obligation that all listed methods are employed in 
the case of every single domain.

This distinction is important.  Several of the examples you give as reasons 
for drift do not, we believe, present any kind of CPS alignment problem.  
The CA should make updates to its CPS in advance of rolling out a new 
practice, for instance, so that when the practice is rolled out it will be 
in alignment with the published CPS.  So long as the CA is stating that it 
MAY, rather than WILL ALWAYS, follow the new practice, this is perfectly 
fine.  (And so long, naturally, as the practice is otherwise compliant with 
current requirements.)

Likewise, if the CA determines to add incremental controls beyond what the 
CPS states (and presumably beyond the minima necessary for root program and 
BR compliance), that will not represent a problem.  We at Sectigo, for 
example, have a policy whereby all new OV/EV SSL certificates contain one 
discrete country name/state name combination that comes from our list of 
6000+ possible combinations.  Issuance of a certificate that contained 
something other than one of those combinations would be out of alignment 
with our policy, and we would deem such an occurrence to be a technical 
failure requiring investigation and resolution, as it should not have 
happened.  It would not necessarily, however, represent CPS misalignment, 
as our CPS does not list those possible combinations.

The one example in the list that jumps out as important and in need of 
addressing is this idea of rapid responses to previously unknown 
circumstances.  Walk with us through a brief thought experiment:

Let’s say that a CA comes to the realization that it failed to make a 
mandated update to a specific issuance practice affecting a subset of its 
certificates, call them Group A Certificates.  The CA might make an 
all-hands-on-deck effort to write, test, and deploy a code fix to update 
its practice for Group A.  However, if the updated (and otherwise 
compliant) procedural details are not in alignment with the previously 
published CPS, then Group A certificates will continue to count as 
misissued certs, even though all procedures follow what is specified in 
CABF rules.  The problem will persist until the new CPS is published.

This is despite the fact that the severity of the errors is not the same.  
Many CPS errors are essentially clerical errors.  We have seen this 
recently with various off-by-one-second errors CAs are experiencing as 
compared to their practice statements.  We appreciate the value of a CPS as 
a source of truth for details of how the CA operates.  We appreciate the 
importance that such details be accurate.  We also contend that there is a 
meaningful difference between issuing a 90-day leaf certificate that is 
over the limit stated in the CPS by one second and issuing an SSL 
certificate that exceeds 398 days by one second.  While 90 days plus one 
second can create CPS misalignment, there is no sensible argument that it 
actually brings security or trust risk beyond the general expectations for 
public SSL certificates.
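
To make the arithmetic concrete, here is a minimal sketch of the two cases. It assumes the validity period is counted inclusively from notBefore through notAfter (which is one way an extra second can creep in), and it uses a hypothetical 90-day CPS limit alongside the 398-day limit mentioned above. The function and constant names are ours, for illustration only, and do not describe any CA's actual code.

```python
from datetime import datetime, timedelta, timezone

# 398-day limit for public SSL certificates, as cited in the text above.
BR_MAX = timedelta(days=398)
# Hypothetical stricter limit that a CA might state in its own CPS.
CPS_MAX = timedelta(days=90)


def validity_period(not_before, not_after):
    """Validity period counted inclusively (assumption: both the
    notBefore and notAfter instants are part of the period)."""
    return (not_after - not_before) + timedelta(seconds=1)


def classify(not_before, not_after, cps_max=CPS_MAX, br_max=BR_MAX):
    """Separate a CPS-only overrun from an overrun of the 398-day limit."""
    period = validity_period(not_before, not_after)
    if period > br_max:
        return "exceeds BR limit"
    if period > cps_max:
        return "exceeds CPS limit only"
    return "within both limits"


start = datetime(2021, 9, 1, tzinfo=timezone.utc)

# notAfter set to exactly notBefore + 90 days: one second over the CPS limit.
print(classify(start, start + timedelta(days=90)))
# notAfter set one second earlier: exactly 90 days, inclusive.
print(classify(start, start + timedelta(days=90) - timedelta(seconds=1)))
# notAfter set to exactly notBefore + 398 days: one second over the BR limit.
print(classify(start, start + timedelta(days=398)))
```

Under these assumptions the first case is a misalignment with the CA's own document only, while the third case crosses the limit that applies to every public CA.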

One possible response is simply to say that a CA must be able to update its 
CPS in an agile fashion and that surely it is quicker work to change a 
document and publish it than it is to make the other changes necessary to 
account for a compliance issue.  As a policy, that has the advantage of 
being clean and unambiguous.  Its disadvantage is subtler but worth 
uncovering.

Every requirement or regulation becomes another thing that a CA must 
implement and monitor.  Each is like a tax that requires just a little bit 
of the CA’s work and attention.  No requirement on its own seems like that 
big of a deal, but the more requirements we place, the more risk there is 
of performance failure.  Death by a thousand cuts.  The straw that broke 
the camel’s back.

Indulge us while we use an analogy from popular culture.  At one point in 
the movie Pulp Fiction, boxer Butch Coolidge is hiding out from LA mobsters 
in a motel room.  He goes to get his heirloom gold watch from the suitcase 
full of possessions he has asked his girlfriend Fabienne to pack and bring.  
It is not 
there because Fabienne missed it among the long list of items she was to 
get.  Before going back to recover the watch, Butch expresses that it was a 
mistake to make a long list when doing so put the one possession he really 
cared about, the watch, in jeopardy.

The same possibility exists with CA rules.  It’s easy to add another rule 
and simply say, “Well, if you don’t have the chops to get this done, maybe 
you’re not cut out for the life of a public CA.”  And once again, that is 
one possible position.  The weakness of that position is that it ignores the 
relative importance of our various rules.  If CAs become swamped with rules 
that do not meaningfully improve security or trustworthiness of 
certificates, does that task list increase the risk of failure among the 
set of rules we truly care about?  Are we being the best possible stewards 
of public online trust if we choose matters of trivial importance over the 
vital ones?

At a macro level we should be aware of this tradeoff whenever considering 
the rules we give ourselves as a community.  It is the need for 
prioritization.

So bringing these deep thoughts back to the specifics of this thread, the 
question to ask is, what is the value of demanding full CPS/practices 
synchronization as opposed to allowing reasonable, short gaps to facilitate 
rapid response?  It may be that allowing such gaps leads to more effective 
adjustments as the CA can remove the distraction of CPS review and 
publication until the immediate need is handled.  In principle that could 
be worth it.

Of course, any ballot to make this change would have to codify exactly what 
time gaps were allowable under what circumstances.  It’s easy to imagine 
that any permitted gaps would be short, a matter of a few days perhaps, and 
allowed only for good reason.  What those reasons are and for how long we 
tolerate 
CPS/practice disconnect would require some work to figure out.


Tim Callan

Sectigo

On Friday, September 17, 2021 at 11:12:06 AM UTC-4 Ryan Hurst wrote:

> Hi MDSP community,
>
> There have been a number of past issues where actual practices followed by 
> CAs deviated from published practices. We recently had a delay in 
> publishing our CPS which resulted in our CPS not removing language that 
> accommodated a practice we were no longer using as a result of code changes 
> we had deployed to prevent the practice. 
>
> After reviewing past incidents, current requirements, and considering 
> improvements we could make, it led us to question whether there is an 
> opportunity to improve the timeliness and accountability for CPS 
> publications across the ecosystem to accurately reflect actual practices.
>
> There are a number of common reasons that a CPS might drift, for example:
>
>    - Canary deployments (https://martinfowler.com/bliki/CanaryRelease.html), 
>      where a change is rolled out to a small subset of users as an initial test 
>      before making it available to everybody.
>    - Rapid deployment of a change containing a needed security fix or 
>      security enhancement.
>    - Enhancing a process to be more restrictive than the CPS.
>    - CPS updates may be necessary before a new type of certificate is 
>      issued to get the trust store approval, for example to get permission to 
>      issue code signing certificates.
>    - Delays in obtaining reviews from external stakeholders such as the 
>      policy authority, executives, or legal team.
>
> There are also a number of anti-patterns that might contribute to 
> unnecessary delays in updates, for example:
>
>    - Update processes where changes to the CPS are only made following a 
>      fixed schedule.
>    - Bundling of many updates together into a large update, slowing review 
>      and publication.
>    - Changes in practices not being reflected in the CPS due to process 
>      gaps.
>    - Challenges prioritizing CPS reviews and edits against other 
>      high-priority items such as those in an action plan relating to an 
>      incident.
>
> As we look at these cases it seems there may be cases where drifts in code 
> behavior and the CPS, while not ideal, may be unavoidable, which is made 
> more complicated for the community since such drifts are difficult for the 
> community to detect. 
>
> This leads us to think that there may be value in having a clear lower 
> bound in which drift is acceptable added to the BRs where auditors would be 
> expected to assess if that requirement was being met as part of the audits.
>
> Additionally we thought it would be good to have a conversation to gather 
> the community’s thoughts around under what circumstances this does happen 
> in your environments and what you do to manage that drift. Any thoughts 
> would be greatly appreciated.
> Ryan Hurst
> Google Trust Services
>
