Ryan,

Thank you for pointing out these incidents, and for raising the meta-issue
of policy compliance. We saw similar CP/CPS compliance issues with the
changes in the 2.5 and 2.6 versions of policy, with little explanation
beyond "it's hard to update our CPS" and "oops". Historically, our approach
has been to strive to communicate policy updates to CAs with the assumption
that they will happily comply with all of the requirements they are aware
of. I don't think that's a bad thing to continue, but I agree it is not
working.

Having said that, I do recognize that translating "Intermediates must
contain EKUs" into "don't renew this particular certificate" across an
organization isn't as easy as it sounds. I'd be really interested in
hearing how CAs are successfully managing the task of adapting to new
requirements, and whether there is something we can do to encourage all CAs to
adopt best practices in this regard. Our reactive options short of outright
distrust are limited, so I think it would be worthwhile to focus on new
preventive measures.

Thanks,

Wayne

On Tue, Oct 8, 2019 at 11:02 AM Ryan Sleevi via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:

> On the topic of root causes, there's also
> https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3425554 that was
> recently published. I'm not sure if that was peer reviewed, but it does
> provide an analysis of m.d.s.p and Bugzilla. I have some concerns about the
> study methodology (for example, when incident reports became normalized is
> relevant, as well as incident reporting where security researchers first
> went to the CA), but I think it looks at root causes a bit holistically.
>
> I recently shared on the CA/B Forum's mailing list another example of
> "routine" violation:
> https://cabforum.org/pipermail/servercert-wg/2019-October/001154.html
>
> My concern is that, 7 years later, while I think that compliance has
> marginally improved (largely due to efforts led from outside the CA ecosystem,
> like CT and ZLint/Certlint), I think the answers/responses/explanations we
> get are still falling into the same predictable buckets, and that concerns
> me, because it's neither sustainable nor healthy for the ecosystem.
>
>
>    - We misinterpreted the requirements. It said X, but we thought it meant
>    Y (Often: even though there's nothing in the text to support Y, that's
>    just how we used to do business, and we're CAs so we know more than
>    browsers about what browsers expect from us)
>    - We weren't paying attention to the updates. We've now assigned people
>    to follow updates.
>    - We do X by saying our staff should do X. In this case, they forgot.
>    We've retrained our staff / replaced our staff / added more staff to
>    correct this.
>    - We had a bug. We did not detect the bug because we did not have tests
>    for this. We've added tests.
>    - We weren't sure if X was wrong, but since no one complained, we
>    assumed it was OK.
>    - Our auditor said it was OK
>    - Our vendor said it was OK
>
> and so forth.
>
> And then, in the responses, we generally see:
>
>    - These certificates are used in Very Important Systems, so even though
>    we said we'd comply, we cannot comply.
>    - We don't think X is actually bad. We think X should be OK, and it
>    should be Browsers that reject X if they don't like X (implicit: But
>    they should still trust our CA, even though we aren't doing what they
>    want)
>    - Our vendor is not able to develop a fix in time, so we need more time.
>    - We agree that X is bad, and has always been prohibited, but we need
>    more time to actually implement a fix (because we did not
>    plan/budget/staff to actually handle issues of non-compliance)
>
> and so forth.
>
> It's tiring and exhausting because we're hearing the same stuff, the same
> patterns that CAs were using when they'd issue MITM certs to companies:
> "Oh, wait, you meant DON'T issue MITM certs? We didn't realize THAT'S what
> you meant" (recall, this was at least one CA's response when caught issuing
> MITM certs).
>
> I'm exasperated because we're seeing CAs do things like not audit sub-CAs,
> but leave all the risk to be accepted by browsers, because it's too
> hard/complex to migrate. We're seeing things like CAs not following policy
> requirements, but then correcting those issues is risky because now they've
> issued a bunch of certs and it's painful to have to replace them all.
>
> If we go back to that classic Dan Geer talk,
> https://cseweb.ucsd.edu/~goguen/courses/275f00/geer.html , every time a CA
> issues a certificate, they've now externalized the risk onto browsers/root
> stores for that certificate's lifetime. It's left to the ecosystem to detect
> and clean up the mess, while the CA/subscriber gets the full benefits of
> the issuance. It's a system of incentives that is completely misaligned,
> and we've seen it now for the past decade: The CA benefits from the
> (mis)issuance, and extracts value until it's detected, and then the cost of
> cleanup is placed on the browser/Root Program that expects CAs to actually
> conform. If the Browser doesn't enforce, or doesn't enforce consistently, then we
> get back to the "Race to the bottom" that plagued the CA industry, as
> "Requirements" become "Suggestions" or "Nice ideas". Yet if the Browser
> does enforce, they suffer the blame from the Subscriber, who is unhappy
> that the thing they bought no longer works.
>
> In all of this time, it doesn't seem like we're making much progress on
> systemic understanding and prevention. If that's an unfair statement, then
> it means that some CAs are progressing, and some aren't, so how do we help
> the ones that aren't? At what point do we go from education to removal of
> trust? Where is the line when the same set of responses has been used so
> much that it's no longer reasonable? When this ecosystem moves at a snail's
> pace, due to CAs' challenges in updating systems and the long lifetime of
> certificates, the feedback loop is large, and CAs can exploit that
> asymmetry until they're detected. That may sound like I'm ascribing
> intentional malice, when I'm mainly just talking about the perverse
> incentives here that are hindering meaningful improvement.
>
> While I appreciate your suggestion of more transparency, and I'm notably
> all for it, this wouldn't help with, for example, QuoVadis' response to the
> issue. To borrow from Donald Rumsfeld, the set of issues with any single CA
> are, from the browser perspective, the "unknown unknowns". Such a report
> would not tell us, for example, that QuoVadis viewed renewal and issuance
> as separate and independent from requirements. Unless we had all of their
> processes and procedures in front of us, to review the diff, we wouldn't
> spot that there was an "issuance playbook" and a "renewal playbook". Of
> course, there might not have even been a "renewal" playbook until that
> matter came up, so if they created it fresh, we also wouldn't have detected
> it.
>
> In theory, the incident reports are meant to help the ecosystem improve.
> But if we see egregiously bad incident reports, as I think we have, or
> incident reports that are equivalent to stonewalling, giving the shortest,
> least informative responses possible, and we move to sanction those CAs, we
> only discourage future incident reporting.
>
> To bring this back, now, to the original topic at hand: What should we be
> doing when requirements are phased in, with years of notice and advance
> communication, and they're still violated? What should we be doing when
> clear-cut requirements are violated?
>
> I see a few options:
> (a) Accept that what we're doing is not enough, and do something different.
> If so, what would be different, compared to everything that's been tried?
> That was the original gist of the first message.
> (b) Accept that what we're doing is enough, and the CAs that are failing
> are simply not up to the task expected of them, and removing them is the
> only way to correct this. This was the gist of the second message.
> (c) Accept that this system is inherently flawed, and the incentive
> structures misaligned such that this is a natural expectation of any
> complex system. If that's the case, perhaps we should more holistically
> look to replace the system?
>
> This is relevant to the Policy 2.7 update. With all of the effort to
> provide added clarity and improved requirements, do we have reason to
> believe that CAs will adopt and follow it? The past approach has been to
> send a CA communication and require affirmative consent. That clearly is not
> working (for some CAs). Suggestions of doing it in the Forum are sometimes
> raised, but that clearly (per the related message) is also failing. So, is
> there something different to try? I like the suggestion of listing
> everything that the CA is changing as part of their operation, although I
> don't think it will prevent these issues (back to "unknown unknowns"). I
> don't have much faith that the auditors will catch these issues, BR or
> otherwise. So... what do we have to make sure Policy 2.7 goes off smoothly?
>
_______________________________________________
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy
