On Thu, Dec 27, 2018 at 10:00 PM Jeremy Rowley <jeremy.row...@digicert.com>
wrote:

> The risk Matt identified is too nebulous of an issue to address, tbh. How
> do you address a moral issue?  The only way I can think of to address the
> moral issue is to say “we promise to be good”. But the weight that carries
> depends on how much you trust the actor. If you trust the actor, then the
> moral issue is addressed. If you don’t trust the actor, the moral issue is not
> addressed. If you or Matt can identify a specific threat you’d like me to
> address about the moral issue, I’ll do my best to respond.
>

I think Matt identified a pretty clear moral hazard here - of customers
suggesting their CAs didn't do enough (e.g. should have tried harder to
intentionally violate the requirements by not revoking). One significant way
to mitigate that risk is to take meaningful steps to ensure that "We couldn't
revoke" is not really a viable or defensible option.


>
>    - What happens is that you ask why there is risk of outage to begin
>    with and what can be done to improve going forward? Let’s assume you do
>    revoke, and it causes an outage - is DigiCert taking steps to ensure no
>    customer of theirs is ever faced with that risk? If so, what are those
>    steps?
>
>
>
> Yeah – there are several things we can do to improve going forward:
>
>    1. Communicate better with the customers. The first mistake was
>    waiting until we had good data to communicate with the customers. This
>    delayed notification. This was unknown to me at the time, or we would have
>    sent out communication prior to the ballot passing. That instruction has
>    been passed along (no waiting on these critical issues) plus training.
>    2. No more skipping CAB Forum meetings for me. This was easily a
>    foreseeable issue because we knew people couldn’t replace in January. I
>    think it’s been brought up a half dozen times in the forum at least. I’m
>    not sure why we didn’t communicate this in Shanghai. But, the real problem
>    is I didn’t have direct knowledge of what was going on. I probably need to
>    be there in person each time so we can align the company correctly with
>    what is going on.
>

That... doesn't really inspire confidence. If the answer for how to deal
with this is to block efforts to remediate issues, then it runs all the risk
that Matt was speaking to. "We knew people couldn't replace in January" is
a problem, for sure, but fundamentally the risk is always there that
someone would need to revoke in January - or December, or November, or
whenever the sensitive holiday freeze or critical sales period or lunar
alignment or personal vacation is - so it's not really a mitigation at all.

I tried to give suggestions earlier for meaningful steps - such as making
sure all customers know that certificates may need to be revoked within as
little as 24 hours. This has been a recurring challenge for DigiCert in the
past, if I recall correctly - I believe both Blizzard and GitHub had issues where
the keys were compromised, but these organizations didn't want to revoke
the certs until they could ship new private keys in their software (...
ignoring all the issues in that one). I know you've said you've got the
contracts in place to defensibly revoke these, but how are you helping your
users understand these risks? Do you have documentation on this? Do you
recommend users use automation? I know some of this speaks to business
practice, but I think that's somewhat core to the issue - since revocation
may be required, how is the CA, the party best placed to communicate to the
customer, communicating that necessity?

As Matt spoke to it somewhat, there's understandably a competitive advantage
to being the CA that will try their hardest not to revoke. And while I
don't think this has risen to that level based on the information provided
so far, understanding how that perception is being mitigated is key. There
are other solutions, to be sure. Helping users move from publicly trusted
CAs to managed CAs, for example, can still meet the business needs of these
users without the attendant revocation risk.

Things like Heartbleed have shown that rapid revocation can be necessary.
Misissuance or misvalidation by the CA that results in revocation surely
can as well. Understandably, an answer of "Don't ever misissue" is great,
but it really pins all the hopes on one thing. Other CAs have taken
steps like ensuring automation and short-lived certs as a way of ensuring
that the upper-bound of any issue is limited (for example, to 90 days, or
six months), and that automation is the default way of getting certs.
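To make the automation point a bit more concrete, here is a minimal sketch
of the decision an automated client might make. It assumes a hypothetical
30-day renewal window and treats a CA's revocation notice as an immediate
trigger, mirroring the 24-hour case above. The names and thresholds are
illustrative only, not any CA's actual tooling.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical client-side policy values, for illustration only.
RENEWAL_WINDOW = timedelta(days=30)       # renew once this little validity remains
REVOCATION_DEADLINE = timedelta(hours=24)  # assumed worst case between notice and revocation


def replacement_deadline(not_after: datetime,
                         revocation_notice: Optional[datetime] = None) -> datetime:
    """Latest moment by which a replacement certificate must be deployed.

    Without a revocation notice this is simply the expiry time; with one,
    it is whichever comes first: expiry or notice + 24 hours.
    """
    deadline = not_after
    if revocation_notice is not None:
        deadline = min(deadline, revocation_notice + REVOCATION_DEADLINE)
    return deadline


def should_renew_now(not_after: datetime, now: datetime,
                     revocation_notice: Optional[datetime] = None) -> bool:
    """True if the client should request and deploy a new certificate now."""
    if revocation_notice is not None:
        # A pending revocation bypasses the normal schedule entirely.
        return True
    return not_after - now <= RENEWAL_WINDOW


if __name__ == "__main__":
    now = datetime(2019, 1, 2, tzinfo=timezone.utc)
    cert_expiry = now + timedelta(days=80)
    print(should_renew_now(cert_expiry, now))                          # False
    print(should_renew_now(cert_expiry, now, revocation_notice=now))   # True
    print(replacement_deadline(cert_expiry, revocation_notice=now))
```

The point of the second branch is that a deployment pipeline which can only
act on a fixed schedule (or not at all during a freeze) will never meet it.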


>
>    - And this is the framing that I think is incredibly helpful.
>    Understanding why customers can’t change, and what steps are being done to
>    ensure they can, is hugely useful. Wayne’s questions were to this point - as
>    were mine towards understanding the problem from the other side, which are
>    steps the CA is taking. As I've repeatedly highlighted from
>    https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation
>    , the goal is not punishment - but understanding how these issues are being
>    addressed.
>
>
>
> The main blocker for all of these is policy, not technology. I don’t know
> how to solve third party policy decisions, which is why I can’t seem to
> answer the questions. The process of planning a change, getting sign-off,
> rolling the change to stage, getting more sign-off, and then rolling to
> production with final testing combined with the blackout periods is making
> something that should be easy very difficult. I run an agile team at
> DigiCert so none of these are concerns when we roll a change internally.
> It’s the revocation part that is getting people up in arms. The consistent
> message I’ve gotten from customers is that changing domains and
> certificates requires the same process. It’s just as fast to roll out a
> change to both items as to change just a certificate. The built-in CAB Forum
> 30 day cert requirement isn’t solving the issue because of the way they
> roll changes, not because the 30 day certs aren’t available.
>

So, concrete suggestions then, since it sounds like you're asking for that.

1) Communication to all your customers about the industry-standard
revocation requirements
2) Clear promotion, documentation, and tools for automation
3) Clear and published policies about the critical nature of certificates
and how they should be regarded

None of these are necessarily unique to DigiCert - #2 gets close, but there
are options. When your customer comes to you and says "We have a holiday
freeze", isn't it better to be able to say "Look, beyond just signing a
subscriber agreement, we sent messages on dates X, Y, and Z about
industry-standard practices around revocation. We also provided solutions
A, B, and C, which were all declined by your team"?

I know that sounds like "shifting blame", but since you're absolutely
correct that you can't prevent your customers from engaging in risky
behaviours, the best you can do is to make sure that it's clear to them
that it is risky, it is unsupported, and it's not DigiCert being mean, but
industry standard. There's an opportunity to take this incident and make
sure no DigiCert customer ever experiences this issue again - or those of
any other CA.

That's I think one way to mitigate the moral hazard Matt speaks to. When
the next customer of the next CA comes and says "Look, you need to try to
get us an exception" - the CA knows that if they didn't do all of those
steps, they really didn't take any of (these) lessons to heart, and it's
not really defensible. And, if they did take those steps, then hasn't the
expectation - and the risk - been shifted to the customer, thus making it
easier for the CA to defensibly say "You did this to yourself"?

> Totally agree. I really don’t want to violate the BRs, and this shouldn’t
> be the norm. I also recognize we don’t want to invite this question for
> every BR change. Maybe better Mozilla guidelines about which requests are
> acceptable and which are not?
>

I can't speak for Mozilla here, but I tried to lay out some clear
expectations:
1) This is an extension of an existing incident, rather than treating it as
an exception to some long-standing or new rule
2) This is being treated as part of the remediation (revocation) plan,
rather than as an intentional violation of some other requirement
3) Going forward, "they weren't prepared for revocation" is not really an
acceptable answer in and of itself. For this particular incident, concrete
proposals for how that lack of preparation can be addressed or mitigated go
a long way toward addressing the underlying root cause and, by proxy,
demonstrate a healthy awareness and balancing of risk, along with concrete
ways to mitigate it in the future.
_______________________________________________
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy
