RE: Possible Issue with Domain Validation Method 9 in a shared hosting environment

Doug Beattie via dev-security-policy Mon, 15 Jan 2018 12:38:10 -0800

Ryan,

I’m not sure where we go from here.  We have customers that need certificates 
and they have demonstrated they can comply with not permitting the creation and 
use of certificates for domains other than those that the hosting company is 
hosting for that customer.  All certificates will continue to be posted to CT 
logs.


As for the wildcard question, when someone asks for a wildcard cert for a 
domain like *.us.example.com, we validate on that name minus the leading "*." 
(so, us.example.com in this case).
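
As a rough illustration of the wildcard handling described above, the rule could be sketched as follows. This is only a sketch: the function name is hypothetical and it is not GlobalSign's actual implementation.

```python
def validation_domain(requested_name: str) -> str:
    """Return the name to validate for a requested certificate name.

    Hypothetical helper: a leading wildcard label ("*.") is dropped,
    so "*.us.example.com" is validated as "us.example.com".
    """
    # Normalize case and drop surrounding whitespace and any trailing dot.
    name = requested_name.strip().lower().rstrip(".")
    # Strip only a single leading wildcard label.
    if name.startswith("*."):
        name = name[2:]
    return name
```

Names without a wildcard label pass through unchanged apart from normalization.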

We’d like to move forward with issuing certificates with controls in place.  If 
there are any other controls you need us to implement to resume issuance, let 
us know.  For example, if we limit validity to 1 year (possibly up to 15 
months) and set a firm end date of July 1, 2018 for OneClick, would that 
suffice?

Doug


From: Ryan Sleevi [mailto:r...@sleevi.com]
Sent: Monday, January 15, 2018 2:31 PM
To: Doug Beattie <doug.beat...@globalsign.com>
Cc: r...@sleevi.com; Wayne Thayer <wtha...@mozilla.com>; Gervase Markham 
<g...@mozilla.org>; mozilla-dev-security-pol...@lists.mozilla.org
Subject: Re: Possible Issue with Domain Validation Method 9 in a shared hosting 
environment



On Mon, Jan 15, 2018 at 1:18 PM, Doug Beattie 
<doug.beat...@globalsign.com> wrote:


From: Ryan Sleevi [mailto:r...@sleevi.com]
Sent: Friday, January 12, 2018 5:53 PM
To: Doug Beattie <doug.beat...@globalsign.com>
Cc: Wayne Thayer <wtha...@mozilla.com>; Gervase Markham <g...@mozilla.org>; 
r...@sleevi.com; mozilla-dev-security-pol...@lists.mozilla.org
Subject: Re: Possible Issue with Domain Validation Method 9 in a shared hosting 
environment

(Wearing a Google Hat)

Doug,

Thanks for sharing additional details. On the basis of what you've shared so 
far, we do not believe this results in an appropriate level of security for the 
ecosystem, and request that you do not re-enable issuance at this time. This 
applies for any CA using methods similar to what you're using.

Broadly speaking, 
https://groups.google.com/d/msg/mozilla.dev.security.policy/RHsIInIjJA0/HACyY9tMAAAJ
 has shared some of the principles we've used in this consideration. If 
there are additional details that GlobalSign can share, related to those 
principles, this would be invaluable.
Ryan,

I had a hard time digesting that email because it compared so many different 
items, many of which aren’t directly applicable to the OneClick vs. method 10 
that I want to focus on.  The key points I took away from your email are:

“weak” manual method comparison with methods 9 and 10 (not applicable to the 
methods 9-10 comparison since we’re not comparing them to manual methods).

Short validity certificates represent more risk to ecosystem (expiration) and 
less risk (certs issued under the exploit will expire within 90 days – badness 
lasts for only 90 days).
I’ll address this point below, but given that LE will allow renewals of 
possibly bad validations and attackers generally operate only for short 
periods before moving on, I don’t see short-lived certificates providing a 
meaningful reduction in risk within this context.

Ease with which an alternate method can be used (discussion of manual 
vs. automated methods): Not applicable to the methods 9-10 comparison since 
they are both automated and have the same characteristics.

Risk is applicable to shared service providers and an accepted risk mitigation 
is to block SNI negotiations that contain “.invalid”.  We also propose working 
with our customers on an account by account basis to assure they comply with 
the guidelines for use of method 9 until such time it’s re-affirmed, improved 
or deprecated from the BRs.
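
For illustration, the ".invalid" mitigation mentioned in the last point could be sketched as a simple check on the requested SNI name. This is a sketch under assumptions: the function name is hypothetical, and how a hosting provider hooks such a check into its TLS handshake handling is not shown here.

```python
def should_reject_sni(server_name: str) -> bool:
    """Return True if the SNI name falls under the ".invalid" TLD.

    Hypothetical helper: a shared hosting front end would refuse to
    serve a customer-supplied certificate for such a name.
    """
    # Normalize case and drop surrounding whitespace and any trailing dot.
    name = server_name.strip().lower().rstrip(".")
    # Match on a label boundary so "notinvalid" is not rejected.
    return name == "invalid" or name.endswith(".invalid")
```

Matching on the label boundary keeps ordinary hostnames that merely end in the letters "invalid" from being blocked.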

Perhaps I missed some other key points from that email.

I think these points may not have been fully appreciated. I don't see evidence 
from this mail, or from the ecosystem, that the OneClick method poses both the 
same risk and the same level of review as ACME's TLS-SNI, and I think we may 
fundamentally disagree about the risk profile of certificates with long 
validity periods, and both the detrimental effect they have on reasoning about 
ecosystem security AND the ways in which they mitigate the need to 'quickly 
re-enable this'


This assessment is based on a number of factors, but includes:
- The validity period of certificates issued via this method means that there 
is an unacceptably large window for certificates improperly issued to be used.
Risk should not be based so heavily on the validity period, which seems to be 
one of your consistent points.  The number of certificates issued along with 
the probability of a failure should both be used in the ecosystem risk 
computation.

We must disagree then. Risk is profoundly dependent on the validity period - 
one of the key mitigations to making an improper evaluation is the knowledge 
that the risk is bounded, whereas greater than 90 days represents significantly 
greater risk.

Given LE issues orders of magnitude more certificates to unique endpoints, I 
think the risk to the ecosystem at large with the GlobalSign issuance is lower 
than that with LE (when it comes to the topic of validity periods).

As I tried to explain in our considerations, we believe quite the opposite. 
The large number of ACME (since this is, to be clear, not Let's Encrypt 
specific) endpoints, combined with the shorter-lived validity period, presents 
significant risk to immediately turning it off, while GlobalSign's longer 
period, combined with lesser issuance, reduces that risk of impact to the 
ecosystem from taking the defensibly conservative choice.

Risk = impact x probability:  With the number of LE endpoints (or anyone using 
Method 10 in high volumes), the probability of a successful attack is vastly 
higher due to the sheer number of servers, and the impact for both methods is 
the same (a certificate issued to a successful attacker)

I believe your assessment of the impact is inverted. The impact of a misissued 
3-year certificate is profoundly worse than that of a 90-day certificate.

This should be rather self-evident, given the ample discussion about 
certificate lifetimes in the Forum, but consider a CA that issued 100 3 year 
certificates versus a CA that issued 1000 30 day certificates. Let's further 
presume that all turn out to be problematic. The size of the CRLs, the impact 
to clients, and the cost to mitigate that are substantially greater for those 3 
year certs, even though there are an order of magnitude fewer of them than the 
30 day certificates. The risk exposure, to the overall ecosystem (e.g. 
including those outside of the browser space) is equally profoundly greater.
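
To make the 100-versus-1000 comparison above concrete, here is a rough back-of-the-envelope sketch. It uses "certificate-days" (number of certificates times validity in days) as a stand-in for exposure. That metric, the function name, and the 365-day year are assumptions for illustration and do not appear in the discussion above; the sketch also assumes every certificate remains usable for its full validity period.

```python
def certificate_days(count: int, validity_days: int) -> int:
    """Total exposure, in certificate-days, if every certificate is bad.

    Hypothetical metric: number of certificates times days of validity.
    """
    return count * validity_days


# 100 certificates valid for 3 years (assuming 365-day years).
long_lived = certificate_days(100, 3 * 365)   # 109500 certificate-days

# 1000 certificates valid for 30 days.
short_lived = certificate_days(1000, 30)      # 30000 certificate-days

# How many times larger the long-lived exposure is.
ratio = long_lived / short_lived              # 3.65
```

Under these assumptions the smaller batch of 3-year certificates carries roughly 3.65 times the total exposure, and any single misissued certificate stays usable for up to 1095 days rather than 30.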


- Based on the available information of expiration times and the potential 
difficulty in renewing certificates using this method, the ecosystem risk of 
disallowing this method is much less.
How did you come to the conclusion that validity periods and renewal challenges 
substantially increase the risk of method 9?
1) While a GlobalSign certificate would be valid for a longer period than LE 
(typically 1 year, but up to 3), typical attacks are done, detected, and 
resolved within days or weeks.  I don’t believe that the validity period of 
certificates significantly increases the risk when exploited in the way 
described (the target site would typically notice they were compromised and it 
would be reported and the certificate revoked within days or weeks).  A more 
important factor is the number of certificates that may be issued, not their 
validity period.

We disagree on this point, for the reasons explained above. This is a 
long-standing position regarding risk in the ecosystem, as reflected in our 
past discussions regarding validity periods, and thus is entirely consistent 
with past policies and actions.

2) While LE’s validity period is shorter, they re-use the validation for 
subsequent issuance thus the time between validation and expiration is longer 
than 90 days (I believe the domain validations can be cached for 60 days).  
This equates to 5 months vs. generally 12 months for GlobalSign.  And since LE 
will permit domain renewal of possibly bad authentications, the 5 months could 
average out to be substantially higher.
3) While the renewal process is currently not optimal, it’s been working for 5 
years without significant pushback from our customers.  I fail to see how this 
factors into risk in a meaningful way.  I may have missed your point.

To be honest, I'm not quite sure what you mean by "domain renewal" in this 
respect.

However, as explained, when factoring in risks to the ecosystem, we take a 
holistic view about the impact of making a security-negative decision (such as 
improperly allowing something that should be rejected) and a security-positive 
decision (such as rejecting something that might otherwise be acceptable). This 
holistic calculus informed our position here, and while it's clear that we 
disagree with respect to the 'improperly allow' risk (and for which I tried to 
explain why that may be), it's also unclear that there's any new information 
about the impact from taking a security-positive decision here.

- The subtleties regarding Authorization Domain Names mean that the risk 
analysis provided is not sufficient - namely, it is unclear, as described, 
whether it is possible to obtain a certificate for "www.example.com", on a 
host that has a customer already configured on that domain (and 
checking/enforcing certificates), by first applying for a certificate for 
"example.com" as an attacker, providing and provisioning a test certificate 
using that method (which is not configured to serve a certificate by the 
'victim'), and then using that subsequent authorization of the Authorization 
Domain Name to then apply for "www.example.com".
Each and every certificate undergoes its own validation – there is no re-use of 
validation data when issuing subsequent certificates.  I should have stated 
this earlier.  I hope this answers this question.

It doesn't. A reuse of validation is distinct from the validation of the 
Authorization Domain Name. I was highlighting the scoping issues of ADN vs 
FQDN, and your response doesn't seem to address that.

- The potential risk in maintaining this whitelist, given both the statements 
provided about plans to deprecate this method post-haste (i.e. no such 
plans) and the validity period of issued certificates (up to 39 months or, 
soon, 825 days).
Since LE can continue to renew certificates issued under this method prior to 
this change, doesn’t that effectively allow longer effective validity periods?  
I recognize there is a difference between renewing and long validity certs, but 
allowing renewal of certs issued under the flawed method seems to reduce value 
of your argument here.

No, it doesn't, because in the event of misissuance, the attacker's ability is 
not the full duration (or 5 months, as you suggest), but bounded by the 
lifetime of the certificate. These are fundamentally different risks - and 
that's why the validity period of the certificate itself is far more important 
than the reuse period of the information. A victim can contact an ACME-using CA 
to invalidate the information, thus preventing renewal, and the attacker is 
still bound to the lifetime of the existing certificate.

Compare this with a certificate issued for 1-3 years by GlobalSign, in which 
even if a victim contacts GlobalSign, the most that GlobalSign can do is to 
revoke that certificate, which is ineffective at scale. This permits the 
attacker a far greater 'attack' window, even though GS might have revoked it, 
and is a key and fundamental difference.

- The lack of preexisting review and documentation of the specific protocol 
being employed
The process was discussed as part of the BR update and it’s documented in the 
BRs, and I hope the supplemental information provided here helps.  Also, the 
OneClick plugins and their intended use are not susceptible to this attack 
because the plugin does not permit upload/use of certificates for domains other 
than the one the plugin is managing (yes, clients can change the code or 
develop their own, so the protocol is susceptible in the end).

Right, I'm focusing on the protocol, and I don't believe that the protocol 
implemented by OneClick is equivalent to what's described in 3.2.2.4.9. I 
believe it is compatible with/an implementation of, but just as 3.2.2.4.10 does 
not describe ACME TLS-SNI (nor does 3.2.2.4.6 describe ACME HTTP-01), despite 
being compatible, we're treating this based on the available information of and 
public review and discussion of the specific protocol.

As this thread shows, we anticipate we will continue to find variants of risk, 
and thus the whitelist approach, combined with the validity periods caused by 
the risk (both of issued certificates and "completed validations"), poses a 
long-term risk, even if we catch issues 'within days'.
The validity period discussion above hopefully helps to address the risk of 
OneClick as compared with method 10 at LE.

As I hope you can see, the validity period conversation again highlights that 
we believe it presents significant risk to the ecosystem to allow such 
certificates to be valid for greater than 90 days, at the most. This applies to 
any user of ACME's TLS-SNI, and more than likely, any (existing) validation 
method under the 3.2.2.4.9 / 3.2.2.4.10 aegis.

Doug’s summary:

-          Mitigations to prevent exploitation can be implemented in 
coordination with our major customers to reduce or eliminate the risk (similar 
to LE)

We don't believe the proposal meaningfully does so, for the reasons outlined.


-          GlobalSign performs domain validation for every issuance, vs caching 
validation info.
The issue is not with revalidation at issuance, but about the permitted use of 
the Authorization Domain Name.


-          Total active OneClick certificates <100K

-          Total number of active OneClick customers: < 10
This highlights that the risk (to not allowing issuance) is potentially small, 
and can meaningfully be addressed by engaging with those 10 customers.

As you no doubt know, our goal is to ensure a consistent response, and we see 
significant danger in posing 'whitelisting' as a solution, as it fundamentally 
devolves into a question about how the CA manages and maintains that whitelist, 
which exists outside the remit of the BRs. This is where steps, such as those 
proposed by Let's Encrypt, to move to disable and/or prevent the usage of 
TLS-SNI (particularly in new deployments), pose a way to both mitigate the 
immediate risk and to move towards deprecation overall.

GlobalSign's in a unique position, given the small number of deployments, to be 
able to work to devise technical solutions that may mitigate the risk, and 
effective deployment may be a better solution for the ecosystem both medium- 
and long-term. Inevitably, however, validity periods must play a part in any 
short-term mitigation (in addition to exploring technical solutions), so that 
we can arrive at something that is consistent across CAs and consistent among 
risk factors.

-          Risk to Ecosystem with GlobalSign is lower based on active 
certificates (100K vs. tens of millions).  Perhaps the more important factor is 
not active certificates, but certificates issued which increases the difference 
between GlobalSign and LE.
Agreed that the risk to disallowing issuance is lower. I believe you were 
trying to highlight something different, but hopefully the responses above 
highlight why we may disagree that the GlobalSign solution is less risky.

-          Time between domain validation and certificate expiration is 12 
months (typical GlobalSign cert) vs. 5-8 or more months with LE (since domains 
can be renewed with method 10 without mitigations, even if prior and current 
validation methods are vulnerable)
As noted above, the time period noted here is incorrect in its assessment.


_______________________________________________
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy
