Re: 2018.01.09 Issue with TLS-SNI-01 and Shared Hosting Infrastructure

josh--- via dev-security-policy Thu, 11 Jan 2018 14:29:17 -0800

On Thursday, January 11, 2018 at 3:36:50 PM UTC-6, Ryan Sleevi wrote:
> On Wed, Jan 10, 2018 at 4:33 AM, josh--- via dev-security-policy <
> dev-security-policy@lists.mozilla.org> wrote:
> 
> > At approximately 5 p.m. Pacific time on January 9, 2018, we received a
> > report from Frans Rosén of Detectify outlining a method of exploiting some
> > shared hosting infrastructures to obtain certificates for domains he did
> > not control, by making use of the ACME TLS-SNI-01 challenge type. We
> > quickly confirmed the issue and mitigated it by entirely disabling
> > TLS-SNI-01 validation in Let’s Encrypt. We’re grateful to Frans for finding
> > this issue and reporting it to us.
> >
> > We’d like to describe the issue and our plans for possibly re-enabling
> > TLS-SNI-01 support.
> >
> > Problem Summary
> >
> > In the ACME protocol’s TLS-SNI-01 challenge, the ACME server (the CA)
> > validates a domain name by generating a random token and communicating it
> > to the ACME client. The ACME client uses that token to create a self-signed
> > certificate with a specific, invalid hostname (for example,
> > 773c7d.13445a.acme.invalid), and configures the web server on the domain
> > name being validated to serve that certificate. The ACME server then looks
> > up the domain name’s IP address, initiates a TLS connection, and sends the
> > specific .acme.invalid hostname in the SNI extension. If the response is a
> > self-signed certificate containing that hostname, the ACME client is
> > considered to be in control of the domain name, and will be allowed to
> > issue certificates for it.
> >
> > However, Frans noticed that at least two large hosting providers combine
> > two properties that together violate the assumptions behind TLS-SNI:
> >
> > * Many users are hosted on the same IP address, and
> > * Users have the ability to upload certificates for arbitrary names
> > without proving domain control.
> >
> > When both are true of a hosting provider, an attack is possible. Suppose
> > example.com’s DNS is pointed at the same shared hosting IP address as a
> > site controlled by the attacker. The attacker can run an ACME client to get
> > a TLS-SNI-01 challenge, then install their .acme.invalid certificate on the
> > hosting provider. When the ACME server looks up example.com, it will
> > connect to the hosting provider’s IP address and use SNI to request the
> > .acme.invalid hostname. The hosting provider will serve the certificate
> > uploaded by the attacker. The ACME server will then consider the attacker’s
> > ACME client authorized to issue certificates for example.com, and be
> > willing to issue a certificate for example.com even though the attacker
> > doesn’t actually control it.
> >
> > This issue only affects domain names that use hosting providers with the
> > above combination of properties. It is independent of whether the hosting
> > provider itself acts as an ACME client.
> >
> > Our Plans
> >
> > Shortly after the issue was reported, we disabled TLS-SNI-01 in Let’s
> > Encrypt. However, a large number of people and organizations use the
> > TLS-SNI-01 challenge type to get certificates. It’s important that we
> > restore service if possible, though we will only do so if we’re confident
> > that the TLS-SNI-01 challenge type is sufficiently secure.
> >
> > At this time, we believe that the issue can be addressed by having certain
> > service providers implement stronger controls for domains hosted on their
> > infrastructure. We have been in touch with the providers we know to be
> > affected, and mitigations will start being deployed for their systems
> > shortly.
> >
> > Over the next 48 hours we will be building a list of vulnerable providers
> > and their associated IP addresses. Our tentative plan, once the list is
> > completed, is to re-enable the TLS-SNI-01 challenge type with vulnerable
> > providers blocked from using it.
> >
> > We’re also going to be soliciting feedback on our plans from our
> > community, partners and other PKI stakeholders prior to re-enabling the
> > TLS-SNI-01 challenge. There is a lot to consider here and we’re looking
> > forward to feedback.
> >
> > We will post more information and details as our plans progress.
> >
> 
> (Wearing a Google Chrome hat on behalf of our root store policy)
> 
> Josh,
> 
> Thanks for bringing this rapidly to the attention of the broader community
> and proactively reaching out to root programs.
> 
> As framing to the discussion, we still believe TLS-SNI is fully permitted
> by the Baseline Requirements, which, while not ideal, still permits
> issuance using this method. As such, the 'root' cause is that the Baseline
> Requirements permit methods that are less secure than desired, and the
> discussion that follows is now around what steps to take - as CAs, as Root
> Programs, for site operators, and for the CA/Browser Forum.
> 
> When faced with a vulnerable validation method that is permitted, it's
> always a challenge to balance the need for security - for sites and users -
> with the risk of compatibility and breakage from the removal of such a
> method. Fundamentally, the issues you raise call into question the level of
> assurance of 3.2.2.4.9 and 3.2.2.4.10 in the Baseline Requirements, and are
> not limited to TLS-SNI, and potentially affect every CA using these
> methods.
> 
> When evaluating these methods, and their risks, compared to, say, the
> also-weak 3.2.2.4.1 and 3.2.2.4.5 discussions ongoing with the CA/Browser
> Forum, a few key distinctions, although non-exhaustive, apply and are
> factored in to our response and proposal here:
> 
> - The average lifetime of certificates using these methods, across CAs,
> compared to 3.2.2.4.1/3.2.2.4.5, is significantly shorter - very close to
> the 90 days that Let's Encrypt uses, based on the available information we
> have. The fact that so many of these certificates are short lived creates a
> situation in which there's simultaneously more risk to the ecosystem in
> rapidly removing these methods as acceptable (due to the need to
> obtain/renew certificates), while there's also much less risk in allowing
> this method to continue to be used for a limited time, due to the fact that
> certificates that could be obtained by exploiting this will expire much
> sooner than the 2-3 years that many other certificates are issued with.
> That is, the security risk of a bad validation that lives for 3 years is
> much greater than the risk of a bad validation that lives for 90 days, and
> the fact that the badness is only valid for 90 days means that it's easier
> to allow it to more gracefully shut down than potentially accepting that
> implied risk for years.
> 
> - The ease with which alternative methods can be adopted. Manual methods are
> substantially easier to remove quickly, as alternative manual processes can
> also be used during the human-to-human interaction, while methods that are
> highly automated conversely create greater challenges, due to the need to
> update client software to whatever new automated methods may be used. While
> 3.2.2.4.1 and 3.2.2.4.5 are highly human-driven methods, methods like
> 3.2.2.4.9 and .10 are designed for automation - and why we were supportive
> of their addition - but also mean that any mitigations will necessarily
> face ecosystem challenges, much like deploying new versions of TLS or
> deprecating old ones.
> 
> - The ease with which alternative automated methods can be used. As automated
> methods are generally designed around integrated systems and certain design
> constraints, it's not always possible to move to an equivalently
> automatable method (as it is with manual methods), and it may be that no
> equivalent automated method exists to fill the design niche. If that design
> niche is a substantial one for clients, and enables otherwise unautomatable
> systems, it can pose greater risk in prematurely removing it. Specific
> applications of the .9 and .10 methods, such as ACME's TLS-SNI, occupy an
> important niche, similar to the 3.2.2.4.6 method and ACME's HTTP-01 method,
> and provide a level of automation for systems not directly integrated with DNS,
> and while that means they must be particularly attentive to the security
> risks that come from that, done correctly, they can provide a greater path
> towards security.
> 
> - Compared to 3.2.2.4.1 and 3.2.2.4.5, specific applications of 3.2.2.4.9
> and 3.2.2.4.10 can be evaluated against possible mitigations for the risk,
> both short- and long-term, and steps that site operators can take to
> affirmatively protect themselves offer better assurances than those that
> rely entirely on the CA's good behaviour. As you call out, the specific
> risks of TLS-SNI are limited to shared providers (not individual users)
> that meet certain conditions, and these shared providers can already take
> existing steps to minimize the immediate risk, such as blocking the use of
> certificates or SNI negotiations that contain the '.invalid' TLD. While
> this is not an ideal long-term solution, by any means, it allows us to
> frame both the immediate and specific risks and the ways to reduce that.
> 
> For the sake of brevity, I'll end my comparisons there, but hopefully this
> highlights some of the factors we've considered in our response to your
> proposal.
> 
> Given the risks identified to 3.2.2.4.9 and 3.2.2.4.10, we think it would
> be best for CAs using these Baseline Requirements-acceptable methods of
> validation to begin immediately transitioning away from them, with the goal
> of either removing them entirely from the Baseline Requirements, or
> identifying ways in which .9 and .10 can be better specified to mitigate
> such risks. That said, given the potential risks to the ecosystem,
> particularly those with pre-existing short-lived certificates, we think
> that, provided that the new certificates are valid for 90 days or less,
> we're open to allowing the specific TLS-SNI methods identified by the ACME
> specification to continue to be used for a limited time, while the broader
> community works to identify potential mitigations (if possible) or
> transition away from these methods.
> 
> While we don't think the current status quo represents a viable long-term
> solution, given that the ACME TLS-SNI methods have been broadly reviewed
> within the IETF, that the risks apply to a limited subset of specific
> infrastructures, that mitigations are possible for these infrastructures to
> deploy, that Let's Encrypt is actively working with the community to
> identify, and ideally, share, those that haven't or cannot deploy such
> mitigations, and all of the other items previously mentioned, we think this
> represents an appropriate short-term balance.
> 
> If and as new facts become available, it may be necessary to revisit this.
> We may have overlooked additional risks, or failed to consider mitigating
> factors. Further, this response is contextualized in the application of
> ACME's TLS-SNI methods for validation, and such a response may not be
> appropriate for other forms of validations within the framework of
> 3.2.2.4.9 and 3.2.2.4.10. Similarly, this response doesn't apply to
> certificates that may be valid for longer periods, as they may present
> substantially greater risk to making effective improvements to or an
> orderly transition away from these methods.
> 
> We look forward to working with other browser vendors, site operators, and
> the relying community to work out ways to provide an orderly and effective
> transition to more secure methods - whether that means away from the
> 3.2.2.4.9/.10 series of domain validations, or to more restrictive forms
> that are more clearly "opt-in" rather than the explicit "opt-out" proposed
> (of 'blacklisting .invalid').
> 
> We're also curious if we've overlooked salient details in our response, and
> thus welcome feedback from Let's Encrypt, other CAs utilizing these
> validation methods (both TLS-SNI and 3.2.2.4.9 and 3.2.2.4.10), and the
> broader community as to our proposed next steps. Please consider this a
> draft response, and we look forward to future updates regarding proposed
> next steps.

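For readers following along, the attack in the quoted Problem Summary, and the '.invalid' blocking mitigation Ryan mentions, can be modelled roughly as below. This is a toy sketch only, not Let's Encrypt or hosting-provider code: SharedHost, upload_cert, serve, tls_sni_01_passes and attack_succeeds are hypothetical names, and certificates are reduced to the hostname they contain.

```python
# Toy model of the shared-hosting issue; every name here is hypothetical.
from dataclasses import dataclass, field
from typing import Optional


def _is_invalid_tld(name: str) -> bool:
    # True for names under the reserved '.invalid' TLD.
    return name.lower().rstrip(".").endswith(".invalid")


@dataclass
class SharedHost:
    """One IP address serving many customers, choosing a certificate by SNI."""
    block_invalid_tld: bool = False
    certs: dict = field(default_factory=dict)  # SNI name -> uploading customer

    def upload_cert(self, customer: str, name: str) -> bool:
        # Vulnerable property: no proof of control over `name` is required.
        if self.block_invalid_tld and _is_invalid_tld(name):
            return False  # mitigation: refuse certificates for '.invalid'
        self.certs[name.lower()] = customer
        return True

    def serve(self, sni: str) -> Optional[str]:
        # Returns the hostname in the certificate presented, or None.
        if self.block_invalid_tld and _is_invalid_tld(sni):
            return None  # mitigation: refuse '.invalid' SNI negotiations
        key = sni.lower()
        return key if key in self.certs else None


def tls_sni_01_passes(host: SharedHost, challenge_name: str) -> bool:
    """CA side: connect to the domain's IP, send the challenge name in SNI,
    and accept if the returned certificate contains that name."""
    return host.serve(challenge_name) == challenge_name.lower()


def attack_succeeds(host: SharedHost) -> bool:
    # example.com resolves to `host`; the attacker is just another customer.
    challenge_name = "773c7d.13445a.acme.invalid"  # example name from the thread
    host.upload_cert("attacker", challenge_name)
    return tls_sni_01_passes(host, challenge_name)
```

With the default SharedHost() the check passes for the attacker even though they never controlled example.com; with SharedHost(block_invalid_tld=True) the upload and the SNI lookup are both refused, so validation fails.
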

We have published an update on our plans for TLS-SNI:

https://community.letsencrypt.org/t/2018-01-11-update-regarding-acme-tls-sni-and-shared-hosting-infrastructure/50188

The short summary is that we do not plan to generally re-enable TLS-SNI 
validation, but we will introduce various forms of whitelists to limit impact 
during our transition away from TLS-SNI.

Thanks to everyone for the feedback on this thread already. Let us know if you 
have any questions or concerns.
_______________________________________________
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy
