Re: Incident report D-TRUST: syntax error in one tls certificate

Jakob Bohm via dev-security-policy Wed, 05 Dec 2018 13:20:23 -0800

On 05/12/2018 01:05, Nick Lamb wrote:
> On Tue, 4 Dec 2018 14:55:47 +0100
> Jakob Bohm via dev-security-policy
> <dev-security-policy@lists.mozilla.org> wrote:
> 
>> Oh, so you meant "CA issuance systems and protocols with explicit
>> automation features" (as opposed to e.g. web server systems or
>> operating systems or site specific subscriber automation systems).
>> That's why I asked.
> 
> Yes. These systems exist, have existed for some time, and indeed now
> appear to make up a majority of all issuance.
>


I didn't doubt that automation systems exist; I was thoroughly confused 
when, a few messages back, you referred to "these systems" without 
stating which systems.

>> And note that this situation started with an OV certificate, not a DV
>> certificate.  So more than domain ownership needs to be validated.
> 
> Fortunately it is neither necessary nor usual to insist upon fresh
> validations for Organisational details for each issuance. Cached
> validations can be re-used for a period specified in the BRs although
> in some cases a CA might choose tighter constraints.
> 

However, OV or EV issuance often involves substantially different 
choices for domain validation, and especially for validating the 
CSR-to-subscriber-identity relationship, than those made for robotic DV 
issuance systems, even when the organizational identity validation is 
cached.  For example, I know of at least one CA where the process 
involves a subscriber representative signing a paper form with a 
printout of the CSR (as one of multiple steps).
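To make the "printout of the CSR" step concrete: such a form could carry a short digest of the CSR next to the full text, so the representative signs off on a value that is easy to compare with what the CA actually received. A minimal sketch, assuming the digest is SHA-256 over the CSR's DER encoding shown as colon-separated hex; csr_fingerprint is a hypothetical helper, not any particular CA's procedure.

```python
import base64
import hashlib
import re

# Matches the first PEM-encoded CSR in a block of text.
_CSR_PEM = re.compile(
    r"-----BEGIN CERTIFICATE REQUEST-----(.*?)-----END CERTIFICATE REQUEST-----",
    re.DOTALL,
)


def csr_fingerprint(pem_text: str) -> str:
    """Hypothetical helper: SHA-256 of the CSR's DER bytes, as AA:BB:... hex."""
    match = _CSR_PEM.search(pem_text)
    if match is None:
        raise ValueError("no CERTIFICATE REQUEST block found")
    # Strip the line breaks inside the PEM body before base64-decoding it.
    der = base64.b64decode("".join(match.group(1).split()))
    hex_digest = hashlib.sha256(der).hexdigest().upper()
    # Byte pairs separated by colons are easier to read aloud and compare.
    return ":".join(hex_digest[i:i + 2] for i in range(0, len(hex_digest), 2))
```

The digest only helps if the CA computes it the same way on receipt; the form would still carry the full PEM text as well.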

>> You have shown that ONE system, which you happen to like, can avoid
>> that weakness, IF you ignore some other issues.  You have not shown
>> that requiring subscribers to do this for any and all combinations of
>> validation systems and TLS server systems they encounter won't have
>> this weakness.
> 
> Yes, an existence proof. Subscribers must of course choose trade-offs
> that they're comfortable with. That might mean accepting that your web
> site could become unavailable for a period of several days at short
> notice, or that you can't safely keep running Microsoft IIS 6.0 even
> though you'd prefer not to upgrade. What I want to make clear is that
> offering automation without write access to the private key is not only
> theoretically conceivable, it's actually easy enough that a bunch of
> third party clients do it today because it was simpler than whatever
> else they considered.

An existence proof is good for refuting a claim that something doesn't 
exist.  It does nothing to prove that it is the only good approach.

Nothing I wrote has any relationship to Microsoft software specifics 
(except for my brief reply to your own aside about another Microsoft 
technology).

You have yet to point out any non-ACME client that organizations can 
use to automate the renewal and replacement of OV and EV certificates 
without write access to the private key, so I cannot validate your 
claim that there are "a bunch of third party clients" doing that.
You have only made claims about what would be theoretically 
possible with the ACME HTTP-01 protocol.

(You mention cPanel below, more there).

> 
>> I made no such claim.  I was saying that your hypothetical that
>> all/most validation systems have the properties of ACME and that
>> all/most TLS servers allow certificate replacement without access to
>> the private key storage represents an idealized scenario different
>> from practical reality.
> 
> Subscribers must choose for themselves, in particular it does not
> constitute an excuse as to why they need more time to react. Choices
> have consequences, if you choose a process you know can't be done in a
> timely fashion, it won't be done in a timely fashion and you'll go
> off-line.

The choice of validation protocol is made by the CA; subscribers 
have little influence except where a CA happens to offer more than 
one validation method, or where multiple CAs are otherwise equal in 
terms of the subscriber's selection criteria.

Outside of the pressure this community puts on CAs, there is very 
little reason for subscribers to expect CAs to suddenly revoke 
their certificates for entirely CA-internal reasons.  Therefore it is 
unreasonable to expect the general population of site-owning 
organizations to treat this as a risk worth planning for.

> 
>> And the paragraph I quoted says to not do that unless you are using a
>> HSM, which very few subscribers do.
> 
> It says it only recommends doing this for a _renewal_ if you have an
> HSM. But a scheduled _renewal_ already provides sufficient notice for
> you to replace keys and make a fresh CSR at your leisure if you so
> choose. Which is why you were talking about unscheduled events.
> 
> If you have a different reference which says what you originally
> claimed, I await it.
> 

Now you are going off on a huge tangent about the specifics 
of that particular document and its choice of words.  The document was 
arbitrarily chosen as the first one I could dig up mentioning the 
long-standing general practice of "one cert=one key".

As a paying subscriber at other CAs, I would expect a CA-forced sudden 
reissue to at least include a complimentary extension of validity, as 
compensation for the sudden loss of service availability (I am talking 
about the availability of the CA service, not the availability of the 
TLS service that relies on the CA).  This would often mean that the 
replacement cert would have a validity beyond the end of the original 
cert, thus justifying the need to give it a new key for crypto-period 
reasons alone.
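The crypto-period argument can be checked with simple date arithmetic. A minimal sketch, assuming a one-year crypto-period (the "<one year or less>" placeholder from the NIST template discussed in this thread) and a hypothetical 90-day complimentary extension; the dates are illustrative only, not those of the D-TRUST certificate.

```python
from datetime import date, timedelta

# Assumed values for illustration only.
CRYPTO_PERIOD = timedelta(days=365)
COMPLIMENTARY_EXTENSION = timedelta(days=90)


def key_reuse_exceeds_crypto_period(key_created: date,
                                    replacement_not_after: date,
                                    crypto_period: timedelta = CRYPTO_PERIOD) -> bool:
    """True if keeping the old key for the replacement certificate would
    leave that key in use after its crypto-period has ended."""
    return replacement_not_after > key_created + crypto_period


key_created = date(2018, 1, 15)
original_not_after = key_created + CRYPTO_PERIOD      # 2019-01-15
replacement_not_after = original_not_after + COMPLIMENTARY_EXTENSION

print(key_reuse_exceeds_crypto_period(key_created, original_not_after))     # False
print(key_reuse_exceeds_crypto_period(key_created, replacement_not_after))  # True
```

Any extension past the original end date pushes a reused key beyond a crypto-period that was sized to the original certificate.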

>> It is not a convenience of scheduling.  It is a security best
>> practice, called out (as the first example found) in that particular
>> NIST document.
> 
> If that was indeed their claimed security best practice the NIST
> document would say you must replace keys every time you replace
> certificates, for which it would need some sort of justification, and
> there isn't one. But it doesn't - it recommends you _renew_ once per
> year‡, and that you should change keys when you _renew_, which is to
> say, once per year.
> 
> ‡ Technically this document is written to be copy-pasted into a three
> ring binder for an organisation, so you can just write in some other
> amount of time instead of <one year or less>. As with other documents of
> this sort it will not achieve anything on its own.
> 
>> Which has absolutely no bearing on the rule that keys stored outside
>> an HSM should (as a best practice) be changed on every reissue.  It
>> would be contradictory if part B says not to reuse keys, and part C
>> then prescribes an automation method violating that.
> 
> There is no such rule listed in that NIST document. The rule you've
> cited talks about renewals, but a reissue is not a renewal. There was
> nothing wrong with the expiry date for the certificate, that's not why
> it was replaced.
> 
> There are however several recommendations which contradict this idea
> that it's OK to have processes which take weeks to act, such as:
> 
> "System owners MUST maintain the ability to replace all certificates on
> their systems within <2> days to respond to security incidents"
> 
> "Private keys, and the associated certificates, that have the
> capability of being directly accessed by an administrator MUST be
> replaced within <30> days of reassignment or <5> days of termination of
> that administrator"
> 
> 
> The NIST document also makes many other recommendations that - like the
> one year limit - won't be followed by most real organisations; such as a
> requirement to add CAA records, to revoke all their old certificates
> a short time after they're replaced, the insistence on automation for
> adding keys to "SSL inspection" type capabilities or the prohibition of
> all wildcards.
> 

(Here you snipped a change of subject)

>> So it is real.
> 
> Oh yes, doing things that are a bad idea is very real. That is, after
> all, why we're discussing this at all.

No, we are discussing whether it is reasonable to expect regular organizations 
to handle CA-initiated sudden revocations, either by having 24/7/365 
security staff with the ability and authority to handle this, or by having 
a robotic script that can handle such events via a (yet to be defined) 
CA-to-subscriber notification protocol.

One of my arguments for saying it is unreasonable to expect regular 
organizations (not big CAs) to have that ability is that whatever handles 
the request at the subscriber end (whether a robot or a human) will in 
many practical cases need privileged access to the private key, which is 
something that should not be granted to extraneous 4th shift techs or 
Internet-launchable customized scripts.

Systems that require the certificate to be supplied in PKCS#12 form are one 
example where a certificate cannot be replaced without access 
to the private key, even if (as you keep wanting) the certificate would 
be issued for the same keypair as the old certificate.

> 
>> - For systems that want the certificate as a PKCS#12 file only,
>>    certificate import requires private key import and thus private key
>>    write access.
> 
> Yup. This is a bad design. It's come up before. It's not our place to
> tell programmers they can't do this, but it's certainly within our
> remit (or indeed NIST's) to remind users that software designed this
> way doesn't help them achieve their security goals. It can go on that
> big heap of NIST recommendations actual users will ignore.
> 

Other than the weakness of some historic PKCS#12 implementations (limited 
to 40-bit keys!), using PKCS#12 files as the software equivalent of a 
crypto ignition key is not fundamentally flawed, especially if there is 
a desire to generate the private key in a dedicated key generation 
facility (such as the ones alluded to in various NIST documents).

One way to use PKCS#12 key+cert installation in a high-security manner is 
to have the key generation facility put the PKCS#12 file on a removable 
medium, transport that medium in a sealed container to the server facility, 
have a two-person team install the PKCS#12 file (one person holding the 
medium and the other knowing the random password), and then securely 
destroy the medium.  Neither person is allowed to copy the medium, and 
the password and file never coexist outside the target server and the key 
generation facility.
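The procedure only holds up if the transport password is genuinely random and never travels with the medium. A minimal sketch of generating it at the key generation facility with Python's secrets module; the 20-character alphanumeric format is an assumption, not something the procedure prescribes.

```python
import math
import secrets
import string

# Assumed format: letters and digits only, which avoids quoting problems
# when the password is typed into an import tool.
ALPHABET = string.ascii_letters + string.digits  # 62 symbols


def make_transport_password(length: int = 20) -> str:
    """Random password for the PKCS#12 file, drawn from a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int, alphabet_size: int = len(ALPHABET)) -> float:
    """Entropy of a uniformly random password of the given length."""
    return length * math.log2(alphabet_size)


# Only the password holder ever sees this value; the medium carrier never does.
password = make_transport_password()
print(f"{entropy_bits(len(password)):.1f} bits")  # about 119.1 bits
```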

>> - For systems that append the private key pem file to the certificate
>>    chain PEM file, certificate import requires write access to the file
>>    storing the private key.
> 
> This is also bad design but it's pretty trivial to "mask out" in a
> wrapper of the software. I'm sure there are programs where this is
> mandatory but in the ones I've seen it's usually an option rather than
> the only way to provide certificates.
> 

And the (non-)security of such a wrapper implementation was part of my 
initial argument.

Anyway, keeping the key and cert chain in a single file provides the desirable 
property that normal cert replacement (planned renewal with a fresh key) 
can be done atomically with a single "mv -f new.ext current.ext" on a 
running system, provided both files are on the same filesystem so that mv 
performs one rename() rather than a copy followed by a delete.
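The same atomic swap can be done from a renewal script. A minimal sketch, assuming the combined key+chain file lives at a path chosen by the caller (install_bundle is a hypothetical name): the new bundle is written to a temporary file in the same directory and then moved over the old one with os.replace, which maps to rename() on POSIX. Creating that temporary file requires write permission on the directory, which is exactly the kind of access argued about in the directory-permission exchange that follows.

```python
import os
import tempfile


def install_bundle(current_path: str, new_bundle: bytes, mode: int = 0o640) -> None:
    """Atomically replace current_path with new_bundle (key + cert chain).

    Readers opening current_path see either the complete old bundle or the
    complete new one.  The mode default is an assumption; copy the old
    file's mode and ownership in a real deployment.
    """
    directory = os.path.dirname(os.path.abspath(current_path))
    # The temp file must be on the same filesystem for rename() to be atomic,
    # so create it in the target directory rather than in /tmp.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".new-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_bundle)
            tmp.flush()
            os.fsync(tmp.fileno())  # data reaches disk before the swap
        os.chmod(tmp_path, mode)
        os.replace(tmp_path, current_path)  # the atomic step, like "mv -f"
    except BaseException:
        # Leave no stray temp file behind if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The server still has to be told to reload afterwards; the rename only guarantees that it never reads a missing or half-written file.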

>> - For systems that put the key and certificate chain in parallel PEM
>>    files (such as Apache HTTPD), granting write access to the
>>    certificate but not the private key is potentially possible, though
>>    not necessarily.  For example, typical Apache HTTPD configurations
>>    place the certificate and key files in the same POSIX disk directory,
>>    where write access would be granted to the operator installing new
>>    certificates.
> 
> Directory permissions might be one of the POSIX features most likely to
> be misunderstood by people (as distinct from them knowing they don't
> understand it). The operator writing to a certificate file does NOT need
> write permission for a directory that certificate is in, such permission
> would let them change the directory, which isn't what they need to do.
> That operator only needs permission to write to the certificate file.

Yes, this could be done if everything were designed around this rare 
scenario rather than around normal operations and system emergencies.  Normal 
certificate operations more commonly involve adding certificates for 
additional domain names than replacing certificates at external request.

Something like

drwxrwx--- root   www  4096 Feb 29 2017 .
-rw-r----- robot  www 12345 Feb 29 2018 certchain.pem
-rw-r----- keygen www  3272 Nov 31 2017 certchain.key

with the webserver somehow dropping its directory access after loading the 
keys, even though it is already not running as root.
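To make the disagreement concrete, here is a small model of the classic POSIX owner/group/other checks applied to a listing like the one above. It is a sketch under simplifying assumptions (no root override, no ACLs, no sticky bit), and it assumes, hypothetically, that the robot account is a member of group www. It shows both points: the robot can overwrite certchain.pem in place without write permission on the directory, yet while the directory is group-writable it can also replace certchain.key by renaming a new file over it, even though it cannot write to that file.

```python
from dataclasses import dataclass

R, W, X = 4, 2, 1


@dataclass(frozen=True)
class Entry:
    owner: str
    group: str
    mode: int  # permission bits only, e.g. 0o640


def _perm_bits(entry: Entry, user: str, groups: set) -> int:
    # Classic POSIX: exactly one class applies, checked owner -> group -> other.
    if user == entry.owner:
        return (entry.mode >> 6) & 7
    if entry.group in groups:
        return (entry.mode >> 3) & 7
    return entry.mode & 7


def can_overwrite_in_place(user: str, groups: set, directory: Entry, f: Entry) -> bool:
    # open(O_WRONLY | O_TRUNC) on an existing file: search on dir, write on file.
    return bool(_perm_bits(directory, user, groups) & X) and bool(_perm_bits(f, user, groups) & W)


def can_replace_by_rename(user: str, groups: set, directory: Entry) -> bool:
    # Creating a temp file and renaming it over an existing name needs write
    # and search on the directory; the target file's own mode is irrelevant.
    bits = _perm_bits(directory, user, groups)
    return bool(bits & W) and bool(bits & X)


# The listing above, with the (assumed) robot account in group www.
tls_dir = Entry("root", "www", 0o770)
certchain_pem = Entry("robot", "www", 0o640)
certchain_key = Entry("keygen", "www", 0o640)

print(can_overwrite_in_place("robot", {"www"}, tls_dir, certchain_pem))  # True
print(can_overwrite_in_place("robot", {"www"}, tls_dir, certchain_key))  # False
print(can_replace_by_rename("robot", {"www"}, tls_dir))                  # True
print(can_replace_by_rename("robot", {"www"}, Entry("root", "www", 0o750)))  # False
```

With the directory at 0750 instead, the robot keeps in-place write access to certchain.pem but loses the ability to swap out the key file.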

> 
> More over, in a truly automated system we should distinguish between
> the permissions granted to the system and that fraction available to a
> human user of the system. It is entirely possible that an automated
> system which is technically permitted to write to a private key file is
> not, in fact, designed to do so and does not do so, so that its user
> cannot cause this to happen as a result of using the system.

My criticism of automated systems was about the risk that such a system 
might contain a security bug whereby an outside attacker could cause it 
to do something other than intended.

> 
>> This assumes that granting a big global cloud provider easy access to
>> your organization's private keys is considered an acceptable risk,
>> which is not at all a given.
> 
> It may not be. Of course whether using cloud services in fact gives
> them "easy access to your organisation's private keys" is a matter of
> some debate, you will certainly find representatives of the major cloud
> service providers happy to explain why they think their systems offer
> better safeguards against malfeasance than whatever home-brew system
> your organisation has itself.

Marketing != Truth.

> 
> One of the nice effects in automation at scale is that you can resist
> the temptation to do things manually since it becomes necessarily more
> work than automating them. This results in a situation where, say, an
> AWS engineer isn't allowed to log into a customer's virtual machine and
> tinker with their private keys NOT just out of a sense of the importance
> of customer privacy but because doing so will never scale across the
> platform. Any engineer who wants to do this is Bad at their job, even if
> they aren't in fact a privacy-invading snoop, so there's no reason to
> make it possible and every reason to detect them and fire them.

This is a consequence of AWS's cost-cutting measures.  There are entire 
companies founded on providing engineers who do log on to customers' 
AWS-hosted VMs as a value-added service.

And anyway, one fear with global cloud companies is that data might be 
stolen or mangled via automation at scale, perhaps at the request of 
foreign governments (remember the certificate in question was issued 
to a government facility).

> 
> Symantec was never able to wean itself off the practice of manually
> issuing certificates, even after years of problems caused by exactly
> that approach. In contrast as I understand it ISRG / Let's Encrypt
> obtained even their Mozilla-required test leaf certificates by...
> actually requesting them through their automated issuance system just
> like an end user.
> 

Manually issuing certificates at a high volume CA is unrelated to 
manually authorizing certificate requests at organizations with a 
low number of certificates.

>> And there you assume that automation is the norm.  Which I am arguing
>> it is not.
> 
> Well there's the thing. In terms of volume it is. That sort of thing
> will sneak up on you with automation.
> 
> The largest CA by far in volume terms is ISRG's Let's Encrypt which of
> course only issues with ACME. The second largest is probably Comodo /
> Sectigo which issues a huge volume for cPanel (an automation solution)
> and Cloudflare (also automated). Some fraction of the certs at second
> tier CAs like DigiCert are automated but I would not hazard a guess at
> how many.
> 

Automation can produce a lot of noise, overwhelming any statistic that 
counts an automated issuance as equal to a non-automated one.

And this is the first time you have mentioned that cPanel has an automation 
interface to Sectigo.  I've never really looked at that software, but I 
now wonder whether it has the other properties that you assume an automation 
system should have:

 - Ability to replace OV/EV certificates at short notice without having 
   to wake up the site owner and convince them you are not a tech-support 
   scammer.

 - Inability of anyone (including the site owner) to overwrite the 
   private key via an Internet-exposed interface.

Looking at documentation.cpanel.net I see little sign of these abilities.
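For comparison, the second property is not hard to state in code. A minimal sketch of the kind of check an Internet-exposed renewal endpoint could apply before writing anything; check_upload and the paths are hypothetical and are not taken from cPanel or any other product. It only constrains what this one interface accepts; it does nothing about the separate risk of a bug elsewhere in the same system.

```python
import os
import re

# PEM block labels, e.g. "CERTIFICATE", "PRIVATE KEY", "RSA PRIVATE KEY".
_PEM_BEGIN = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def check_upload(pem_text: str, target: str, cert_path: str) -> None:
    """Raise unless the upload is certificates only and aimed at cert_path."""
    # Resolve "..", symlinks etc. so the key file cannot be reached indirectly.
    if os.path.realpath(target) != os.path.realpath(cert_path):
        raise PermissionError(f"this interface may only write {cert_path}")
    labels = _PEM_BEGIN.findall(pem_text)
    if not labels:
        raise ValueError("no PEM blocks found")
    non_certs = [label for label in labels if label != "CERTIFICATE"]
    if non_certs:
        raise ValueError(f"refusing non-certificate PEM blocks: {non_certs}")
```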


Enjoy

Jakob
-- 
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded 
_______________________________________________
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy
