Ryan, thank you for your comment. My answers to your questions are below:

On 04.12.2018 19:13, Ryan Sleevi wrote:
Thanks for filing this, Wojciech. This is definitely one of the better
incident reports in terms of providing details and structure, while also
speaking to the steps the CA has taken in response. There was sufficient
detail here that I don't have a lot of questions - if anything, it sounds
like a number of best practices have resulted that all CAs should abide by. The
few questions I do have are inline below:

On Mon, Dec 3, 2018 at 6:06 AM Wojciech Trapczyński via dev-security-policy
<dev-security-policy@lists.mozilla.org> wrote:

(All times in UTC±00:00)


10.11.2018 10:10 – We received a notification from our internal
monitoring system for issuing certificates and CRLs concerning issues
with publishing CRLs. We started verification.


Understanding what system you had in place before hand is valuable in
understanding what changes you propose to make. In particular, in
remediation, you note "We have deployed additional verification of
certificate and CRL signatures in the external component"

It's unclear here what the monitoring system monitored, or what the
challenges in publishing were. It sounds like there was already monitoring
in place in the internal system that detected the issue with corrupted
signatures. Is that a misunderstanding? I could also see an interpretation
being that "It was having trouble publishing large files", which would seem
a different issue.

Thus, it's helpful if you could discuss a bit more about what this
monitoring system already monitors, and how you're improving it to catch
this sort of issue. This may reveal other possible gaps, or it may be so
comprehensive as to also serve as a best practice that all CAs should
follow. In either event, the community wins :)



There are two things here: how we monitor our infrastructure and how our software operates.

Our system for issuing and managing certificates and CRLs has a module responsible for monitoring any issue that may occur while generating a certificate or CRL. The main task of this module is to inform us that "something went wrong" during the process of issuing a certificate or CRL. In this case we received a notification that several CRLs had not been published. This monitoring did not tell us about the corrupted signature in one CRL; it only indicated that there were some problems with CRLs. Identifying the source of the problem required human action.

Additionally, we have a main monitoring system that runs thousands of tests across the whole infrastructure. For example, in the case of CRLs we have tests that check the HTTP status code, the download time, the nextUpdate date, and more. After the incident we added tests that allow us to quickly detect CRLs published with an invalid signature (we are using a simple OpenSSL-based script).
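For illustration, a check of this kind can be sketched in a few lines. The sketch below uses Python's `cryptography` library rather than an OpenSSL command-line script (Certum's actual tooling is not shown in this thread), and builds a throwaway CA key and an empty CRL purely for the demo:

```python
import datetime

from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID


def crl_signature_ok(crl_der: bytes, issuer_public_key) -> bool:
    """Return True only if the CRL parses and its signature verifies
    against the issuing CA's public key."""
    try:
        crl = x509.load_der_x509_crl(crl_der)
    except ValueError:
        return False
    return crl.is_signature_valid(issuer_public_key)


# --- Demo with a throwaway CA key and an empty CRL ---
ca_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
now = datetime.datetime.now(datetime.timezone.utc)
crl = (
    x509.CertificateRevocationListBuilder()
    .issuer_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "Demo CA")]))
    .last_update(now)
    .next_update(now + datetime.timedelta(days=7))
    .sign(ca_key, hashes.SHA256())
)
good_der = crl.public_bytes(serialization.Encoding.DER)

# Simulate a corrupted signature by flipping the final signature byte.
bad_der = bytearray(good_der)
bad_der[-1] ^= 0xFF
bad_der = bytes(bad_der)
```

A roughly equivalent shell check with the OpenSSL CLI would be along the lines of `openssl crl -in crl.pem -CAfile issuer.pem -noout`, which reports a verification failure when the signature does not match.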

The sentence "We have deployed additional verification of certificate and CRL signatures in the external component" applies to the changes we have made in the software. After the incident we added verification of the signatures of certificates and CRLs. These improvements were added to software that works independently of the signing module, which was the source of the problem.

As I described in the incident report, we have also improved the part of the signing module responsible for signature verification, because at the time of the failure it did not work properly.

6. Explanation about how and why the mistakes were made or bugs
introduced, and how they avoided detection until now.

<snip>

All issued certificates were unusable due to corrupted signature.


Could you speak to more about how you assessed this? An incorrect signature
on the CRL would not necessarily prevent the certificate from being used;
it may merely prevent it from being revoked. That is, all 30,000 (revoked)
certificates may have been usable due to the corrupted signature.



Kurt has explained it well. Kurt, thank you.

7. List of steps your CA is taking to resolve the situation and ensure
such issuance will not be repeated in the future, accompanied with a
timeline of when your CA expects to accomplish these things.

We have deployed a new version of the signing module that correctly
signs large CRLs. From now on, we are able to sign a CRL of up to 128
MB. In addition, we have improved the part of the signing module
responsible for signature verification (at the time of the failure it
did not work properly).

We have deployed additional verification of certificate and CRL
signatures in an external component, in addition to the signing module.
This component blocks the issuance of certificates and CRLs that have a
corrupted signature.

We have extended the monitoring system with tests that will allow us to
detect incorrectly signed certificates or CRLs faster.


As others have highlighted, there is still an operational gap, in that 1MB
CRLs are rather large and unwieldy. To help manage this, CRLs support
"sharding", by virtue of the CRL distribution point URL and the (critical)
CRL extension of Issuing Distribution Point (
https://tools.ietf.org/html/rfc5280#section-5.2.5 ). For example, the same
(Subject DN + key) intermediate CA can divide the certificates it issues
into an arbitrary number of CRLs. It does this by ensuring distinct URLs in
the certificates' CRLDP extension, and then, for each of the URLs
referenced, hosting a CRL for all certificates bearing that URL, and with a
critical IDP extension in the CRL (ensuring the IDP is present and critical
is a critical security function).

By doing this, you can roll a new CRL for every X number of subscriber
certificates you've issued, allowing you to bound the worst-case
revocation. For example, if the average size of your CRL entry was 32 bytes
(easier for the math), then every 2,000 certificates, you could create a
new CRL URL, and the maximum size your CRL would be (in the worst case) for
those 2,000 certificates is 64K.

Have you considered such an option? Several other CAs already apply this
practice, at varying degrees of scale and size, but it seems like it would
be a further mitigation to a root cause, which is that the revocation of
30,000 certificates would not balloon things so much.


Thank you for pointing that out. We have not considered it yet, but it seems to be a good solution for such cases. We will need to estimate what changes would be required to implement it.
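To make the mechanics of the sharding scheme concrete, here is a rough sketch using Python's `cryptography` library. The shard size, URL pattern, and all names are invented for illustration (the shard size of 2,000 matches Ryan's worked example); a production implementation would also need to persist the certificate-to-shard assignment:

```python
import datetime

from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

SHARD_SIZE = 2000  # certificates per CRL shard (figure from Ryan's example)
BASE_URL = "http://crl.example.com/shard-{n}.crl"  # hypothetical URL scheme


def crldp_url_for(cert_index: int) -> str:
    """Pick the shard URL to embed in a certificate's CRLDP extension,
    here simply by issuance order."""
    return BASE_URL.format(n=cert_index // SHARD_SIZE)


def build_shard_crl(shard_no, issuer_name, ca_key, revoked_serials):
    """Build one shard's CRL, carrying a *critical* Issuing Distribution
    Point extension naming the same URL the shard's certificates point to
    (RFC 5280, section 5.2.5)."""
    now = datetime.datetime.now(datetime.timezone.utc)
    idp = x509.IssuingDistributionPoint(
        full_name=[x509.UniformResourceIdentifier(BASE_URL.format(n=shard_no))],
        relative_name=None,
        only_contains_user_certs=False,
        only_contains_ca_certs=False,
        only_some_reasons=None,
        indirect_crl=False,
        only_contains_attribute_certs=False,
    )
    builder = (
        x509.CertificateRevocationListBuilder()
        .issuer_name(issuer_name)
        .last_update(now)
        .next_update(now + datetime.timedelta(days=7))
        # Marking the IDP critical is a security requirement: without it, a
        # relying party could accept one shard as covering all certificates.
        .add_extension(idp, critical=True)
    )
    for serial in revoked_serials:
        builder = builder.add_revoked_certificate(
            x509.RevokedCertificateBuilder()
            .serial_number(serial)
            .revocation_date(now)
            .build()
        )
    return builder.sign(ca_key, hashes.SHA256())


# --- Demo: one shard with three revoked serials ---
ca_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
issuer = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "Demo CA")])
crl0 = build_shard_crl(0, issuer, ca_key, revoked_serials=[1, 2, 3])
```

With this partitioning, revoking even 30,000 certificates touches at most 15 shards of 2,000 entries each, so no single CRL grows without bound.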
