Re: Incident report Certum CA: Corrupted certificates

2018-12-10 Thread Wojciech Trapczyński via dev-security-policy

On 05.12.2018 21:26, Ryan Sleevi wrote:

On Wed, Dec 5, 2018 at 7:53 AM Wojciech Trapczyński
wrote:


Ryan, thank you for your comment. The answers to your questions below:


Again, thank you for filing a good post-mortem.

I want to call out a number of positive things here rather explicitly, so
that it hopefully can serve as a future illustration from CAs:
* The timestamp included the times, as requested and required, which help
provide a picture as to how responsive the CA is
* It includes the details about the steps the CA actively took during the
investigation (e.g. within 1 hour, 50 minutes, the initial cause had been
identified)
* It demonstrates an approach that triages (10.11.2018 12:00), mitigates
(10.11.2018 18:00), and then further investigates (11.11.2018 07:30) the
holistic system. Short-term steps are taken (11.11.2018 19:30), followed by
longer term steps (19.11.2018)
* It provides rather detailed data about the problem, how the problem was
triggered, the scope of the impact, why it was possible, and what steps are
being taken.

That said, I can't say positive things without highlighting opportunities
for improvement:
* It appears you were aware of the issue beginning on 10.11.2018, but the
notification to the community was not until 03.12.2018 - that's a
significant gap. I see Wayne already raised it in
https://bugzilla.mozilla.org/show_bug.cgi?id=1511459#c1 and that has been
responded to in https://bugzilla.mozilla.org/show_bug.cgi?id=1511459#c2
* It appears, based on that bug and related discussion (
https://bugzilla.mozilla.org/show_bug.cgi?id=1511459#c2 ), that from
10.11.2018 01:05 (UTC±00:00) to 14.11.2018 07:35 (UTC±00:00) an invalid
CRL was being served. That seems relevant for the timeline, as it speaks to
the period of CRL non-compliance. In this regard, I think we're talking
about two different BR "violations" that share the same incident root cause
- a set of invalid certificates being published and a set of invalid CRLs
being published. Of these two, the latter is far more impactful than the
former, but it's unclear based on the report if the report was being made
for the former (certificates) rather than the latter (CRLs)

Beyond that, a few selected remarks below.



There are two things here: how we monitor our infrastructure and how our
software operates.

Our system for issuing and managing certificates and CRLs has a module
responsible for monitoring any issue which may occur while generating a
certificate or CRL. The main task of this module is to inform us that
"something went wrong" during the process of issuing a certificate or CRL.
In this case we got a notification that several CRLs had not been
published. This monitoring did not inform us about the corrupted signature
in one CRL; it only indicated that there were some problems with CRLs.
Identifying the source of the problem required human action.


Based on your timeline, it appears the issue was introduced at 10.11.2018
01:05 and not alerted on until 10.11.2018 10:10. Is that correct? If so,
can you speak to why the delay between the issue and notification, and what
the target delay is with the improvements you're making? Understanding that
alerting is finding a balance between signal and noise, it does seem like a
rather large gap. It may be that this gap is reflective of 'on-call' or
'business hours', it may be a threshold in the number of failures, it may
have been some other cause, etc. Understanding a bit more can help here.




Yes, that is correct. The monitoring system that we use in our 
software for issuing and managing certificates and CRLs has no 
notification feature; reviewing its events is part of a procedure. 
In other words, human action is required to detect any issue in this 
monitoring. That is why we detected this issue with some delay.


Therefore, we have added tests to our main monitoring system, and we now 
receive a notification within less than 5 minutes of the occurrence of 
such an event.
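
(For illustration only, a test of this kind might look roughly like the 
following minimal Python sketch. The URL, the issuer file name and the use 
of the requests and cryptography libraries are assumptions; the thread only 
says a simple OpenSSL-based script is used.)

    # Sketch of a CRL monitoring test: HTTP status, download time,
    # freshness (nextUpdate) and signature validity.
    import datetime

    import requests
    from cryptography import x509

    CRL_URL = "http://crl.example.com/ca1.crl"   # hypothetical URL
    ISSUER_CERT = "ca1-issuer.pem"               # hypothetical file name

    def check_crl() -> None:
        resp = requests.get(CRL_URL, timeout=30)
        assert resp.status_code == 200, "unexpected HTTP status"
        assert resp.elapsed.total_seconds() < 10, "CRL download too slow"

        crl = x509.load_der_x509_crl(resp.content)

        # Freshness: nextUpdate must still lie in the future.
        assert crl.next_update > datetime.datetime.utcnow(), "CRL is stale"

        # Signature: the defect in this incident was a corrupted signature,
        # so verify the CRL against the issuing CA certificate.
        with open(ISSUER_CERT, "rb") as f:
            issuer = x509.load_pem_x509_certificate(f.read())
        assert crl.is_signature_valid(issuer.public_key()), "corrupted CRL signature"

    if __name__ == "__main__":
        check_crl()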



Additionally, we have our main monitoring system with thousands of tests
covering the whole infrastructure. For example, in the case of CRLs we have
tests such as checking the HTTP status code, the download time, the
NextUpdate date and others. After the incident we added tests which
allow us to quickly detect CRLs published with an invalid signature (we are
using a simple OpenSSL-based script).


So, this is an example of a good response. It includes a statement that
requires trust ("we have ... thousands of tests"), but then provides
examples that demonstrate an understanding and awareness of the potential
issues.

Separate from the incident report, I think publishing or providing details
about these tests could be a huge benefit to the community, with an ideal
outcome of codifying them all as requirements that ALL CAs should perform.
This is where we go from "minimum required" to "best practice", and it
sounds like y'all are operating at a level that seeks to capture the spirit
and intent, and not just the letter, and that's the kind of ideal
requirement to codify and capture.

Re: Incident report Certum CA: Corrupted certificates

2018-12-05 Thread Ryan Sleevi via dev-security-policy
On Wed, Dec 5, 2018 at 7:53 AM Wojciech Trapczyński 
wrote:

> Ryan, thank you for your comment. The answers to your questions below:
>

Again, thank you for filing a good post-mortem.

I want to call out a number of positive things here rather explicitly, so
that it hopefully can serve as a future illustration from CAs:
* The timestamp included the times, as requested and required, which help
provide a picture as to how responsive the CA is
* It includes the details about the steps the CA actively took during the
investigation (e.g. within 1 hour, 50 minutes, the initial cause had been
identified)
* It demonstrates an approach that triages (10.11.2018 12:00), mitigates
(10.11.2018 18:00), and then further investigates (11.11.2018 07:30) the
holistic system. Short-term steps are taken (11.11.2018 19:30), followed by
longer term steps (19.11.2018)
* It provides rather detailed data about the problem, how the problem was
triggered, the scope of the impact, why it was possible, and what steps are
being taken.

That said, I can't say positive things without highlighting opportunities
for improvement:
* It appears you were aware of the issue beginning on 10.11.2018, but the
notification to the community was not until 03.12.2018 - that's a
significant gap. I see Wayne already raised it in
https://bugzilla.mozilla.org/show_bug.cgi?id=1511459#c1 and that has been
responded to in https://bugzilla.mozilla.org/show_bug.cgi?id=1511459#c2
* It appears, based on that bug and related discussion (
https://bugzilla.mozilla.org/show_bug.cgi?id=1511459#c2 ), that from
10.11.2018 01:05 (UTC±00:00) to 14.11.2018 07:35 (UTC±00:00) an invalid
CRL was being served. That seems relevant for the timeline, as it speaks to
the period of CRL non-compliance. In this regard, I think we're talking
about two different BR "violations" that share the same incident root cause
- a set of invalid certificates being published and a set of invalid CRLs
being published. Of these two, the latter is far more impactful than the
former, but it's unclear based on the report if the report was being made
for the former (certificates) rather than the latter (CRLs)

Beyond that, a few selected remarks below.


> There are two things here: how we monitor our infrastructure and how our
> software operates.
>
> Our system for issuing and managing certificates and CRLs has a module
> responsible for monitoring any issue which may occur while generating a
> certificate or CRL. The main task of this module is to inform us that
> "something went wrong" during the process of issuing a certificate or CRL.
> In this case we got a notification that several CRLs had not been
> published. This monitoring did not inform us about the corrupted signature
> in one CRL; it only indicated that there were some problems with CRLs.
> Identifying the source of the problem required human action.
>

Based on your timeline, it appears the issue was introduced at 10.11.2018
01:05 and not alerted on until 10.11.2018 10:10. Is that correct? If so,
can you speak to why the delay between the issue and notification, and what
the target delay is with the improvements you're making? Understanding that
alerting is finding a balance between signal and noise, it does seem like a
rather large gap. It may be that this gap is reflective of 'on-call' or
'business hours', it may be a threshold in the number of failures, it may
have been some other cause, etc. Understanding a bit more can help here.


> Additionally, we have our main monitoring system with thousands of tests
> covering the whole infrastructure. For example, in the case of CRLs we have
> tests such as checking the HTTP status code, the download time, the
> NextUpdate date and others. After the incident we added tests which
> allow us to quickly detect CRLs published with an invalid signature (we are
> using a simple OpenSSL-based script).
>

So, this is an example of a good response. It includes a statement that
requires trust ("we have ... thousands of tests"), but then provides
examples that demonstrate an understanding and awareness of the potential
issues.

Separate from the incident report, I think publishing or providing details
about these tests could be a huge benefit to the community, with an ideal
outcome of codifying them all as requirements that ALL CAs should perform.
This is where we go from "minimum required" to "best practice", and it
sounds like y'all are operating at a level that seeks to capture the spirit
and intent, and not just the letter, and that's the kind of ideal
requirement to codify and capture.


> As I described in the incident report we also have improved the part of
> the signing module responsible for verification of signature, because at
> the time of failure it did not work properly.
>

This is an area where I think more detail could help. Understanding what
caused it to "not work properly" seems useful in understanding the issues
and how to mitigate. For example, it could be that "it did not work
properly" 

Re: Incident report Certum CA: Corrupted certificates

2018-12-04 Thread Ryan Sleevi via dev-security-policy
On Tue, Dec 4, 2018 at 2:08 PM Kurt Roeckx  wrote:

> He explained before that the module that generated the corrupt
> signature for the CRL was in a weird state after that and all
> the newly issued certificates signed by that module also had
> corrupt signatures.
>

Ah! Thanks, I misparsed that. I agree, it does seem to be clearly addressed
:)


Re: Incident report Certum CA: Corrupted certificates

2018-12-04 Thread Kurt Roeckx via dev-security-policy
On Tue, Dec 04, 2018 at 01:14:44PM -0500, Ryan Sleevi via dev-security-policy 
wrote:
> 
> > All issued certificates were unusable due to corrupted signature.
> >
> 
> Could you speak to more about how you assessed this? An incorrect signature
> on the CRL would not necessarily prevent the certificate from being used;
> it may merely prevent it from being revoked. That is, all 30,000 (revoked)
> certificates may have been usable due to the corrupted signature.

He explained before that the module that generated the corrupt
signature for the CRL was in a weird state after that and all
the newly issued certificates signed by that module also had
corrupt signatures.


Kurt



Re: Incident report Certum CA: Corrupted certificates

2018-12-04 Thread Ryan Sleevi via dev-security-policy
Thanks for filing this, Wojciech. This is definitely one of the better
incident reports in terms of providing details and structure, while also
speaking to the steps the CA has taken in response. There was sufficient
detail here that I don't have a lot of questions - if anything, it sounds
like a number of best practices that all CAs should abide by have come out
of it. The few questions I do have are inline below:

On Mon, Dec 3, 2018 at 6:06 AM Wojciech Trapczyński via dev-security-policy
 wrote:

> (All times in UTC±00:00)
>
> 10.11.2018 10:10 – We received a notification from our internal
> monitoring system for issuing certificates and CRLs concerning issues
> with publishing CRLs. We started verification.

Understanding what system you had in place beforehand is valuable in
understanding what changes you propose to make. In particular, in
remediation, you note "We have deployed additional verification of
certificate and CRL signatures in the external component"

It's unclear here what the monitoring system monitored, or what the
challenges in publishing were. It sounds like there was already monitoring
in place in the internal system that detected the issue with corrupted
signatures. Is that a misunderstanding? I could also see an interpretation
being that "It was having trouble publishing large files", which would seem
a different issue.

Thus, it's helpful if you could discuss a bit more about what this
monitoring system already monitors, and how you're improving it to catch
this sort of issue. This may reveal other possible gaps, or it may be so
comprehensive as to also serve as a best practice that all CAs should
follow. In either event, the community wins :)


> 6. Explanation about how and why the mistakes were made or bugs
> introduced, and how they avoided detection until now.
>


> All issued certificates were unusable due to corrupted signature.
>

Could you speak to more about how you assessed this? An incorrect signature
on the CRL would not necessarily prevent the certificate from being used;
it may merely prevent it from being revoked. That is, all 30,000 (revoked)
certificates may have been usable due to the corrupted signature.


> 7. List of steps your CA is taking to resolve the situation and ensure
> such issuance will not be repeated in the future, accompanied with a
> timeline of when your CA expects to accomplish these things.
>
> We have deployed a new version of the signing module that correctly
> signs large CRLs. From now on, we are able to sign a CRL that is up to
> 128 MB. In addition, we have improved the part of the signing module
> responsible for verification of signatures (at the time of failure it
> did not work properly).
>
> We have deployed additional verification of certificate and CRL
> signatures in the external component, in addition to the signing module.
> This module blocks the issuance of certificates and CRLs that have a
> corrupted signature.
>
> We have extended the monitoring system tests, which will allow us to
> detect incorrectly signed certificates or CRLs faster.

As others have highlighted, there is still an operational gap, in that 1MB
CRLs are rather large and unwieldy. To help manage this, CRLs support
"sharding", by virtue of the CRL distribution point URL and the (critical)
CRL extension of Issuing Distribution Point (
https://tools.ietf.org/html/rfc5280#section-5.2.5 ). For example, the same
(Subject DN + key) intermediate CA can divide the certificates it issues
into an arbitrary number of CRLs. It does this by ensuring distinct URLs in
the certificates' CRLDP extension, and then, for each of the URLs
referenced, hosting a CRL for all certificates bearing that URL, and with a
critical IDP extension in the CRL (ensuring the IDP is present and critical
is a critical security function).

By doing this, you can roll a new CRL for every X number of subscriber
certificates you've issued, allowing you to bound the worst-case
revocation. For example, if the average size of your CRL entry was 32 bytes
(easier for the math), then every 2,000 certificates, you could create a
new CRL URL, and the maximum size your CRL would be (in the worst case) for
those 2,000 certificates is 64K.
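
To make that concrete, here is a rough, purely illustrative Python sketch of 
the sharding scheme described above. The URLs, the shard size and the helper 
names are hypothetical; only the critical IssuingDistributionPoint extension 
mirrors RFC 5280, section 5.2.5.

    # Assign every block of CERTS_PER_SHARD certificates its own CRL URL and
    # build the matching extensions for the certificates and the CRL shard.
    from cryptography import x509

    CERTS_PER_SHARD = 2000                       # example figure from above
    AVG_ENTRY_SIZE = 32                          # bytes per CRL entry (example)
    BASE_URL = "http://crl.example.com/shard-"   # hypothetical

    def shard_url(cert_index: int) -> str:
        return f"{BASE_URL}{cert_index // CERTS_PER_SHARD}.crl"

    def crldp_extension(url: str) -> x509.CRLDistributionPoints:
        """CRLDP extension placed in each end-entity certificate."""
        return x509.CRLDistributionPoints([
            x509.DistributionPoint(
                full_name=[x509.UniformResourceIdentifier(url)],
                relative_name=None, reasons=None, crl_issuer=None,
            )
        ])

    def idp_extension(url: str) -> x509.IssuingDistributionPoint:
        """IDP extension for the CRL served at `url`; add it with critical=True."""
        return x509.IssuingDistributionPoint(
            full_name=[x509.UniformResourceIdentifier(url)],
            relative_name=None,
            only_contains_user_certs=False,
            only_contains_ca_certs=False,
            only_some_reasons=None,
            indirect_crl=False,
            only_contains_attribute_certs=False,
        )

    # Worst case (every certificate in a shard revoked) stays around 64 KB:
    print(CERTS_PER_SHARD * AVG_ENTRY_SIZE)      # 64000 bytes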

Have you considered such an option? Several other CAs already apply this
practice, at varying degrees of scale and size, and it seems like it would
be a further mitigation of the root cause, in that the revocation of
30,000 certificates would not balloon the CRL size so much.


Re: Incident report Certum CA: Corrupted certificates

2018-12-04 Thread Wojciech Trapczyński via dev-security-policy

On 04.12.2018 15:16, Kurt Roeckx via dev-security-policy wrote:
I think you misunderstood my question. I think you should never serve an 
invalid file. I think it's better to have a file that is 1 or 2 days old 
than it is to have an invalid file. So you could check that it's a valid 
file before you start serving it, and if it's invalid keep the old file.
As I mentioned in the incident report, we have deployed additional 
verification of certificate and CRL signatures in the external 
component, in addition to the signing module. This module blocks the 
issuance of certificates and CRLs that have an invalid signature.
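
As a rough illustration of the check-before-serving approach Kurt describes 
above (the file paths and the use of Python's cryptography library are 
assumptions, not details of Certum's external component):

    # Sketch: validate a freshly signed CRL and only then swap it into the
    # web root; on any failure the previously published CRL stays in place.
    import datetime
    import os

    from cryptography import x509

    NEW_CRL = "/var/ca/outgoing/ca1.crl"     # hypothetical path, DER encoded
    SERVED_CRL = "/var/www/crl/ca1.crl"      # hypothetical path served over HTTP
    ISSUER_CERT = "/var/ca/ca1-issuer.pem"   # hypothetical path

    def publish_if_valid() -> bool:
        with open(NEW_CRL, "rb") as f:
            crl = x509.load_der_x509_crl(f.read())
        with open(ISSUER_CERT, "rb") as f:
            issuer = x509.load_pem_x509_certificate(f.read())

        fresh = crl.next_update > datetime.datetime.utcnow()
        signed_ok = crl.is_signature_valid(issuer.public_key())

        if fresh and signed_ok:
            # Atomic on a single filesystem: clients see either the old file
            # or the new one, never a partial or invalid CRL.
            os.replace(NEW_CRL, SERVED_CRL)
            return True
        # Keep serving the previous (still valid) CRL and raise an alert instead.
        return False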






Re: Incident report Certum CA: Corrupted certificates

2018-12-04 Thread Kurt Roeckx via dev-security-policy

On 2018-12-04 10:25, Wojciech Trapczyński wrote:

On 04.12.2018 10:01, Kurt Roeckx via dev-security-policy wrote:

On 2018-12-04 7:24, Wojciech Trapczyński wrote:

Question 1: Was there a period during which this issuing CA had no
   validly signed non-expired CRL due to this incident?



Between 10.11.2018 01:05 (UTC±00:00) and 14.11.2018 07:35 (UTC±00:00) 
we were serving one CRL with a corrupted signature.


Do you have any plans to prevent serving CRLs with an invalid 
signature and keep the old CRL in place until you have a valid one?


This one CRL with a corrupted signature was being served between the dates I 
mentioned. Since November 14th 07:35 (UTC±00:00) we have been serving a 
CRL with a valid signature. I have described it in the Bugzilla bug 
(https://bugzilla.mozilla.org/show_bug.cgi?id=1511459#c2).


I think you misunderstood my question. I think you should never serve an 
invalid file. I think it's better to have a file that is 1 or 2 days old 
than it is to have an invalid file. So you could check that it's a valid 
file before you start serving it, and if it's invalid keep the old file.



Kurt


Re: Incident report Certum CA: Corrupted certificates

2018-12-04 Thread Wojciech Trapczyński via dev-security-policy

On 04.12.2018 10:01, Kurt Roeckx via dev-security-policy wrote:

On 2018-12-04 7:24, Wojciech Trapczyński wrote:

Question 1: Was there a period during which this issuing CA had no
   validly signed non-expired CRL due to this incident?



Between 10.11.2018 01:05 (UTC±00:00) and 14.11.2018 07:35 (UTC±00:00) 
we were serving one CRL with a corrupted signature.


Do you have any plans to prevent serving CRLs with an invalid signature 
and keep the old CRL in place until you have a valid one?


This one CRL with a corrupted signature was being served between the dates I 
mentioned. Since November 14th 07:35 (UTC±00:00) we have been serving a 
CRL with a valid signature. I have described it in the Bugzilla bug 
(https://bugzilla.mozilla.org/show_bug.cgi?id=1511459#c2).






Re: Incident report Certum CA: Corrupted certificates

2018-12-04 Thread Kurt Roeckx via dev-security-policy

On 2018-12-04 7:24, Wojciech Trapczyński wrote:

Question 1: Was there a period during which this issuing CA had no
   validly signed non-expired CRL due to this incident?



Between 10.11.2018 01:05 (UTC±00:00) and 14.11.2018 07:35 (UTC±00:00) we 
were serving one CRL with a corrupted signature.


Do you have any plans to prevent serving CRLs with an invalid signature 
and keep the old CRL in place until you have a valid one?



Kurt


Re: Incident report Certum CA: Corrupted certificates

2018-12-03 Thread Wojciech Trapczyński via dev-security-policy

Thank you. The answers to your questions below.

On 04.12.2018 00:47, Jakob Bohm via dev-security-policy wrote:

On 03/12/2018 12:06, Wojciech Trapczyński wrote:

Please find our incident report below.

This post links to https://bugzilla.mozilla.org/show_bug.cgi?id=1511459.

---

1. How your CA first became aware of the problem (e.g. via a problem
report submitted to your Problem Reporting Mechanism, a discussion in
mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit),
and the time and date.

10.11.2018 10:10 UTC + 0 – We received a notification from our internal
monitoring system concerning issues with publishing CRLs.

2. A timeline of the actions your CA took in response. A timeline is a
date-and-time-stamped sequence of all relevant events. This may include
events before the incident was reported, such as when a particular
requirement became applicable, or a document changed, or a bug was
introduced, or an audit was done.

(All times in UTC±00:00)

10.11.2018 10:10 – We received a notification from our internal
monitoring system for issuing certificates and CRLs concerning issues
with publishing CRLs. We started verification.
10.11.2018 12:00 – We established that one of about 50 CRLs had a
corrupted digital signature value. We noticed that this CRL was much
larger than the others. We verified that in a short period of time over
30,000 certificates had been added to this CRL.
10.11.2018 15:30 – We confirmed that the signing module had trouble
signing CRLs greater than 1 MB. We started working on it.
10.11.2018 18:00 – We disabled the automatic publication of this CRL. We
verified that the other CRLs had correct signatures.
11.11.2018 07:30 – As part of the post-failure verification procedure,
we started an inspection of the whole system, including all certificates
issued at that time.
11.11.2018 10:00 – We verified that some of the issued certificates
had corrupted digital signatures.
11.11.2018 10:40 – We established that one of several signing modules
working in parallel was producing corrupted signatures. We turned
it off.
11.11.2018 18:00 – We confirmed that the reason for the corrupted
signatures of certificates was a large CRL which prevented further
correct operation of that signing module.
11.11.2018 19:30 – We left only one working signing module, which prevented
further mis-issuance.
19.11.2018 11:00 – We deployed to production an additional digital
signature verification in an external module, outside of the signing module.
19.11.2018 21:00 – We deployed to production a new version of the
signing module which correctly handles large CRLs.



Question 1: Was there a period during which this issuing CA had no
   validly signed non-expired CRL due to this incident?



Between 10.11.2018 01:05 (UTC±00:00) and 14.11.2018 07:35 (UTC±00:00) we 
were serving one CRL with a corrupted signature.



Question 2: How long were ordinary revocations (via CRL) delayed by
   this incident?



There was no delay in ordinary revocations. All CRLs were generated and 
published in accordance with the CABF BR.



Question 3: Was Certum's OCSP handling for any issuing or root CA affected
   by this incident (for example, were any OCSP responses incorrectly
   signed?, were OCSP servers not responding?  were OCSP servers returning
   outdated revocation data until the large-CRL signing was operational on
   2018-11-19 21:00 UTC ?)



No, OCSP was not impacted. We were serving correct OCSP responses all 
the time.



3. Whether your CA has stopped, or has not yet stopped, issuing
certificates with the problem. A statement that you have will be
considered a pledge to the community; a statement that you have not
requires an explanation.

11.11.2018 17:47

4. A summary of the problematic certificates. For each problem: number
of certs, and the date the first and last certs with that problem were
issued.

355.

The first one: 10.11.2018 01:26:10
The last one: 11.11.2018 17:47:36

All certificates were revoked.

5. The complete certificate data for the problematic certificates. The
recommended way to provide this is to ensure each certificate is logged
to CT and then list the fingerprints or crt.sh IDs, either in the report
or as an attached spreadsheet, with one list per distinct problem.

Full list of certificates in attachment.

6. Explanation about how and why the mistakes were made or bugs
introduced, and how they avoided detection until now.

The main reason for the corrupted operation of the signing module was
the lack of proper handling of large CRLs, greater than 1 MB. When the
signing module received such a large list for signing, it was not able
to sign it correctly. In addition, the signing module started to
incorrectly sign the remaining objects received for signing later, i.e.
after receiving the large CRL for signature.

Because we were using several signing modules simultaneously at the time
the problem occurred, the problem did not affect all certificates issued
at that time. Our analysis shows that the problem affected about 10% of
all certificates issued at that time.
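
Purely as an illustration of the kind of post-signing verification gate 
described above (the RSA/PKCS#1 v1.5 assumption and the signer-pool API are 
hypothetical, not details of Certum's signing modules):

    # Sketch: verify every certificate immediately after signing and take a
    # signing module out of the pool as soon as it produces a bad signature.
    from cryptography import x509
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric import padding

    def signature_ok(cert: x509.Certificate, issuer: x509.Certificate) -> bool:
        """Assumes an RSA issuer key with PKCS#1 v1.5 padding."""
        try:
            issuer.public_key().verify(
                cert.signature,
                cert.tbs_certificate_bytes,
                padding.PKCS1v15(),
                cert.signature_hash_algorithm,
            )
            return True
        except InvalidSignature:
            return False

    def release(cert, issuer, signer_pool, signer_id):
        # A corrupted signature blocks publication and disables the module
        # that produced it, instead of letting it keep signing (which is
        # what allowed this incident to affect later certificates).
        if not signature_ok(cert, issuer):
            signer_pool.disable(signer_id)   # hypothetical pool API
            raise RuntimeError("corrupted signature; signing module disabled")
        return cert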

Re: Incident report Certum CA: Corrupted certificates

2018-12-03 Thread Jakob Bohm via dev-security-policy
On 03/12/2018 12:06, Wojciech Trapczyński wrote:
> Please find our incident report below.
> 
> This post links to https://bugzilla.mozilla.org/show_bug.cgi?id=1511459.
> 
> ---
> 
> 1. How your CA first became aware of the problem (e.g. via a problem 
> report submitted to your Problem Reporting Mechanism, a discussion in 
> mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), 
> and the time and date.
> 
> 10.11.2018 10:10 UTC + 0 – We received a notification from our internal 
> monitoring system concerning issues with publishing CRLs.
> 
> 2. A timeline of the actions your CA took in response. A timeline is a 
> date-and-time-stamped sequence of all relevant events. This may include 
> events before the incident was reported, such as when a particular 
> requirement became applicable, or a document changed, or a bug was 
> introduced, or an audit was done.
> 
> (All times in UTC±00:00)
> 
> 10.11.2018 10:10 – We received a notification from our internal 
> monitoring system for issuing certificates and CRLs concerning issues 
> with publishing CRLs. We started verification.
> 10.11.2018 12:00 – We established that one of about 50 CRLs had a 
> corrupted digital signature value. We noticed that this CRL was much 
> larger than the others. We verified that in a short period of time over 
> 30,000 certificates had been added to this CRL.
> 10.11.2018 15:30 – We confirmed that the signing module had trouble 
> signing CRLs greater than 1 MB. We started working on it.
> 10.11.2018 18:00 – We disabled the automatic publication of this CRL. We 
> verified that the other CRLs had correct signatures.
> 11.11.2018 07:30 – As part of the post-failure verification procedure, 
> we started an inspection of the whole system, including all certificates 
> issued at that time.
> 11.11.2018 10:00 – We verified that some of the issued certificates 
> had corrupted digital signatures.
> 11.11.2018 10:40 – We established that one of several signing modules 
> working in parallel was producing corrupted signatures. We turned 
> it off.
> 11.11.2018 18:00 – We confirmed that the reason for the corrupted 
> signatures of certificates was a large CRL which prevented further 
> correct operation of that signing module.
> 11.11.2018 19:30 – We left only one working signing module, which prevented 
> further mis-issuance.
> 19.11.2018 11:00 – We deployed to production an additional digital 
> signature verification in an external module, outside of the signing module.
> 19.11.2018 21:00 – We deployed to production a new version of the 
> signing module which correctly handles large CRLs.
> 

Question 1: Was there a period during which this issuing CA had no 
  validly signed non-expired CRL due to this incident?

Question 2: How long were ordinary revocations (via CRL) delayed by 
  this incident?

Question 3: Was Certum's OCSP handling for any issuing or root CA affected 
  by this incident (for example, were any OCSP responses incorrectly 
  signed?, were OCSP servers not responding?  were OCSP servers returning 
  outdated revocation data until the large-CRL signing was operational on 
  2018-11-19 21:00 UTC ?)

> 3. Whether your CA has stopped, or has not yet stopped, issuing 
> certificates with the problem. A statement that you have will be 
> considered a pledge to the community; a statement that you have not 
> requires an explanation.
> 
> 11.11.2018 17:47
> 
> 4. A summary of the problematic certificates. For each problem: number 
> of certs, and the date the first and last certs with that problem were 
> issued.
> 
> 355.
> 
> The first one: 10.11.2018 01:26:10
> The last one: 11.11.2018 17:47:36
> 
> All certificates were revoked.
> 
> 5. The complete certificate data for the problematic certificates. The 
> recommended way to provide this is to ensure each certificate is logged 
> to CT and then list the fingerprints or crt.sh IDs, either in the report 
> or as an attached spreadsheet, with one list per distinct problem.
> 
> Full list of certificates in attachment.
> 
> 6. Explanation about how and why the mistakes were made or bugs 
> introduced, and how they avoided detection until now.
> 
> The main reason for the corrupted operation of the signing module was 
> the lack of proper handling of large CRLs, greater than 1 MB. When the 
> signing module received such a large list for signing, it was not able 
> to sign it correctly. In addition, the signing module started to 
> incorrectly sign the remaining objects received for signing later, i.e. 
> after receiving the large CRL for signature.
> 
> Because we were using several signing modules simultaneously at the time 
> the problem occurred, the problem did not affect all certificates issued 
> at that time. Our analysis shows that the problem affected about 10% of 
> all certificates issued at that time.
> 
> We have been using this signing module for the last few years, and at the 
> time of its implementation the tests did not