Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-26 Thread Ryan Sleevi via dev-security-policy
On Mon, Nov 26, 2018 at 12:12 PM Jakob Bohm via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:

> 1. Having a spare certificate ready (if done with proper security, e.g.
>    a separate key) from a different CA may unfortunately conflict with
>    badly thought out parts of various certificate "pinning" standards.
>

You blame the standards, but that seems an operational risk that the site
(knowingly) took. That doesn't make a compelling argument.


> 2. Being critical from a society perspective (e.g. being the contact
>    point for a service to help protect the planet), doesn't mean that the
>    people running such a service can be expected to be IT superstars
>    capable of dealing with complex IT issues such as unscheduled
>    certificate replacement due to no fault of their own.
>

That sounds like an operational risk the site (knowingly) took. Solutions
for automation exist, as do concepts such as "hiring multiple people"
(having a NOC/SOC). I see nothing to argue that a single person is somehow
the risk here.


> 3. Not every site can be expected to have the 24/7 staff on hand to do
>    "top security credentials required" changes, for example a
>    high-security end site may have a rule that two senior officials need
>    to sign off on any change in cryptographic keys and certificates, while
>    a limited-staff end-site may have to schedule a visit from their
>    outside security consultant to perform the certificate replacement.
>

This is exactly describing a known risk that the site took, accepting the
tradeoffs. I fail to see a compelling argument that there should be no
tradeoffs - given the harm presented to the ecosystem - and if sites want
to make such policies, rather than promoting automation and CI/CD, then it
seems that's a risk they should bear and make an informed choice.

> Thus I would be all for an official BR ballot to clarify/introduce
> that 24 hour revocation for non-compliance doesn't apply to
> non-dangerous technical violations.
>

As discussed elsewhere, there is no such thing as "non-dangerous technical
violations". It is a construct, much like "clean coal", that has an
appealing turn of phrase, but without the evidence to support it.


> Another category that would justify a longer CA response time would be a
> situation where a large batch of certificates need to be revalidated due
> to a weakness in validation procedures (such as finding out that a
> validation method had a vulnerability, but not knowing which if any of
> the validated identities were actually fake).  For example to recheck a
> typical domain-control method, a CA would have to ask each certificate
> holder to respond to a fresh challenge (lots of manual work by end
> sites), then do the actual check (automated).


Like the other examples, this is not at all compelling. Solutions exist to
mitigate this risk entirely. CAs and their Subscribers that choose not to
avail themselves of these methods - for whatever the reason - are making an
informed market choice about these. If they're not informed, that's on the
CAs. If they are making the choice, that's on the Subscribers.

There's zero reason to change, especially when such revalidation can be,
and is, being done automatically.
___
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy


Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-26 Thread Ryan Sleevi via dev-security-policy
On Mon, Nov 26, 2018 at 10:31 AM Nick Lamb via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:

> CA/B is the right place for CAs to make the case for a general rule about
> giving themselves more time to handle technical non-compliances whose
> correct resolution will annoy customers but impose little or no risk to
> relying parties,
>

CAs have made the case - it was not accepted.

On a more fundamental and philosophical level, I think this is
well-intentioned but misguided. Let's consider that the issue is one that
the CA had the full power-and-ability to prevent - namely, they violated
the requirements and misissued. A CA is only in this situation if they are
a bad CA - a good CA will never run the risk of "annoying" the customer.

This also presumes that "annoyance" of the subscriber is a bad thing - but
this is also wrong. If we accept that CAs are differentiated based on
security, then a CA that regularly misissues and annoys its customers is a
CA that will lose customers. This is, arguably, better than the
alternative, which is to remove trust in a CA entirely, which will annoy
all of its customers.

This presumes that the customer cannot take steps to avoid this. However,
as suggested by others, the customer could have minimized or eliminated
annoyance, such as by ensuring they have a robust system to automate the
issuance/replacement of certificates. That they didn't is an operational
failure on their part.

This presumes that there is "little or no risk to relying parties."
Unfortunately, they are by design not a stakeholder in those conversations
- the stakeholders are the CA and the Subscriber, both of which are
incentivized to do nothing (it avoids annoying the customer for the CA, it
avoids having to change for the customer). This creates the tragedy of the
commons that we absolutely saw result from browsers not regularly enforcing
compliance on CAs - areas of technical non-compliance that prevented
developing interoperable solutions from the spec, which required all sorts
of hacks, which then subsequently introduced security issues. This is not a
'broken windows' argument so much as a statement of the demonstrable
reality we lived in prior to Amazon's development and publication of
linting tools that simplified compliance and enforcement, and the
subsequent improvements by ZLint.

Conceptually, this is similar to an ISP that regularly cuts its own
backbone cables or publishes bad routes. By ensuring that the system
consistently functions as designed - and that the CA follows their own
stated practices and procedures and revokes everything that doesn't - the
disruption is entirely self-inflicted and avoidable, and the market can be
left to correct for that.


> I personally at least would much rather see CAs actually formally agree
> they should all have say 28 days in such cases - even though that's surely
> far longer than it should be - than a series of increasingly implausible
> "important" but ultimately purely self-serving undocumented exceptions that
> make the rules on paper worthless.
>

I disagree that encouraging regulatory capture (and the CA/Browser Forum
doesn't work by formal agreement of CAs, nor does it alter root program
expectations) is the solution here.

I agree that the increasingly implausible "important" revocations are
entirely worthless. I think a real and meaningful solution is what is
being more consistently pursued, and that's to distrust CAs that are not
adhering to the set of expectations. There's no reason to believe the
"impact" argument, particularly when it's one that both the Subscriber and
the CA can and should have avoided, and CAs that continue to make that
argument are increasingly showing that they're not working in the best
interests of Relying Parties (see above) or Subscribers (by "annoying" them
or lying to them), and that's worthy of distrust.


Late Certinomis Audit (Was: Audit Reminder Email Summary)

2018-11-26 Thread Wayne Thayer via dev-security-policy
Update: I heard back from Certinomis quickly. They provided the following
attestation statement from LSTI dated 23-November on the same day. The
audit was conducted back in July, so we still need an explanation from
Certinomis of why it took LSTI so long to provide the report.

https://bugzilla.mozilla.org/attachment.cgi?id=9027230

Unfortunately, the audit period listed in the report begins a week after
the prior audit period ended. Certinomis says that this is a reporting
mistake, so I have asked them to provide an updated attestation statement
from LSTI.

- Wayne

On Tue, Nov 20, 2018 at 5:00 PM Wayne Thayer wrote:

> Thanks for pointing this out Kurt. The Certinomis / Docapost audit report
> is now almost one month late. Also, last week the Certinomis representative
> informed root programs that he was leaving his post and two others would be
> taking his place. I have just emailed the two new representatives and asked
> them to explain when we will see the audit report. I'm also concerned about
> their numerous compliance bugs.
>
> - Wayne
>
> On Tue, Nov 20, 2018 at 3:15 PM Kurt Roeckx via dev-security-policy <
> dev-security-policy@lists.mozilla.org> wrote:
>
>> On Tue, Oct 23, 2018 at 02:35:37PM -0700, Kathleen Wilson via
>> dev-security-policy wrote:
>> > > > Mozilla: Audit Reminder
>> > > > Root Certificates:
>> > > > Certinomis - Root CA
>> > > > Standard Audit:
>> > > > https://bug937589.bmoattachments.org/attachment.cgi?id=8898169
>> > > > Audit Statement Date: 2017-07-24
>> > > > BR Audit:
>> > > > https://bug937589.bmoattachments.org/attachment.cgi?id=8898169
>> > > > BR Audit Statement Date: 2017-07-24
>> > > > CA Comments: null
>> > >
>> > > This seems to be in French, and does not seem to even indicate
>> > > when the audit was done, just that the report itself is valid for
>> > > 2 years.
>> >
>> > Our official requirement for the audit statements to be in English is
>> > new in version 2.6 of our policy (effective date July 1, 2018). Also,
>> > last July we were still having difficulty getting the ETSI auditors on
>> > board with specifying audit periods in their audit statements.
>>
>> So it seems nothing changed related to this in the last month,
>> they are clearly late in providing a new audit statement.
>>
>>
>> Kurt
>>
>>


Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-26 Thread Jakob Bohm via dev-security-policy
On 23/11/2018 16:24, Enrico Entschew wrote:
> This post links to https://bugzilla.mozilla.org/show_bug.cgi?id=1509512
> 
> syntax error in one tls certificate
> 
> 1. How your CA first became aware of the problem (e.g. via a problem report 
> submitted to your Problem Reporting Mechanism, a discussion in 
> mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the 
> time and date.
> 
> We became aware of the issue via https://crt.sh/ on 2018-11-12, 09:01 UTC.
> 
> 2. A timeline of the actions your CA took in response. A timeline is a 
> date-and-time-stamped sequence of all relevant events. This may include 
> events before the incident was reported, such as when a particular 
> requirement became applicable, or a document changed, or a bug was 
> introduced, or an audit was done.
> 
> Timeline:
> 2018-11-12, 09:01 UTC CA became aware via https://crt.sh/ of a syntax error 
> in one tls certificate issued on 2018-06-02.  The PrintableString of OBJECT 
> IDENTIFIER serialNumber (2 5 4 5) contains an invalid character. For more 
> details see https://crt.sh/?id=514472818
> 2018-11-12, 09:30 UTC CA Security Issues task force analyzed the error and 
> recommended further procedure.
> 2018-11-12, 10:30 UTC Customer was contacted the first time. Customer runs an 
> international critical trade platform for emissions. Immediate revocation of 
> the certificate would cause irreparable harm to the public.
> 2018-11-12, 13:00 UTC We performed additional dedicated coaching on
> this specific syntax topic within the validation team to avoid this kind of
> error in the future.
> 2018-11-16, 08:40 UTC Customer responded first time and asked for more time 
> to evaluate the certificate replacement process.
> 2018-11-19, 12:30 UTC CA informed the auditor TÜV-IT about the issue.
> 2018-11-20, 15:19 UTC Customer committed to replacing the certificate by
> 2018-11-22 at the latest.
> 2018-11-22, 15:52 UTC New certificate has been applied for and has been 
> issued.
> 2018-11-22, 16:08 UTC The certificate with the serial number 3c 7c fb bf ea 
> 35 a8 96 c6 79 c6 5c 82 ec 40 13 was revoked by customer.
> 
> 3. Whether your CA has stopped, or has not yet stopped, issuing certificates 
> with the problem. A statement that you have will be considered a pledge to 
> the community; a statement that you have not requires an explanation.
> 
> The CA has not stopped issuing EV-certificates. We applied dedicated coaching 
> on this specific syntax topic within the validation team to avoid this kind 
> of error until software adjustments to both affected systems have been
> completed.
> 
> 4. A summary of the problematic certificates. For each problem: number of 
> certs, and the date the first and last certs with that problem were issued.
> 
> 1 Certificate
> SHA-256 41F3AD0CBDA392F078D776FD1CDC0E35F7AF61030C56C7B26B95936F41A83B32
> Issued on 2018-06-01
> 
> 5. The complete certificate data for the problematic certificates. The 
> recommended way to provide this is to ensure each certificate is logged to CT 
> and then list the fingerprints or crt.sh IDs, either in the report or as an 
> attached spreadsheet, with one list per distinct problem.
> 
> For more details see https://crt.sh/?id=514472818
> 
> 6. Explanation about how and why the mistakes were made or bugs introduced, 
> and how they avoided detection until now.
> 
> This problem arose within the customer-facing frontend system and the
> lint system. Neither system checked the entry in the serialNumber field
> (2 5 4 5) correctly. It was possible to enter characters other than those
> defined for PrintableString.
> 
> 7. List of steps your CA is taking to resolve the situation and ensure such 
> issuance will not be repeated in the future, accompanied with a timeline of 
> when your CA expects to accomplish these things.
> 
> The CA Security Issues task force together with the software development 
> analyzed the error. We applied dedicated coaching on this specific syntax 
> topic within the validation team to avoid this kind of error until software 
> adjustments to both affected systems have been completed.  The changes in the
> systems are expected to go live in early January 2019.
> 

In addition to this, would you add the following:

- Daily checks of crt.sh (or some other existing tool) to see whether
 additional such certificates are erroneously issued before
 the automated countermeasures are in place?

- Procedurally (and eventually technically) restrict the serial number
 element to actual validated identification numbers from a fixed set of
 databases for each jurisdiction.  For example for a Bundesamt, this
 should be a special prefix followed by some kind of official
 identifying number of entities within the Bundesverwaltung.  Similarly,
 of course, for Landesämter, companies etc.
  Also, it is unclear why a Bundesamt belongs to an identification
 jurisdiction lower than the entire BRD.
  For comparison, Danish Company entities 
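
To illustrate the check that item 6 of the report says both systems were
missing, here is a minimal Python sketch of a PrintableString character
test. The alphabet is the one ASN.1 (ITU-T X.680) defines for
PrintableString. The function names and the sample values are hypothetical;
they are not taken from D-TRUST's systems or from the certificate in
question.

```python
import string

# PrintableString alphabet per ITU-T X.680:
# A-Z, a-z, 0-9, space and the punctuation ' ( ) + , - . / : = ?
PRINTABLE_STRING_CHARS = frozenset(
    string.ascii_letters + string.digits + " '()+,-./:=?"
)


def invalid_printable_chars(value):
    """Return the characters of value that PrintableString does not allow,
    each listed once, in order of first appearance."""
    found = []
    for ch in value:
        if ch not in PRINTABLE_STRING_CHARS and ch not in found:
            found.append(ch)
    return found


def is_printable_string(value):
    """True if every character of value is in the PrintableString alphabet."""
    return not invalid_printable_chars(value)


if __name__ == "__main__":
    # Hypothetical serialNumber values, for illustration only.
    for candidate in ["HRB 12345", "DE_123_45", "A&B@1"]:
        print(repr(candidate), invalid_printable_chars(candidate))
```

A check of this kind could run both in the frontend, before a request is
accepted, and in the pre-issuance lint step, so that a failure in one layer
is still caught by the other.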

Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-26 Thread Jakob Bohm via dev-security-policy

On 26/11/2018 16:31, Nick Lamb wrote:
> In common with others who've responded to this report I am very
> skeptical about the contrast between the supposed importance of this
> customer's systems versus their, frankly, lackadaisical technical response.
>
> This might all seem harmless but it ends up as "the boy who cried wolf".
> If you relay laughable claims from customers several times, when it
> comes to an incident where maybe some extraordinary delay was
> justifiable any good will is already used up by the prior claims.
>
> CA/B is the right place for CAs to make the case for a general rule
> about giving themselves more time to handle technical non-compliances
> whose correct resolution will annoy customers but impose little or no
> risk to relying parties, I personally at least would much rather see CAs
> actually formally agree they should all have say 28 days in such cases -
> even though that's surely far longer than it should be - than a series
> of increasingly implausible "important" but ultimately purely
> self-serving undocumented exceptions that make the rules on paper worthless.


It should be noted that the counter-measures that some posts have
expected of the end-site in question may not always be realistic
(speaking generally, as I have no data on the specifics of this
end-site):

1. Having a spare certificate ready (if done with proper security, e.g.
  a separate key) from a different CA may unfortunately conflict with
  badly thought out parts of various certificate "pinning" standards.

2. Being critical from a society perspective (e.g. being the contact
  point for a service to help protect the planet), doesn't mean that the
  people running such a service can be expected to be IT superstars
  capable of dealing with complex IT issues such as unscheduled
  certificate replacement due to no fault of their own.

3. Not every site can be expected to have the 24/7 staff on hand to do
  "top security credentials required" changes, for example a high-
  security end site may have a rule that two senior officials need to
  sign off on any change in cryptographic keys and certificates, while a
  limited-staff end-site may have to schedule a visit from their outside
  security consultant to perform the certificate replacement.

Thus I would be all for an official BR ballot to clarify/introduce
that 24 hour revocation for non-compliance doesn't apply to non-
dangerous technical violations.

Another category that would justify a longer CA response time would be a
situation where a large batch of certificates need to be revalidated due
to a weakness in validation procedures (such as finding out that a
validation method had a vulnerability, but not knowing which if any of
the validated identities were actually fake).  For example to recheck a
typical domain-control method, a CA would have to ask each certificate
holder to respond to a fresh challenge (lots of manual work by end
sites), then do the actual check (automated).



Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded


Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-26 Thread Nick Lamb via dev-security-policy
In common with others who've responded to this report I am very skeptical
about the contrast between the supposed importance of this customer's
systems versus their, frankly, lackadaisical technical response.

This might all seem harmless but it ends up as "the boy who cried wolf".
If you relay laughable claims from customers several times, when it comes
to an incident where maybe some extraordinary delay was justifiable any
good will is already used up by the prior claims.

CA/B is the right place for CAs to make the case for a general rule about
giving themselves more time to handle technical non-compliances whose
correct resolution will annoy customers but impose little or no risk to
relying parties, I personally at least would much rather see CAs actually
formally agree they should all have say 28 days in such cases - even though
that's surely far longer than it should be - than a series of increasingly
implausible "important" but ultimately purely self-serving undocumented
exceptions that make the rules on paper worthless.


Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-26 Thread Gijs Kruitbosch via dev-security-policy

(for the avoidance of doubt: posting in a personal capacity)

On 23/11/2018 15:24, Enrico Entschew wrote:

> Timeline:
> 2018-11-12, 10:30 UTC Customer was contacted the first time. Customer runs an
> international critical trade platform for emissions. Immediate revocation of
> the certificate would cause irreparable harm to the public.
>
> 2018-11-22, 16:08 UTC The certificate with the serial number 3c 7c fb bf ea 35
> a8 96 c6 79 c6 5c 82 ec 40 13 was revoked by customer.


Some questions I have:

1) Don't the BRs specify that CAs MUST revoke within 24 hours (for some
issues) or 5 days (for others)? This looks like just over 10 days, and 
was customer-prompted as opposed to set by the CA, it seems. Am I just 
missing the part of the BRs that says ignoring the 5 days is OK if it's 
"just" a syntax error?


2) what procedure does D-TRUST follow to ensure adequate revocation 
times, and in particular, under what circumstances does it decide that 
not revoking until the customer gives an OK is necessary (e.g. how does 
it decide what constitutes an "international[ly] critical" site)? Is 
this documented, e.g. in CPS or similar? Have auditors signed off on that?


3) can you elaborate on the system being down causing "irreparable 
harm"? What would have happened if the cert had just been revoked after 
24/120 hours? In this case, the website in question ( www.dehst.de ) has 
been broken in Firefox for the past 64 or so hours (ie since about 6pm 
UK time on Friday, when I first read your message) because the server 
doesn't actually send the full chain of certs for its new certificate. 
Given that the server (AFAICT) doesn't staple OCSP responses, I don't 
imagine that practical breakage in a web browser would have been worse 
if the original cert had been revoked immediately, given the CRL 
revocation done last week hasn't appeared in CRLSet/OneCRL either.
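
The arithmetic behind question 1 can be checked from the two timestamps in
the quoted timeline. This is only a sketch: it assumes the clock starts when
the CA became aware of the problem (2018-11-12, 09:01 UTC), and the variable
names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Timestamps taken from the incident report timeline (UTC).
became_aware = datetime(2018, 11, 12, 9, 1, tzinfo=timezone.utc)
revoked = datetime(2018, 11, 22, 16, 8, tzinfo=timezone.utc)

# Assumption: both revocation windows are counted from the moment of awareness.
deadline_24h = became_aware + timedelta(hours=24)
deadline_5d = became_aware + timedelta(days=5)

# Total time from awareness to revocation, and the overrun past each window.
elapsed = revoked - became_aware
overrun_24h = revoked - deadline_24h
overrun_5d = revoked - deadline_5d

print("elapsed:", elapsed)
print("past the 24 hour window by:", overrun_24h)
print("past the 5 day window by:", overrun_5d)
```

Under that assumption the elapsed time is 10 days, 7 hours and 7 minutes,
which matches the "just over 10 days" above.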


~ Gijs
