On 11/09/17 15:30, Rob Stradling via dev-security-policy wrote:
Hi Hanno. Thanks for reporting this to us. We acknowledge the problem,
and as I mentioned at [1], we took steps to address it this morning.
We will follow up with an incident report ASAP.
INCIDENT REPORT
We received two Problem Reports - from Hanno Böck on 9th September at
20:10 UTC, and from Jonathan Rudenberg on 10th September at 00:08 UTC -
each of which reported that we had misissued a certificate contrary to a
published CAA RRset.
Jonathan reported this problem at
https://bugzilla.mozilla.org/show_bug.cgi?id=1398545, and in
https://bugzilla.mozilla.org/show_bug.cgi?id=1398545#c2 Quirin Scheitle
provided a further misissuance report.
TRIAGING
Some Comodo staff saw these reports late on Saturday 9th and began to
discuss them over the weekend, but they were unable to confirm their
accuracy. Indeed, the reports appeared to them to be erroneous, because
the logs at their disposal showed that the relevant CAA checks had been
performed but the RRsets were empty. Therefore, the only action taken
at that point was to escalate the reports to the original developer of
our CAA checking code to look at first thing Monday morning.
BACKGROUND
As you'd expect from the authors of RFC6844, we were an early adopter,
deploying our initial CAA checking implementation 2.5 years ago. It
executes `dig CAA +dnssec +sigchase +trusted-key=dnssec_trusted.keys` to
perform the DNS queries. We chose this approach after concluding that,
at that time, it was the least worst option available to us for checking
DNSSEC signatures. We deployed a specific version of BIND (9.10.1-P2)
because testing had shown that `dig` in the next release of BIND would
crash when trying to do DNSSEC validation.
WHAT WENT WRONG
Our ops team upgraded the servers that our CAA checking code was running
on. This included a very-long-awaited transition from a 32-bit to
64-bit OS. Rather than recompile 9.10.1-P2 for 64-bit, our ops
engineers upgraded BIND to 9.10.5-P1.
Yesterday morning (Monday 11th), when investigating the Problem Reports,
the original developer discovered that as a result of that BIND upgrade
all of our calls to `dig` were returning the following response:
`Invalid option: +sigchase
Usage: dig [@global-server] [domain] [q-type] [q-class] {q-opt}
{global-d-opt} host [@local-server] {local-d-opt}
[ host [@local-server] {local-d-opt} [...]]
Use "dig -h" (or "dig -h | more") for complete list of options`
Unfortunately, this `dig` response was being interpreted by our CAA
checking code as a CAA response that contained: no "issue" property, no
"issuewild" property, no unrecognized critical properties, etc.
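To illustrate the failure mode, here is a minimal sketch of a fail-closed parser for `dig` output. It is hypothetical and is not the production code: the names `parse_dig_caa`, `CAARecord` and `CAALookupError` are invented for this example, and it assumes plain `dig` presentation-format output (a `;; ->>HEADER<<-` line with a status, and CAA answers printed as `flags tag "value"`). The point is that output lacking a DNS response header, such as a usage message, is treated as a lookup failure and never as an empty RRset.

```python
import re
from dataclasses import dataclass
from typing import List


class CAALookupError(Exception):
    """Raised when the output cannot be trusted as a DNS answer."""


@dataclass(frozen=True)
class CAARecord:
    flags: int
    tag: str
    value: str


# dig's response header, e.g.
# ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4321"
_STATUS_RE = re.compile(r"^;; ->>HEADER<<- .*status: (\w+)", re.MULTILINE)

# A CAA answer line, e.g.
# 'example.com. 300 IN CAA 0 issue "ca.example.net"'
_CAA_RE = re.compile(
    r'^\S+\s+\d+\s+IN\s+CAA\s+(\d+)\s+(\S+)\s+"([^"]*)"', re.MULTILINE
)


def parse_dig_caa(returncode: int, stdout: str) -> List[CAARecord]:
    """Return the CAA records in stdout, or raise if no valid answer is present."""
    if returncode != 0:
        raise CAALookupError(f"dig exited with status {returncode}")

    match = _STATUS_RE.search(stdout)
    if match is None:
        # Covers usage/error text such as "Invalid option: +sigchase".
        raise CAALookupError("no DNS response header in dig output")

    status = match.group(1)
    # Assumption for this sketch: only these two statuses may be read as
    # "no CAA records here"; anything else (e.g. SERVFAIL) is a failure.
    if status not in ("NOERROR", "NXDOMAIN"):
        raise CAALookupError(f"DNS status {status}")

    return [
        CAARecord(int(flags), tag.lower(), value)
        for flags, tag, value in _CAA_RE.findall(stdout)
    ]
```

With a parser of this shape, an empty list can only result from a genuine DNS response that contained no CAA records; any other output blocks issuance until someone investigates.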
This problem had gone undetected due to a combination of reasons: the
developer did not ask for BIND to be upgraded and so did not expect any
behaviour to have changed; the ops engineers did not realize that
upgrading BIND might cause a problem; there wasn't an automated test
that would've detected this problem and raised an alarm; CAA RRsets are
still fairly uncommon, so nobody noticed that we'd dropped from finding
hardly any RRsets to finding zero RRsets; our validation staff only see
the results of our CAA processing rather than the complete output from
`dig`.
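The kind of automated test that was missing can be sketched as a canary check: periodically look up domains that are known to publish CAA records and raise an alarm if the checker stops finding them. This is an illustrative sketch only. The function `run_canaries`, the `lookup` callable and the canary domain names are all hypothetical, and `lookup` stands in for whatever CAA checking code is actually deployed.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical interface: domain name -> list of CAA "issue" values found.
Lookup = Callable[[str], List[str]]


def run_canaries(
    lookup: Lookup, canaries: Iterable[Tuple[str, List[str]]]
) -> List[str]:
    """Return a list of failure messages; an empty list means all canaries passed."""
    failures = []
    for domain, expected in canaries:
        try:
            found = set(lookup(domain))
        except Exception as exc:  # any lookup error counts as a canary failure
            failures.append(f"{domain}: lookup failed: {exc}")
            continue
        if found != set(expected):
            failures.append(
                f"{domain}: expected {sorted(expected)}, got {sorted(found)}"
            )
    return failures


if __name__ == "__main__":
    # Placeholder canaries; real ones would be domains under our own control.
    canaries = [("caa-canary.example", ["ca.example.net"])]

    def broken_lookup(domain: str) -> List[str]:
        # Simulates a checker that always reports "no CAA records".
        return []

    for message in run_canaries(broken_lookup, canaries):
        print("ALARM:", message)
```

A check like this would flag a checker that silently reports zero RRsets for every domain, regardless of how rare CAA records are in real traffic.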
ACTION TAKEN TO ADDRESS THE PROBLEM
Upon discovery of the failing `dig` calls, we immediately downgraded to
BIND 9.10.1-P2 and verified that our CAA checks were then working
correctly. We also purged our local cache (of recent `dig` responses)
to ensure that the misissuance vector was completely closed.
PROBLEM CERTIFICATES
The following certificates have all been revoked:
Reported by Hanno:
https://crt.sh/?id=207082245
Reported by Jonathan:
https://crt.sh/?id=207224651
Reported by Quirin:
https://crt.sh/?id=208456003
https://crt.sh/?id=208486480
https://crt.sh/?id=208486489
https://crt.sh/?id=208486485
https://crt.sh/?id=208486495
NEW CAA CHECKING IMPLEMENTATION
Our initial CAA checking implementation, while functional, was not
designed with our current certificate issuance volumes in mind.
Consequently, we had been working on a new, much more scalable CAA
checking implementation, written in Go. We had expected to deploy this
new implementation during Q2 2017, but work on this project was paused
due to the uncertainties of CNAME processing that have now been resolved
at IETF (see https://www.rfc-editor.org/errata/eid5065) and that will
hopefully soon also be resolved at CABForum (see
https://cabforum.org/pipermail/public/2017-August/011972.html).
DEPLOYING OUR NEW CAA CHECKING IMPLEMENTATION
Having fixed our `dig` calls, we found that our system was struggling to
process the queue of CAA checks quickly enough, and so we accelerated
our plans to deploy our new CAA checking implementation. This morning
(Tuesday 12th) we verified that our new implementation does a reasonable
job when faced with the test cases at https://caatestsuite.com/, and we
deployed it.
VERIFYING OUR NEW CAA CHECKING IMPLEMENTATION
We are taking immediate steps to engage the services of an external
security consultant to independently assess our new CAA checking
implementation and to work with us to ensure that it behaves correctly.
ACKNOWLEDGMENTS
We would like to express our thanks to Hanno, Jonathan and Quirin for
reporting the problem to us, and to Andrew Ayer for providing
https://caatestsuite.com/.
--
Rob Stradling
Senior Research & Development Scientist
COMODO - Creating Trust Online