On Mon, Mar 23, 2020 at 03:01:34PM +0000, Jeremy Rowley wrote:
> Ryan's post was the part I thought was relevant, but I understood it
> differently.  The cert was issued, but we should have now revoked it (24
> hours after receiving notice).  I do see your interpretation though, and
> the language does support 24 hours after issuing the new cert.

Aha, righto.  Glad we've gotten on the same page there.

> What I need is a tool that scans after revocation to ensure there are no
> additional certs with the same key.

I can give you a certwatch SQL query that'll do that, if you like -- "show
me all certs with this SPKI (or set of SPKIs) which aren't expired or
OCSP-revoked".  I use pretty much that query to get periodic reports of new
certificates that have appeared with keys already in the dungeon.  It's not
ideal for your purposes, though -- you might get some false positives
because your OCSP responders aren't up-to-date, and false negatives are
possible if certwatch is backlogged (or a cert wasn't logged).
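
As a rough illustration of the filter described above (same key, not
expired, not OCSP-revoked), here is a minimal Python sketch.  It is not the
certwatch SQL query itself: the CertRecord type, its field names, and the
live_certs_for_keys function are hypothetical stand-ins for whatever archive
is actually being scanned, and the key fingerprint is assumed to be SHA-256
over the DER-encoded SPKI.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CertRecord:
    """Hypothetical archive row; this is not the certwatch schema."""
    serial: str
    spki_der: bytes       # DER-encoded SubjectPublicKeyInfo
    not_after: datetime   # timezone-aware expiry time
    ocsp_revoked: bool    # last known OCSP status (may be stale)


def spki_fingerprint(spki_der: bytes) -> str:
    """SHA-256 over the DER SubjectPublicKeyInfo, hex-encoded."""
    return hashlib.sha256(spki_der).hexdigest()


def live_certs_for_keys(records, compromised_fingerprints, now=None):
    """Return records using a listed key that are neither expired nor revoked."""
    now = now or datetime.now(timezone.utc)
    wanted = set(compromised_fingerprints)
    return [
        r for r in records
        if spki_fingerprint(r.spki_der) in wanted
        and r.not_after > now
        and not r.ocsp_revoked
    ]
```

The same caveats apply to the sketch as to the query: a stale ocsp_revoked
value gives false positives, and a record missing from the archive gives
false negatives.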

Beyond that, though, if your internal certificate archive isn't indexed on
SPKI fingerprint, and updated in near-real-time, those are problems that I
think you'll have to fix, because the Internet's propensity to post their
private keys on the Internet, and then reuse them to get new certs after the
old one got revoked for key compromise, does not seem to be one that is
going away any time soon.
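
To make the "indexed on SPKI fingerprint" point concrete, here is a minimal
in-memory sketch in Python.  The KeyIndex class and its method names are
hypothetical, and a real archive would be a database rather than a dict.
The point it illustrates is that a single lookup keyed on the fingerprint
answers both "which certs already use this key?" and "should this new
request be refused?".

```python
import hashlib
from collections import defaultdict


class KeyIndex:
    """Hypothetical stand-in for a certificate archive indexed on SPKI fingerprint."""

    def __init__(self):
        self._by_key = defaultdict(list)  # fingerprint -> serials issued for that key
        self._compromised = set()         # fingerprints reported as compromised

    @staticmethod
    def fingerprint(spki_der: bytes) -> str:
        """SHA-256 over the DER SubjectPublicKeyInfo, hex-encoded."""
        return hashlib.sha256(spki_der).hexdigest()

    def mark_compromised(self, spki_der: bytes) -> list:
        """Record the key as compromised; return serials already issued for it."""
        fp = self.fingerprint(spki_der)
        self._compromised.add(fp)
        return list(self._by_key.get(fp, []))

    def try_issue(self, serial: str, spki_der: bytes) -> bool:
        """Refuse issuance for a known-compromised key, otherwise index the cert."""
        fp = self.fingerprint(spki_der)
        if fp in self._compromised:
            return False
        self._by_key[fp].append(serial)
        return True
```

Because the compromised set is separate from the issued-cert index, a key
can be marked compromised in this sketch even if no certificate has ever
been issued for it.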

> The frustration is that this was
> where the cert was issued after our scan of all keys but just before
> revocation.  As a side note, our system blacklists the keys when a cert is
> revoked for key compromise, which means I don't have a way to blacklist a
> key before a cert is ever issued.

To the software developers!  *blows trumpets*

> >> I don't think that supports your point, though, so I wonder if I've got
> >> the wrong part.  That last part of Ryan's: "shenanigans, such as CAs
> >> arguing the[y?] require per-cert evidence rather than systemic
> >> demonstrations", seems to me like it's describing your statement,
> >> above, that you (only?) "need to revoke a cert that is key compromised
> >> once we're the key is compromised *for that cert*" (emphasis added).  I
> >> don't read Ryan's use of "shenanigans" as approving of that sort of
> >> thing.
> 
> I don't think its shenanigans, but I do think it's a pretty common
> interpretation.  Such information would help determine the common
> interpretation of this section.  I agree that CAs should scan all certs
> for the same key and revoke where they find them.  Is that actually
> happening?

A lot to unpack here, let me make up some specific questions and answer them
as best I can.

"Have any other CAs failed to revoke certificates issued for keys for
which they had previously revoked a certificate for key compromise?"

Yes, one CA has failed to do so, and I've reported that to this list.

"Have any other CAs successfully revoked a certificate within the
BR-mandated (am I OK using that phrase now?) time period, that they issued
for a private key for which they had previously revoked a certificate due to
key compromise?"

I don't know, I haven't checked (yet).  It's on my (lengthy) list of Interesting
Questions For Which I Need To Write Insanely Complicated SQL Queries.

"Have any other CAs blacklisted a private key that was reported as
compromised and prevented issuance before they had issued a certificate for
that key?"

Naturally I can't answer that one directly, because I don't have internal
access to CA systems.  *But*, I have one test case, in which a private key
known to have "hopped" between CAs after revocation was reported to a number
of CAs proactively, but there hasn't been a resolution to that test case one
way or the other.  No new certs have been issued for the same name, with the
compromised key or any other, so it's impossible to infer what the outcome
may be.

Other very specific questions welcomed.  The answers will probably be "I
haven't looked yet", but there are a lot of questions I've got at least a
vague idea of how to answer, once I have time2SQL.

> Do other CAs object to there being a lack of specificity if you give the
> keys without a cert attached?

Since I have been sending (links to) CSR format compromise attestations, no
CA has communicated an objection to the format of my reports, nor have they
failed to act (in some fashion) on any of my reports, as far as I am aware.

> >> Bim bam bom, all done and dusted, and we can get back to washing our 
> >> hands. 
>
> That you're *not* doing that is perplexing, and a little disconcerting.

My wife is immuno-suppressed -- about the only time I'm not washing my
hands is when I'm typing at my keyboard (the suds get in the switches and
cause all sorts of problems).  If I could get a reliable supply of
sanitizer, I'd be swimming in the stuff.

> That's an oversimplification of the incident report process.  I'm not
> resisting the incident report itself since incident reports are a good and
> healthy way for the public to have transparency into a CAs operation.  In
> fact, I wouldn't mind filing one just to give public documentation on what
> DigiCert is doing for revocation.  I was objecting to actually calling
> this an incident/breach of the guidelines since I disagreed with that, and
> there's a trend where the interpretation of these sections seems to evolve
> over time.  My main emphasis was that the guidelines are ambiguous and
> need refinement.  I'd support refining them to be more clear, but calling
> something shenanigans when the language is unclear is unfair.

I can see your point there -- there's no denying that the BR language can be
misinterpreted.  Given a suitably motivated reader, of course, *anything*
can be creatively misinterpreted, but that use of "obtains evidence" is
undoubtedly tricky.

From what Ryan said in the previous thread, my impression is that attempts
were made to improve the language, but were blocked.  Unfortunately, in that
sense, you're beholden to the most reactionary majority of your fellow CAs
in improving the clarity of the language in the BRs.

I suppose this sort of enduring ambiguity is part of the reason why all
CAs are expected to follow this list -- you could think of it as the "judicial
branch", interpreting the "laws" and providing precedent.

> For the SPKI, you're right.  The slow part is downloading it from your
> website.  I guess I should have said "a link to the key compromise proof"
> rather than the SPKI.

Just for clarity, was it the HTTP requests themselves that were slow (which
is something I want to know about, and fix) or was it that your front-line
people aren't tooled up for ad-hoc automation?  (To be clear, I'm not
criticising you for having front-line people who don't know shell scripting,
just trying to clarify the problem)

> 3) Have the tool process the revocation at the 24 hour mark. 

You'll want to factor in the delays in getting the message to your OCSP
responders and CRLs.  I don't think that arguing that "publishing" doesn't
need to include actually making the revocation public would get very far.
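
The arithmetic is simple, but worth writing down.  A minimal Python sketch,
assuming the 24 hour window runs from receipt of the report and that the
time needed to reach the OCSP responders and CRLs is known; the function
name and the idea of a single fixed propagation delay are assumptions made
for illustration:

```python
from datetime import datetime, timedelta, timezone

REVOCATION_WINDOW = timedelta(hours=24)


def latest_start(received_at: datetime, propagation_delay: timedelta) -> datetime:
    """Latest moment to trigger revocation so it is public inside the window."""
    if propagation_delay < timedelta(0):
        raise ValueError("propagation delay cannot be negative")
    return received_at + REVOCATION_WINDOW - propagation_delay


# Hypothetical figures: report received at 15:00 UTC, two hours to propagate.
received = datetime(2020, 3, 23, 15, 0, tzinfo=timezone.utc)
trigger_by = latest_start(received, timedelta(hours=2))
```

A tool that fires exactly at the 24 hour mark corresponds to a propagation
delay of zero, which is the case the paragraph above warns about.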

> >>  Even if Digicert would just prefer that I did something differently in
> >> my problem reports in the future -- something that would make it
> >> legitimately easier for Digicert to process my reports, without making
> >> my life a misery (after all, y'all have a lot more resources than I do)
> >> -- all you have to do is ask.  My e-mail address is right there in
> >> every problem report I send.
> 
> Nah - although I'd prefer just to get the CSRs in a zip file,  I can work
> with what you send us.

Well, there shouldn't be any more mass key compromise notifications coming
through, at least not of the scale of that last round, so "CSRs in a zip"
shouldn't be necessary.

I'm intending to get to the point in the near future where the notifications
go out one by one, as soon as the key is discovered to be compromised.

> The reason I brought up the format is that I'm not sure when the 24 hour
> clock actually kicks off for revocation.  Is it 24 hours after we get your
> email or 24 hours after we can confirm key compromise?  I've always
> regarded these as a certificate problem report under 4.9.5 (requiring us
> to kick off the investigation) but then revocation should happen 24 hours
> after we have reason to suspect the key compromise is legit.

It's surprising that you say that, because to me, 4.9.5 isn't nearly as
ambiguous as the relevant point in 4.9.1.1.  4.9.5 says "The period from
receipt of the Certificate Problem Report or revocation-related notice to
published revocation".  While "receipt" could, I suppose, still be argued as
meaning "when we checked the mailbox", I don't think it's reasonable to
claim that "receipt" means "when we validated the CSR" -- especially since
it's "receipt of the Certificate Problem Report", not "receipt of concrete
evidence" or anything like that.

At any rate, for the specific issue we're discussing -- failure to revoke a
cert for a previously-reported compromised private key -- I don't think
4.9.5 really matters.  4.9.1.1 says the CA has to revoke within 24 hours of
"obtaining evidence of key compromise".  For a certificate issued after a
report of key compromise has been sent to the CA, the moment at which it can
be said that the CA has "obtained evidence of key compromise" *for that
specific certificate*, depending on your interpretation, could be:

* the time at which the problem report which initially reported the
  compromised key landed on the CA's mail server (which is the last moment
  that an external observer can know what's going on);

* the time at which the CA confirms that the evidence is valid;

* the time at which the new certificate using the compromised private key is
  issued; or

* when someone says "hey, you issued another cert for that key I mentioned
  before!"

(If you think there's another one, please add it.  Again, remember, this is
*only* for the case of a certificate issued after an initial key compromise
report is received by the CA.)
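
To show how far apart those readings can land, here is a small Python
sketch that computes the 24 hour deadline under each of the four candidate
moments listed above.  The function name, the dictionary keys, and any
timestamps passed in are hypothetical; none of them come from a real
incident.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=24)


def candidate_deadlines(report_received, evidence_validated,
                        new_cert_issued, reissue_reported):
    """Map each interpretation of "obtained evidence" to its revocation deadline."""
    moments = {
        "report_received": report_received,
        "evidence_validated": evidence_validated,
        "new_cert_issued": new_cert_issued,
        "reissue_reported": reissue_reported,
    }
    return {name: moment + WINDOW for name, moment in moments.items()}


def spread(deadlines):
    """Gap between the earliest and latest deadline across interpretations."""
    return max(deadlines.values()) - min(deadlines.values())
```

Note that under the first two readings the deadline can fall before the new
certificate even exists, which is one way of seeing why a pre-issuance check
matters for this case.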

The last of those moments is the one it looks (to me) like you were relying
on, and the one that Ryan's statement seems to describe as "shenanigans".  It implies
to me that CAs have no memory of the past when it comes to key compromises,
which isn't a situation that is reasonable, to my mind.  The same "obtains
evidence" phrasing is used in the next point (regarding domain control
validation that cannot be relied upon), and the idea that a CA would argue
that they could legitimately issue a certificate for a name which they had
previously accepted had not been properly validated is, I hope you'd agree,
disconcerting in the extreme.

> To be clear, I'm not complaining about the format.  I'm wondering when we
> obtained the private key for the 24 hour purposes.  With automation, the
> time between when we get the email and when we confirm key compromise
> should be nearly zero.  However, with a more manual process, that time is
> not insignificant.  What I don't like about the interpretation that  the
> revocation event is 24 hours from when we get an email is some emails are
> very vague about key compromise.  With that reading, if we get an email
> without proof that is later followed up by proof, the 24 hour period could
> start when we get the initial email even if the proof is provided 25 hours
> later.

Well, if that were the case, then within 24 hours of the receipt of the
problem report you should have provided a preliminary report which says
"evidence is insufficient", and thus that problem report could reasonably be
considered to have been handled.  The "followup" e-mail that did contain
useful evidence would reasonably be considered a "new" problem report. 

Even when the useful evidence is provided after the initial e-mail, but
before you put out the preliminary compromise report, if the initial e-mail
didn't contain anything actionable I doubt that a root store is going to
class it as an incident if you don't revoke within 24 hours of the initial
e-mail -- as long as you got it done within 24 hours of when the actual,
useful evidence landed on your MTA.

Certainly, if someone came here and tried to drop a CA in it with the above
set of facts, I'd be in the front row of people saying "pull the other one,
it plays jingle bells".

> That does happen, which is why I think the time period should be
> 24 hours from after the CA receives proof of key compromise.  But even
> that is ambiguous.  When did we receive proof of key compromise?  I'd say
> it's when all the CSRs finished downloading.

You could have started processing revocations for the first CSRs downloaded
-- waiting for the last CSR to download before starting to process the first
is unlikely to be a valid argument for delayed revocation.
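
A minimal Python sketch of that "process as you go" approach, assuming the
report is a list of links and that fetching and handling are separate
steps.  The fetch and handle callables are hypothetical placeholders for
whatever download and revocation tooling actually exists:

```python
from typing import Callable, Iterable, Iterator, Tuple


def process_as_downloaded(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    handle: Callable[[str], object],
) -> Iterator[Tuple[str, object]]:
    """Handle each CSR as soon as it is fetched, not after the whole batch."""
    for url in urls:
        csr = fetch(url)         # download one compromise attestation
        yield url, handle(csr)   # act on it before fetching the next one
```

With this shape, a slow download of the last CSR delays only the last
revocation rather than all of them.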

It would almost certainly be worse for you, though, if it *was* a valid
argument, because the solution to that, from my perspective, would be to
send 800-odd separate certificate problem reports, each with one key.  I
can't imagine a situation in which that pile of e-mails would be *easier* to
process than one big list of compromised keys, but it would prevent a CA
from arguing that it took a long time to download all the CSRs.

Claiming that you didn't have enough staff to process that volume of key
compromise reports would be unlikely to be accepted, because if "lack of
staff" were seen as a valid excuse it would encourage CAs to understaff
their problem report processing, which would be Extremely Terribad.

> If that's not the case, then you are encouraging CAs to be myopic in the
> way they accept key compromise information.

Yes, it is possible that being strict on revocation deadlines might cause
some CAs to react by trying to reject valid-but-tricky certificate problem
reports.  There's all *sorts* of reasons why CAs might react in ways that
the root stores (and the surrounding community) would find problematic.
At the end of the day, the ultimate arbiters of the reasonableness of those
behaviours are the root stores.

As I have already done once before, if a CA rejected one of my problem
reports for what I felt were spurious reasons, I'd check with Mozilla and
the community, via this list, to see if my expectations were unreasonable. 
If the response was "your problem report was reasonable", then I'd lay out
the full details of the case, and the CA gets to explain their behaviour,
and root stores get to act as they see fit.  If, on the other hand, the
response is "stop being a goose, Matt" then I don't make a report, and I do
something else in the future.

- Matt

_______________________________________________
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy
