On Fri, Feb 26, 2021 at 1:46 PM Aaron Gable <aa...@letsencrypt.org> wrote:

> If we leave out the "new url for each re-issuance of a given CRL" portion
> of the design (or offer both url-per-thisUpdate and
> static-url-always-pointing-at-the-latest), then we could in fact include
> CRLDP urls in the certificates using the rolling time-based shards model.
> And frankly we may want to do that in the near future: maintaining both CRL
> *and* OCSP infrastructure when the BRs require only one or the other is an
> unnecessary expense, and turning down our OCSP infrastructure would
> constitute a significant savings, both in tangible bills and in engineering
> effort.
>

This isn’t quite correct. You MUST support OCSP for EE certs. It is only
optional for intermediates. So you can’t really contemplate turning down
the OCSP side, and that’s intentional, because clients use OCSP, rather
than CRLs, as the fallback mechanism for when the aggregated-CRLs fail.

I think it would be several years before we could practically talk about
removing the OCSP requirement, and only once much more reliable CRL profiles
are in place, which by necessity would also mean profiling the acceptable
sharding algorithms.

Further, under today’s model, while you COULD place the CRLDP within the
certificate, that seems like it would only introduce additional cost and
limitation without providing you any benefit. This is because major clients
won’t fetch the CRLDP for EE certs (especially when OCSP is present, which
the BRs require). You would still end up with some clients querying it (such
as Java, IIRC), so you’d be paying for bandwidth, especially in your mass
revocation scenario, that would be largely unnecessary compared to the
status quo.

> Thus, in my mind, the dynamic sharding idea you outlined has two major
> downsides:
> 1) It requires us to maintain our parallel OCSP infrastructure
> indefinitely, and
>

To the above, I think this should be treated as a foregone conclusion under
today’s requirements. So I think the discussion here mostly focuses on #2,
which is really useful.

> 2) It is much less resilient in the face of a mass revocation event.
>
> Fundamentally, we need our infrastructure to be able to handle the
> revocation of 200M certificates in 24 hours without any difference from how
> it handles the revocation of one certificate in the same period. Already
> having certificates pre-allocated into CRL shards means that we can
> deterministically sign many CRLs in parallel.
>

You can still do parallel signing. I was trying to account for that
explicitly with the notion of the “pre-reserved” set of URLs. However, that
also rests on an assumption I should have been more explicit about: whether
the expectation is “you declare, then fill, CRLs”, or whether it’s
acceptable to “fill, then declare, CRLs”. I was trying to cover the former,
but I don’t think there is any innate prohibition on the latter, and that was
what I was trying to call out in the previous mail.

I do take your point about determinism, because the process I’m
describing implicitly assumes you have a work queue (e.g. pub/sub, a Go
channel, etc.) into which certs to revoke go, and one or more CRL signers
that consume the queue and produce CRLs. The order of that consumption would
be non-deterministic, but it would very much be parallelizable, and you’d
have full control over the size of the work-unit chunks.
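
To make that concrete, here’s a minimal sketch of the sort of
queue-plus-workers arrangement I have in mind; the type names, batch size,
and signing stub are placeholders of my own, not a description of any CA’s
actual pipeline:

package main

import (
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"sync"
	"time"
)

// RevocationRequest is the unit of work placed on the queue when a
// certificate is revoked. Fields are illustrative only.
type RevocationRequest struct {
	Serial *big.Int
	At     time.Time
}

// signShard stands in for "collect a batch of revocations and produce one
// CRL shard"; a real implementation would sign with the issuing CA key and
// publish the result before declaring its URL.
func signShard(workerID int, batch []RevocationRequest) {
	entries := make([]pkix.RevokedCertificate, 0, len(batch))
	for _, r := range batch {
		entries = append(entries, pkix.RevokedCertificate{
			SerialNumber:   r.Serial,
			RevocationTime: r.At,
		})
	}
	fmt.Printf("worker %d: signed CRL with %d entries\n", workerID, len(entries))
}

func main() {
	const workers = 4     // degree of parallelism
	const batchSize = 100 // work-unit chunk size, fully under the CA's control

	queue := make(chan RevocationRequest, 1024)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			batch := make([]RevocationRequest, 0, batchSize)
			for r := range queue {
				batch = append(batch, r)
				if len(batch) == batchSize {
					signShard(id, batch)
					batch = batch[:0]
				}
			}
			if len(batch) > 0 {
				signShard(id, batch) // flush the final partial batch
			}
		}(w)
	}

	// Producer: in a mass-revocation event this loop would be fed from the
	// CA database rather than generated synthetically as it is here.
	for i := 0; i < 1000; i++ {
		queue <- RevocationRequest{Serial: big.NewInt(int64(i)), At: time.Now()}
	}
	close(queue)
	wg.Wait()
}

However the queue is fed, the work-unit size and the degree of parallelism
stay entirely under the CA’s control, which is the property that matters in a
mass revocation event.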

>
> Dynamically assigning certificates to CRLs as they are revoked requires
> taking a lock to determine if a new CRL needs to be created or not, and
> then atomically creating a new one. Or it requires a separate,
> not-operation-as-normal process to allocate a bunch of new CRLs, assign
> certs to them, and then sign those in parallel. Neither of these --
> dramatically changing not just the quantity but the *quality* of the
> database access, nor introducing additional processes -- is acceptable in
> the face of a mass revocation event.
>

Right, neither of these is required if you can “produce, then declare”.
From the client perspective, a consuming party cannot observe any
meaningful difference between “declare, then produce” and “produce,
then declare”, since in both cases they have to wait for the CRL to be
published on the server before they can consume it. The fact that they know
the URL while the content is stale or not yet updated (i.e. the
declare-then-produce scenario) doesn’t provide any advantage. If anything,
“produce, then declare” gives greater advantage to the client/root program,
because then they can say “all URLs must be correct at the time of
declaration” and use that to quantify whether or not the CA met their
timeline obligations during a mass revocation event.
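
As a rough illustration of “produce, then declare” (the JSON shape and
helper names here are my own assumptions, not anything CCADB or the BRs
specify), each shard is signed and published first, and its URL is only
then added to the declared list:

package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// crlDisclosure is a hypothetical JSON document listing every full CRL URL
// for a given issuing CA; the schema is assumed for illustration.
type crlDisclosure struct {
	IssuerDN string   `json:"issuerDN"`
	CRLs     []string `json:"crls"`
}

// produceShard stands in for "sign and publish one CRL shard"; it returns
// the URL at which the freshly signed CRL is now actually retrievable.
func produceShard(shardID int, thisUpdate time.Time) string {
	// ... sign the CRL, write it to the CDN/object store ...
	return fmt.Sprintf("https://crl.example.com/%d/%d.crl", shardID, thisUpdate.Unix())
}

func main() {
	disclosure := crlDisclosure{IssuerDN: "CN=Example Intermediate"}

	// "Produce, then declare": a URL is only added to the declared set once
	// the content behind it is live, so the declaration is correct at the
	// moment it is made.
	for shard := 0; shard < 3; shard++ {
		url := produceShard(shard, time.Now())
		disclosure.CRLs = append(disclosure.CRLs, url)
	}

	out, _ := json.MarshalIndent(disclosure, "", "  ")
	fmt.Println(string(out)) // this document is what then gets declared (e.g. to CCADB)
}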

> In any case, I think this conversation has served the majority of its
> purpose. This discussion has led to several ideas that would allow us to
> update our JSON document only when we create new shards (which will still
> likely be every 6 to 24 hours), as opposed to on every re-issuance of a
> shard. We'd still greatly prefer that CCADB be willing to
> accept-and-dereference a URL to a JSON document, as it would allow our
> systems to have fewer dependencies and fewer failure modes, but understand
> that our arguments may not be persuasive enough :)
>

<Google-Hat>We’re just one potential consumer, and not even the most urgent
of potential consumers (i.e. we would not be taking immediate advantage of
this the way others might). You’ve raised a lot of good points, and also
highlighted a good opportunity to better communicate some of the assumptions
in the design - e.g. that the ability for CAs to programmatically update such
contents is an essential property - which have been discussed and are on the
MVP implementation plan, but not communicated as such. We definitely want to
make sure ALL CCADB members are comfortable.</Google-Hat>

The tension here is between the tradeoffs/risks to Root Programs (which,
admittedly, are not always obvious or well communicated) and the potential
challenges for CAs. I personally am definitely not trying to make CAs do
all the work, but I’m sensitive to the fact that a system requiring 70 CAs
to each do 1 thing feels like it scales better than requiring N Root
Programs to each do 70 things :)

> If Mozilla et al. do go forward with this proposal as-is, I'd like to
> specifically request that CCADB surfaces an API to update this field before
> any root programs require that it be populated, and does so with sufficient
> lead time for development against the API to occur.
>

Agreed - I do think having a well-tested, reliable path for programmatic
updates is an essential precondition to mandating that the field be
populated. My hope and belief, however, is that this is fairly lightweight
and doable.
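
Purely as a strawman for what that lightweight path could look like once such
an API exists (the endpoint, field name, and auth scheme below are invented
for illustration; CCADB exposes nothing like this today):

package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// Hypothetical: CCADB does not currently expose this endpoint; the URL,
	// field name, and bearer-token auth are placeholders for illustration.
	body := []byte(`{"fullCRLs": ["https://crl.example.com/0.crl", "https://crl.example.com/1.crl"]}`)

	req, err := http.NewRequest(http.MethodPut,
		"https://ccadb.example.org/api/v1/intermediates/ABC123/crl-urls",
		bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer <token>")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("CCADB update status:", resp.Status)
}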

The primary benefits I see to this approach are that it moves us from poll
(where the onus is on individual CCADB members/consumers) to push (where the
responsibility is on the CA to notify), and that it gives an auditable
historical archive “for free”, without requiring yet another bespoke archival
tool to be created. Both of these enable greater CA accountability, which is
why I feel they’re important, but they definitely should be balanced
against the tradeoffs that CAs may have to bear.
