Sure, happy to provide more details! The fundamental issue here is the
scale at which Let's Encrypt issues certificates, and the automated nature
of clients' interactions with Let's Encrypt.

LE currently has 150M active certificates, all (as of March 1st) signed by
the same issuer certificate, R3. In the event of a mass revocation, that
means a CRL with 150M entries in it. At an average of 38 bytes per CRL
entry, that means nearly 6GB worth of CRL. Passing around a single 6GB file
isn't good for reliability (it's much better to fail-and-retry downloading
one of a hundred 60MB files than to fail-and-retry a single 6GB file), so
sharding seems like an operational necessity.
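
To make the arithmetic concrete, here's a back-of-the-envelope sketch in Go
(the 38-byte average and the hundred-shard split are just the illustrative
figures from above, not a committed design):

    package main

    import "fmt"

    func main() {
        // Figures from above: ~150M active certificates, and ~38 bytes
        // per revoked-certificate entry in a DER-encoded CRL.
        const activeCerts = 150_000_000
        const bytesPerEntry = 38

        // Worst case: a mass revocation puts every active cert on the CRL.
        total := float64(activeCerts * bytesPerEntry)
        fmt.Printf("single CRL: ~%.1f GB\n", total/1e9) // ~5.7 GB

        // Splitting into 100 shards keeps each download small and retryable.
        const shards = 100
        fmt.Printf("per shard:  ~%.0f MB\n", total/shards/1e6) // ~57 MB
    }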

Even without a LE-initiated mass revocation event, one of our large
integrators (such as a hosting provider with millions of domains) could
decide for any reason to revoke every single certificate we have issued to
them. We need to be resilient to these kinds of events.

Once we've decided that sharding is necessary, the next question is "static
or dynamic sharding?". It's easy to imagine a world in which we usually
have only one or two CRL shards, but dynamically scale that number up to
keep individual CRL sizes small if/when revocation rises sharply. There are
a lot of "interesting" (read: difficult) engineering problems here, and
we've decided not to go the dynamic route, but even if we did, it would
obviously require being able to change the list of URLs in the JSON array
on the fly.

For static sharding, we would need to constantly maintain a large set of
small CRLs, such that even in the worst case no individual CRL would become
too large. I see two main approaches: maintaining a fully static set of
shards into which our certificates are bucketed, or maintaining rolling
time-based shards (much like CT shards).

Maintaining a static set of shards has the primary advantage of "working
like CRLs usually work". A given CRL has a scope (e.g. "all certs issued by
R3 whose serial number is equal to 1 mod 500"), it has a nextUpdate, and a
new CRL with the same scope will be re-issued at the same path before that
nextUpdate is reached. However, it makes re-sharding difficult. If Let's
Encrypt's issuance rises enough that we want to have 1000 shards instead of
500, we'll have to re-shard every cert, re-issue every CRL, and update the
list of URLs in the JSON. And if we're updating the list, we should have
standards around how that list is updated and how its history is stored,
and then we'd prefer that those standards allow for rapid updates.
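
As a hypothetical sketch (not our actual implementation), that kind of
static bucketing might look like this in Go; note that changing n from 500
to 1000 changes the output for nearly every certificate, which is exactly
the resharding problem:

    package shards

    import (
        "crypto/x509"
        "math/big"
    )

    // staticShard buckets a certificate into one of n fixed shards by
    // taking its serial number modulo n, matching scopes like "all certs
    // issued by R3 whose serial number is equal to 1 mod 500".
    func staticShard(cert *x509.Certificate, n int64) int64 {
        return new(big.Int).Mod(cert.SerialNumber, big.NewInt(n)).Int64()
    }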

The alternative is to have rolling time-based shards. In this case, every X
hours we would create a new CRL, and every certificate we issue over the
next period would belong to that CRL. Similar to the above, these CRLs have
nice scopes: "all certs issued by R3 between AA:BB and XX:YY". When every
certificate in one of these time-based shards has expired, we can simply
stop re-issuing it. This has the advantage of solving the resharding
problem: if we want to make our CRLs smaller, we just increase the
frequency at which we initialize a new one, and 90 days later we've fully
switched over to the new size. It has the disadvantage from your
perspective of requiring us to add a new URL to the JSON array every period
(and we get to drop an old URL from the array every period as well).
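
Here's a minimal sketch of that time-based bucketing, assuming a
hypothetical 6-hour window (the real period and shard-naming scheme are
still open questions):

    package main

    import (
        "fmt"
        "time"
    )

    // timeShard maps an issuance time onto a rolling shard by truncating
    // it to a fixed window; every cert issued within the same window
    // belongs to the same CRL, and the window's start doubles as its name.
    func timeShard(issued time.Time, window time.Duration) string {
        return issued.UTC().Truncate(window).Format("2006-01-02T15")
    }

    func main() {
        issued := time.Date(2021, 3, 1, 14, 30, 0, 0, time.UTC)
        fmt.Println(timeShard(issued, 6*time.Hour)) // 2021-03-01T12
    }

Resharding then reduces to a config change: shrink the window, and 90 days
later (once the last cert in the old, larger shards has expired) the
rollover is complete.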

So why would we want to put each CRL re-issuance at a new path, and update
our JSON even more frequently? Because we have reason to believe that
various root programs will soon seek CRL re-issuance on the order of every
6 hours, not every 7 days as currently required; we will have many shards;
and overwriting files is a dangerous operation prone to many forms of
failure. Our current plan is to surface CRLs at paths like
`/crls/:issuerID/:shardID/:thisUpdate.der`, so that we never have to
overwrite a file. Similarly, our JSON document can always be written to a
new file, and the path in CCADB can point to a simple handler which always
serves the most recent file. Additionally, this means that anyone in
possession of one of our JSON documents can fetch all the CRLs listed in it
and get a *consistent* view of our revocation information as of that time.
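
For illustration only (the identifiers and timestamp encoding here are my
assumptions, not a final scheme), the write-once path construction could
look like:

    package main

    import (
        "fmt"
        "time"
    )

    // crlPath builds a path shaped like /crls/:issuerID/:shardID/:thisUpdate.der.
    // Each re-issuance carries a fresh thisUpdate and therefore lands at a
    // fresh path, so no file is ever overwritten in place.
    func crlPath(issuerID, shardID string, thisUpdate time.Time) string {
        return fmt.Sprintf("/crls/%s/%s/%s.der",
            issuerID, shardID, thisUpdate.UTC().Format("20060102150405"))
    }

    func main() {
        thisUpdate := time.Date(2021, 3, 1, 18, 0, 0, 0, time.UTC)
        fmt.Println(crlPath("R3", "2021-03-01T12", thisUpdate))
        // Output: /crls/R3/2021-03-01T12/20210301180000.der
    }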

I believe there is an argument to be made here that this plan increases
the auditability of the CRLs rather than decreasing it. Root
programs could require that any published JSON document be valid for a
certain period of time, and that all CRLs within that document remain
available for that period as well. Or even that historical versions of CRLs
remain available until every certificate they cover has expired (which is
what we intend to do anyway). Researchers can crawl our history of CRLs and
examine revocation events in more detail than was previously available.

Regardless, even without statically-pathed, timestamped CRLs, I believe
that the merits of rolling time-based shards alone make a strong argument
in favor of dynamic JSON documents.

I hope this helps and that I addressed your questions,
Aaron

On Thu, Feb 25, 2021 at 9:53 AM Ryan Sleevi <r...@sleevi.com> wrote:

> On Thu, Feb 25, 2021 at 12:33 PM Aaron Gable via dev-security-policy <
> dev-security-policy@lists.mozilla.org> wrote:
>
>> Obviously this plan may have changed due to other off-list conversations,
>> but I would like to express a strong preference for the original plan. At
>> the scale at which Let's Encrypt issues, it is likely that our JSON array
>> will contain on the order of 1000 CRL URLs, and will add a new one (and
>> age out an entirely-expired one) every 6 hours or so. I am not aware of any
>> existing automation which updates CCADB at that frequency.
>>
>> Further, from a resiliency perspective, we would prefer that the CRLs we
>> generate live at fully static paths. Rather than overwriting CRLs with new
>> versions when they are re-issued prior to their nextUpdate time, we would
>> leave the old (soon-to-be-expired) CRL in place, offer its replacement at
>> an adjacent path, and update the JSON to point at the replacement. This
>> process would have us updating the JSON array on the order of minutes, not
>> hours.
>
>
> This seems like a very inefficient design choice, and runs contrary to how
> CRLs are deployed by, well, literally anyone using CRLs as specified, since
> the URL is fixed within the issued certificate.
>
> Could you share more about the design of why? Both for the choice to use
> sharded CRLs (since that is the essence of the first concern), and the
> motivation to use fixed URLs.
>
>> We believe that the earlier "URL to a JSON array..." approach makes room for
>> significantly simpler automation on behalf of CAs without significant
>> loss of auditability. I believe it may be helpful for the CCADB field
>> description (or any upcoming portion of the MRSP which references it) to
>> include specific requirements around the cache lifetime of the JSON
>> document and the CRLs referenced within it.
>
>
> Indirectly, you’ve highlighted exactly why the approach you propose loses
> auditability. Using the URL-based approach puts the onus on the consumer to
> try and detect and record changes, introduces greater operational risks
> that evade detection (e.g. stale caches on the CAs side for the content of
> that URL), and encourages or enables designs that put greater burden on
> consumers.
>
> I don’t think this is suggested because of malice, but I do think it makes
> it significantly easier for malice to go undetected, for accurate historic
> information to be hidden or made too complex to maintain.
>
> This is already a known and recently studied problem with CRLs [1].
> Unquestionably, you are right for highlighting and emphasizing that this
> constrains and limits how CAs perform certain operations. You highlight it
> as a potential bug, but I’d personally been thinking about it as a
> potential feature. To figure out the disconnect, I’m hoping you could
> further expand on the “why” of the design factors for your proposed design.
>
> Additionally, it’d be useful to understand how you would suggest CCADB
> consumers maintain an accurate, CA attested log of changes. Understanding
> such changes is an essential part of root program maintenance, and it does
> seem reasonable to expect CAs to need to adjust to provide that, rather
> than give up on the goal.
>
> [1]
> https://arxiv.org/abs/2102.04288
>