Re: [Acme] Practical concerns of draft-ietf-acme-ari

Aaron Gable Fri, 21 Jul 2023 15:14:25 -0700

On Fri, Jul 21, 2023 at 2:00 PM Matthew Holt <m...@dyanim.com> wrote:


> I simply do not think there is a way to offer a wider renewal window than
> the full lifetime of the certificate by offering a narrower renewal window.
> I know that sounds silly, but since "backoff and retry" is the One Way to
> reliably get a certificate in case of a problem, more time means more
> chances for success.
>

I'm still very confused by this claim. Yes, it is true that anything that
causes a client to use less than the full lifetime of the certificate to
attempt renewal technically decreases the probability of success. But all
clients do that anyway: the vast majority don't even bother attempting
renewal until 2/3rds of the lifetime of the certificate has already passed.
Many clients only wake up once a day at midnight UTC, flooding the ACME
server along with every other similarly-configured client, drastically
reducing their chances at a successful renewal. ARI does not make that
situation worse.


> And I'll reiterate again: if we know a certificate is going to be revoked,
> we might as well stop trusting it.
>

Again, I don't disagree -- there's no meaningful difference between
"revoked" and "definitely about to be revoked" from a trust perspective.
But the point I'm trying to drive home is that *ARI information cannot be
used to infer trustworthiness*. The fact that a few people keep insisting
that it can is causing me to think that maybe we should drop the
`explanationURL` and the unauthenticated-GET version of the endpoint, to
remove any doubt.


> Ok, but *in practice* it is true.
>

No, it isn't. As I said, Let's Encrypt currently intends to engage in both
randomly jittering suggested windows, and moving them to smooth load
spikes. Note that "smooth load spikes" doesn't mean "oh no, if this spike
happens, issuance will fail", it simply means "this time of day often has
more load, let's see if we can shift it".


> Is the concern with appending '/ari' that it will result in an invalid url
> (e.g. ...?a=b/ari)? The URL could be parsed and reconstructed. (*caveats
> noted: URL parsing is hard, but most languages do it anyway).
>

The format of the `certificate` URL in finalized orders is completely
unspecified. It is inappropriate to attempt to force structure upon it,
either by concatenating anything with it, or parsing, modifying, and
re-serializing it.


> Or is this infeasible because we need to make the assumption that the cert
> is uniquely identified by the path alone and not a query string? I guess in
> my mind this seems like the more sensible blocker as I'm not sure what to
> do about that.
>
> That said, I too like the idea of using the cert URL. So instead of
> manipulating the certificate URL, what if the renewalInfo URL given in the
> directory is "the one" for ARI, and the cert URL was specified as a
> URL-encoded query param? (When I first started reading the ARI spec, that's
> more what I was expecting.) I propose the QS because even though we could
> put the cert URL as part of the path in an escaped form, some web servers
> may not parse this properly (e.g. %2F is troublesome in particular because
> URL normalization is often a necessary security measure and this can be
> indistinguishable from / after normalizing, unless done carefully; see
> https://github.com/caddyserver/caddy/pull/4948).
>

Using the cert URL as the unique identifier requires the client to persist
additional information (either the certificate URL, or the Order URL it can
be looked up from) beyond the PEM-encoded certificate chain itself. If
we're asking clients to persist additional information, we might as well
ask them to persist the ARI URL directly. Regardless, I don't think that
the certificate URL is a better unique identifier than
AuthorityKeyId+Serial or IssuerNameHash+Serial, as has been proposed
elsewhere in this thread.
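
For illustration only, an AuthorityKeyId+Serial identifier could be derived
from the certificate alone along the following lines. This is a minimal
sketch: the function names, the "." separator, and the unpadded base64url
encoding are assumptions made for the example, not something this message
specifies, and the two byte strings are assumed to have already been
extracted from the certificate.

```python
import base64


def b64url(data: bytes) -> str:
    """Encode bytes as base64url with the trailing '=' padding removed."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def cert_identifier(authority_key_id: bytes, serial: bytes) -> str:
    """Build a hypothetical unique identifier from two certificate fields.

    Both inputs are raw bytes taken from the certificate; the joined
    "<aki>.<serial>" layout is an assumption chosen for illustration.
    """
    return b64url(authority_key_id) + "." + b64url(serial)


if __name__ == "__main__":
    print(cert_identifier(b"\x01\x02", b"\x03"))
```

The point of the sketch is that nothing beyond the certificate itself has
to be persisted to compute the identifier.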


> Well, for one, this approach can greatly reduce complexity in clients,
> since it uses existing renewal flows. No need to go out-of-band to schedule
> renewals. It's basically just part of the existing timers/loops/schedulers.
> You simply adjust your timer/sleep/whatever, instead of needing to start
> all new ones and synchronize ARI routines with renewal routines.
>

I'm sorry, I'm still very confused by this proposal.

If it's "just part of the existing timers/loops/schedulers", then it sounds
like you imagine this newOrder request only being made when the client
would have made one anyway. In that case, you're missing out on all of the
benefit of ARI (early notification of renewal window changes), since you're
presumably not making any requests for the first 2/3rds of the lifetime of
the certificate.

If these newOrder requests *are* being made more frequently than today,
then how is ARI "out-of-band"? The client wakes up, makes an ARI request,
and decides whether or not to proceed with issuance. That's exactly the
same as your proposal, where the client wakes up, makes a newOrder request,
and decides whether it has been allowed to proceed with issuance.
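
The wake-up flow described above can be sketched as a small decision
function. This is a hedged illustration, not the draft's normative
algorithm: `should_renew_now`, its parameters, and the choice of a uniformly
random target time inside the suggested window are assumptions, and the
window is assumed to have already been fetched from the ARI endpoint.

```python
import random
from datetime import datetime, timedelta, timezone


def should_renew_now(window_start: datetime, window_end: datetime,
                     now: datetime, wake_interval: timedelta,
                     rng=random) -> bool:
    """Decide whether to start issuance during this wake-up.

    Picks a random target time inside the suggested window and renews if
    that time falls before the client's next scheduled wake-up.
    """
    span = (window_end - window_start).total_seconds()
    target = window_start + timedelta(seconds=rng.uniform(0, span))
    return target <= now + wake_interval


if __name__ == "__main__":
    now = datetime(2023, 7, 21, tzinfo=timezone.utc)
    # Window already in the past: renew immediately.
    print(should_renew_now(now - timedelta(days=2), now - timedelta(days=1),
                           now, timedelta(days=1)))
    # Window well in the future: go back to sleep.
    print(should_renew_now(now + timedelta(days=10), now + timedelta(days=12),
                           now, timedelta(days=1)))
```

A client that wakes once a day would call this with `wake_interval` set to
one day and simply skip issuance when it returns False.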

Yes, the current draft is technically harder to implement, because it
requires teaching the client how to make one new kind of API call. But the
core logic flow is no different from clients that check OCSP every time
they wake up, to see if they need to renew due to a revocation.

Regardless, I'm pretty unwilling to go down this path for two reasons:
1) It changes "newOrder" into "maybeNewOrder", effectively. The semantics
of the request change entirely, depending on the presence or absence of an
opaque field.
2) It hijacks the Retry-After header of the response, which the ACME server
may already be using for other purposes (e.g. ratelimiting). Since an ARI
"we suggest you try in two weeks" and a rate-limit "you cannot try again
for two hours" are very different, combining those semantics is a
non-starter.


>
> As for solving issues, although I did say "OCSP CertID," it actually could
> be whatever improved/easier identifier we end up using. The point is that
> it's a way to identify the certificate and distinguish this request as
> "ARI-enabled" so to speak. As for the number of requests: it's true this
> doesn't reduce them, but as you say, there's not really much hope in doing
> that anyway due to the 24 hour constraint. There are still advantages to
> this approach that make it appealing: simpler clients, simpler servers
> (less infrastructure), less complexity all around. (I'm glad you at least
> like the new field in the newOrder requests. I agree that it could be
> beneficial regardless.)
>
>> On the one hand, I'm in complete agreement, it would be great to have a
>> "batch" endpoint that returns suggested windows for all certificates
>> associated with a given account, or matching some other criteria. On the
>> other hand, there's a reason that Let's Encrypt diverges from RFC8555 and
>> does not implement the "orders" field on account objects: endpoints which
>> serve unboundedly-large documents and require paging are difficult to
>> implement correctly on both the server and client side, and can quickly
>> lead to disruptive database queries.
>>
>
> Ok, this is interesting. I totally understand the difficulty with paging.
> (I wonder if populating the "orders" field could be done optionally, like
> if the client specifically requests that it be populated somehow. Then the
> server load is still greatly reduced and provides useful info. But I
> digress.)
>
> I should clarify though that what I'm suggesting with (B) does *not*
> involve enumerating many results and paging through a DB. Maybe in the
> worst case where there is no simple way to describe the affected
> certificates and they just need to be listed by ID/URL. But I believe what
> we've seen in the past suggests that most certificates could be expressed
> in terms of a date range, account fingerprint, etc -- some simple notation
> that allows the *client* to compute whether it is relevant to them. This
> endpoint could even be a static resource.
>

I'm now even more confused about this. You're proposing that the ACME
server expose some sort of "certificates which meet these criteria need to
be renewed soon" endpoint? I have two concerns with this:
1) This feels like it's stepping even further into the trap of "if it's
listed by ARI, it's going to be revoked soon". I very strongly want to
ensure that that logic does not and cannot be used. So moving in this
direction feels like a step backwards, not forwards.
2) If 1% of Let's Encrypt issuance was affected by an incident at random
(e.g. because one instance of a Boulder service was misbehaving), then such
an endpoint would have to list 2.5 million certificate unique identifiers.
I think that specifying an endpoint with such unbounded growth would be
hugely detrimental to implementation and adoption.

I think there are reasonable arguments to be made for a batch endpoint that
looks like either
a) ARI info for all certs issued to a single subscriber (this would be an
authenticated POST-as-GET); or
b) ARI info for all certs identified in the request (e.g. allowing up to
100 certificate unique identifiers to be supplied in a single request).
I don't love either idea, but they're straightforward and would reduce
request volume.
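
As a rough illustration of option (b) from the client's side, a client
holding many certificates would split its identifiers into groups no larger
than the per-request cap. The helper name `batches` and the default of 100
are assumptions taken from the example figure above; no such endpoint is
defined in the draft.

```python
from typing import Iterator, List, Sequence


def batches(ids: Sequence[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` certificate identifiers."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for i in range(0, len(ids), size):
        yield list(ids[i:i + size])


if __name__ == "__main__":
    # Hypothetical identifiers; a real client would derive them from certs.
    ids = [f"cert-{n}" for n in range(250)]
    for group in batches(ids):
        print(len(group))
```

With 250 certificates this yields three requests instead of 250, which is
the reduction in request volume the option is after.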

Unfortunately, I will not be able to be present at the IETF 117 meeting due
to pre-planned vacation, but I look forward to reading the notes on the
discussion afterwards.

Aaron
