Neil, this is a belated reply to your earlier email about my implementation
efforts and whether I am on board with the Tree Walk. I held off responding
while I investigated the algorithm.

I began by building a cache of DMARC policies using the PSL and my mail
stream as input.

The first thing that jumped out was the number of DNS lookups that returned
"timeout" (or "Server failed"). After several re-runs, I have narrowed the
recurring problems down to 87 domains (79 from the PSL and 8 from my list). I
foresee a real performance bottleneck if timeouts occur in high volume because
many messages are concentrated on a specific DNS server.
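One way to separate recurring failures from transient ones is to keep only the domains that fail in every re-run. This is a minimal sketch, assuming each run is recorded as a hypothetical mapping from domain to a status string, with "timeout" and "SERVFAIL" standing in for the two failure results mentioned above:

```python
from typing import Dict, Iterable, Set

# Hypothetical status labels for the two failure results ("Server failed" = SERVFAIL).
FAILURE_STATUSES = {"timeout", "SERVFAIL"}

def recurring_failures(runs: Iterable[Dict[str, str]]) -> Set[str]:
    """Return the domains that failed in every run (transient failures drop out)."""
    failing_per_run = [
        {domain for domain, status in run.items() if status in FAILURE_STATUSES}
        for run in runs
    ]
    if not failing_per_run:
        return set()
    return set.intersection(*failing_per_run)

runs = [
    {"a.example": "timeout", "b.example": "SERVFAIL", "c.example": "NOERROR"},
    {"a.example": "timeout", "b.example": "NOERROR", "c.example": "SERVFAIL"},
]
print(sorted(recurring_failures(runs)))  # ['a.example']
```

Only a.example fails in both runs, so it is the only one that lands on the recurring-problem list.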

To avoid timeouts and obtain other benefits, I concluded that a
database-backed cache is necessary. I expect to use cache lifetimes that are
much longer than DNS TTLs, probably at least a week. This avoids the
performance risk associated with short DNS TTLs. It also provides a framework
for configuring overrides, such as correcting a policy that specifies strict
alignment when the domain's actual messages require relaxed alignment.
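The override idea can be sketched as a cached record plus a locally configured set of tag replacements. This is a minimal sketch under stated assumptions: the OVERRIDES table, the one-week lifetime, and the class and function names are hypothetical; only the adkim/aspf tags (with "r" for relaxed and "s" for strict) come from DMARC itself, and the tag parser is deliberately simplistic.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional

WEEK = 7 * 24 * 3600  # assumed cache lifetime, far longer than typical DNS TTLs

# Hypothetical local overrides: force relaxed alignment for a domain whose
# published policy says strict but whose real mail needs relaxed.
OVERRIDES: Dict[str, Dict[str, str]] = {"example.org": {"adkim": "r", "aspf": "r"}}

def parse_dmarc(record: str) -> Dict[str, str]:
    """Split 'v=DMARC1; p=reject; adkim=s' into a tag dictionary."""
    tags: Dict[str, str] = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags

@dataclass
class CachedPolicy:
    domain: str
    record: Optional[str]  # None means "looked up, no policy published"
    fetched_at: float = field(default_factory=time.time)
    lifetime: float = WEEK

    def expired(self, now: float) -> bool:
        return now >= self.fetched_at + self.lifetime

    def effective_tags(self, overrides: Dict[str, Dict[str, str]]) -> Dict[str, str]:
        """Published tags with any local override applied on top."""
        if self.record is None:
            return {}  # no published policy, so nothing to override
        tags = parse_dmarc(self.record)
        tags.update(overrides.get(self.domain, {}))
        return tags
```

Keeping the override separate from the cached record means the published policy is still visible when the override is later reviewed.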

The resulting code requires a dance:
(1) Check for a result using the cache.
(2) Return a success result if all path elements are in the database and
those entries have not expired; otherwise, return a cache failure.
(3) On cache failure, walk the tree to reload the cache.
(4) Requery the cache to get a final result.
(5) Repeat for each DKIM domain, as well as the MAIL FROM domain and the
message From domain.
(6) Exit when a domain check produces a final result.
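The six steps can be sketched roughly as follows. This is a simplified illustration, not a conforming DMARCbis implementation: an in-memory dict stands in for the database, query_txt is a hypothetical resolver hook that returns the _dmarc TXT record or None, the walk queries every ancestor (the draft's limit on the number of queries is not modeled), and "final result" is assumed to mean that some path element publishes a policy. The order in which candidate domains are tried is also an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

WEEK = 7 * 24 * 3600  # assumed cache lifetime

@dataclass
class CacheEntry:
    record: Optional[str]  # DMARC TXT record, or None if the name publishes none
    expires_at: float

def path_elements(domain: str) -> List[str]:
    """'a.b.example' -> ['a.b.example', 'b.example', 'example']"""
    labels = domain.lower().rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]

class TreeWalkCache:
    def __init__(self, query_txt: Callable[[str], Optional[str]],
                 lifetime: float = WEEK, clock: Callable[[], float] = time.time):
        self.query_txt = query_txt                 # hypothetical DNS hook
        self.lifetime = lifetime
        self.clock = clock
        self.entries: Dict[str, CacheEntry] = {}   # stands in for the database
        self.dns_queries = 0

    def check(self, domain: str) -> Tuple[bool, Optional[Tuple[str, str]]]:
        """Steps 1-2: succeed only if every path element is cached and unexpired."""
        now = self.clock()
        path = path_elements(domain)
        for name in path:
            entry = self.entries.get(name)
            if entry is None or entry.expires_at <= now:
                return False, None                 # cache failure
        for name in path:                          # nearest policy wins
            record = self.entries[name].record
            if record is not None:
                return True, (name, record)
        return True, None                          # cached "no policy anywhere"

    def reload(self, domain: str) -> None:
        """Step 3: walk the tree, caching every answer, including 'no record'."""
        expires_at = self.clock() + self.lifetime
        for name in path_elements(domain):
            self.dns_queries += 1
            self.entries[name] = CacheEntry(self.query_txt("_dmarc." + name), expires_at)

    def lookup(self, domain: str) -> Optional[Tuple[str, str]]:
        hit, result = self.check(domain)
        if not hit:
            self.reload(domain)
            hit, result = self.check(domain)       # Step 4: requery the cache
        return result

def evaluate(cache: TreeWalkCache, dkim_domains: List[str],
             mail_from: str, header_from: str):
    """Steps 5-6: try each candidate domain, stopping at the first policy found."""
    for domain in [*dkim_domains, mail_from, header_from]:
        result = cache.lookup(domain)
        if result is not None:
            return domain, result
    return None
```

Note that reload re-queries the whole path even when only one entry has expired.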

Some domains, especially TLDs, could be given a long cache life to avoid
repeated queries for unchanging data, but then the cache protocol needs to
report which portion of the domain path must be re-queried, which adds
complexity. Overall, this algorithm feels like an order of magnitude more
difficult to write and to use than the PSL lookup.
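The extra complexity of per-domain lifetimes can be seen in a sketch where the cache reports which portion of the path is stale and only that portion is re-queried. The lifetimes below (90 days for TLDs, 7 days otherwise) and the function names are illustrative assumptions; query is a hypothetical resolver hook.

```python
from typing import Callable, Dict, List

DAY = 24 * 3600
TLD_LIFETIME = 90 * DAY     # assumed: TLD answers rarely change
DEFAULT_LIFETIME = 7 * DAY  # assumed default cache life

def path_elements(domain: str) -> List[str]:
    labels = domain.lower().rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]

def lifetime_for(name: str) -> float:
    return TLD_LIFETIME if "." not in name else DEFAULT_LIFETIME

def stale_elements(domain: str, expires_at: Dict[str, float], now: float) -> List[str]:
    """Feedback from the cache: the path elements that are missing or expired."""
    return [n for n in path_elements(domain) if expires_at.get(n, float("-inf")) <= now]

def refresh(domain: str, expires_at: Dict[str, float], now: float,
            query: Callable[[str], object]) -> List[str]:
    """Re-query only the stale part of the path; return the names refreshed."""
    stale = stale_elements(domain, expires_at, now)
    for name in stale:
        query("_dmarc." + name)  # a real cache would also store the answer
        expires_at[name] = now + lifetime_for(name)
    return stale
```

After eight days the TLD entry is still fresh, so only the two lower path elements are re-queried; the caller now has to merge fresh and older answers, which is where the added complexity comes from.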

When the tree walk cache works, the result will be a database of domains,
with indicators of whether each domain has a policy, and the policy contents
when one exists. Scaling becomes the next concern. Because we are tracking
descendant paths below PSL entries, rather than the PSL entries themselves,
the database will be multiple orders of magnitude larger than the PSL table.
The cache size can be contained by shortening cache life and running garbage
collection more often, but shorter cache life reduces filtering performance,
and more frequent garbage collection increases general overhead.

In short, I conclude that the Tree Walk has a very large cost for the
comparatively small benefit of avoiding specific PSL errors.   But fixing
PSL errors is still desirable, so another approach is needed.

What makes more sense is to collect domain lists from the email logs,
possibly supplemented from other sources, and then perform Tree Walks in a
background process. The background process would identify domains where
the Tree Walk produces a different result than the PSL, so that the
difference can be investigated and the local copy of the PSL corrected.
Perhaps some web service will even do the work and publish results to its
clients or to the whole web.
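The background comparison could look roughly like this. It is a sketch under heavy simplifications, not either algorithm as specified: psl_org_domain takes the longest matching suffix plus one label and ignores the PSL's wildcard and exception rules, tree_walk_org_domain treats the shortest path element that publishes a record as the organizational domain (ignoring the psd tag and the draft's query limits), and the suffix set and record table are made-up inputs.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

def path_elements(domain: str) -> List[str]:
    labels = domain.lower().rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]

def psl_org_domain(domain: str, suffixes: Set[str]) -> Optional[str]:
    """Longest listed public suffix plus one label (simplified PSL lookup)."""
    labels = domain.lower().rstrip(".").split(".")
    for i in range(len(labels)):  # i = 0 tests the longest candidate first
        if ".".join(labels[i:]) in suffixes:
            return ".".join(labels[i - 1:]) if i > 0 else None
    # PSL default rule: an unlisted TLD counts as a public suffix.
    return ".".join(labels[-2:]) if len(labels) >= 2 else None

def tree_walk_org_domain(domain: str, records: Dict[str, str]) -> Optional[str]:
    """Shortest path element with a _dmarc record (simplified stand-in)."""
    org = None
    for name in path_elements(domain):  # longest to shortest
        if "_dmarc." + name in records:
            org = name
    return org

def find_disagreements(domains: Iterable[str], suffixes: Set[str],
                       records: Dict[str, str]) -> List[Tuple[str, Optional[str], str]]:
    """Batch job: list (domain, PSL answer, tree-walk answer) where they differ."""
    report = []
    for domain in sorted(set(domains)):
        psl = psl_org_domain(domain, suffixes)
        walk = tree_walk_org_domain(domain, records)
        if walk is not None and walk != psl:  # no record found: nothing to compare
            report.append((domain, psl, walk))
    return report
```

With a local suffix list of just {"com"} and records published at example.com and customer.example-hosting.com, the batch run flags shop.customer.example-hosting.com, because the local PSL copy lacks the hosting provider's private suffix and assigns the customer to example-hosting.com.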

As a batch algorithm, the Tree Walk has potential. As a real-time
algorithm, it seems like a poor fit.

Doug Foster


On Fri, Oct 13, 2023 at 3:29 PM Neil Anuskiewicz <n...@marmot-tech.com>
wrote:

> If I read that right it gives you what you think is a desirable outcome.
> That is, this might be a strong sign that you’re at least considering
> supporting DMARCbis!
>
> Yes, we all need to be prepared for headaches no matter which direction
> this all goes.
>
>
_______________________________________________
dmarc mailing list
dmarc@ietf.org
https://www.ietf.org/mailman/listinfo/dmarc
