Neil, this is a delayed reply to your earlier email about my implementation efforts and whether I am on board with the Tree Walk. The response was delayed while I investigated the algorithm.

I began by building a cache of DMARC policies using the PSL and my mail stream as input. The first thing that jumped out was DNS queries returning "timeout" (or "Server failed"). After several re-runs, I have the list of recurring problems down to 87 domains (79 from the PSL and 8 from my list). I envision a real performance bottleneck if timeouts occur in high volume because many messages depend on the same DNS server.

To avoid timeouts and obtain other benefits, I concluded that a database-based cache is necessary. I envision using database cache lifetimes that are much longer than DNS TTLs, probably at least a week. This avoids the performance risk associated with short TTLs in DNS. It also provides a framework for configuring overrides, such as correcting a policy that specifies strict alignment when the actual messages from the domain require relaxed alignment.

The resulting code requires a dance:

(1) Check for a result using the cache.
(2) Return a success result if all path elements are in the database and those entries have not expired. Otherwise, return a cache failure.
(3) On cache failure, walk the tree to reload the cache.
(4) Requery the cache to get a final result.
(5) Repeat for each DKIM domain, as well as the mail from domain and message from domain.
(6) Exit when a domain check produces a final result.

Some domains, especially TLDs, could be given a long cache life to avoid repeated queries for unchanging data, but then the cache protocol needs to report which portion of the domain path must be re-queried, which adds complexity. Overall, this algorithm feels like an order of magnitude more difficult to write and to use than the PSL lookup.

When the tree walk cache works, the result will be a database of domains, with indicators of whether each domain has a policy or not, and the policy contents when one exists. Scaling becomes the next concern.
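
The dance in steps (1) through (6) can be sketched roughly as follows. This is a minimal illustration in Python, not working filter code. Everything named here is hypothetical: the `PolicyCache` class, the `lookup` callable that stands in for a real DNS query of `_dmarc.<name>`, the in-memory dict that stands in for the database, and the one-week lifetime. It also assumes that "final result" means the first policy found on the path, and it ignores the query limits in the DMARCbis Tree Walk as well as timeout handling.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple

WEEK = 7 * 24 * 3600  # assumed cache lifetime ("at least a week")


@dataclass
class Entry:
    policy: Optional[str]  # DMARC record text, or None when no policy exists
    expires: float         # absolute expiry time, in seconds


class CacheMiss(Exception):
    """Raised when a path element is absent from the cache or has expired."""


def path_elements(domain: str) -> List[str]:
    """Return the domain followed by each of its parents, down to the TLD."""
    labels = domain.lower().rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


class PolicyCache:
    def __init__(self, lookup: Callable[[str], Optional[str]],
                 ttl: float = WEEK,
                 clock: Callable[[], float] = time.time) -> None:
        self.lookup = lookup          # hypothetical stand-in for a DNS query
        self.ttl = ttl
        self.clock = clock
        self.db: Dict[str, Entry] = {}  # stand-in for the database table

    def query(self, domain: str) -> Optional[Tuple[str, str]]:
        """Steps 1-2: succeed only if every path element is cached and fresh."""
        now = self.clock()
        found = None
        for name in path_elements(domain):
            entry = self.db.get(name)
            if entry is None or entry.expires <= now:
                raise CacheMiss(name)
            if found is None and entry.policy is not None:
                found = (name, entry.policy)
        return found

    def reload(self, domain: str) -> None:
        """Step 3: walk the tree and refresh every path element."""
        expires = self.clock() + self.ttl
        for name in path_elements(domain):
            self.db[name] = Entry(self.lookup(name), expires)

    def resolve(self, domain: str) -> Optional[Tuple[str, str]]:
        """Steps 1-4: query, reload on failure, then requery."""
        try:
            return self.query(domain)
        except CacheMiss:
            self.reload(domain)
            return self.query(domain)


def first_policy(cache: PolicyCache,
                 domains: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Steps 5-6: try each candidate domain, stop at the first result."""
    for domain in domains:
        result = cache.resolve(domain)
        if result is not None:
            return result
    return None
```

A caller would pass the DKIM domains, the mail from domain, and the message from domain to `first_policy`. Note that `reload` refreshes the whole path; giving TLDs a longer life would require `CacheMiss` to drive a partial reload, which is the added complexity mentioned above.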

Because we are tracking descendant paths from PSL entries, rather than PSL entries themselves, the database size will be multiple orders of magnitude larger than the PSL table. The cache size can be contained by reducing cache life and increasing garbage collection, but a smaller cache reduces filtering performance and more garbage collection increases general overhead.

In short, I conclude that the Tree Walk has a very large cost for the comparatively small benefit of avoiding specific PSL errors. But fixing PSL errors is still desirable, so another approach is needed.

What makes more sense is to collect domain lists from the email logs, possibly supplemented from other sources, and then perform Tree Walks in a background process. The background process would identify domains where the Tree Walk produces a different result than the PSL, so that the difference can be investigated and the local-copy PSL corrected. Perhaps some web service will even do the work and publish results to its clients or the whole web.

As a batch algorithm, the Tree Walk has potential. As a real-time algorithm, the Tree Walk seems like a poor fit.

Doug Foster

On Fri, Oct 13, 2023 at 3:29 PM Neil Anuskiewicz <n...@marmot-tech.com> wrote:

> If I read that right it gives you what you think is a desirable outcome.
> That is, this might be a strong sign that you’re at least considering
> supporting DMARCbis!
>
> Yes, we all need to be prepared for headaches no matter which direction
> this all goes.

_______________________________________________
dmarc mailing list
dmarc@ietf.org
https://www.ietf.org/mailman/listinfo/dmarc