OK, so I've dived into Yahoo's incoming metadata to look at what fails DMARC and why.

Conclusion 1: I cannot automatically tell the cases apart with any accuracy. Hand coding them is so time-consuming as to be beyond my ability to do at scale. So, not many numbers, but I have developed some very educated opinions, which unfortunately take a small novel to explain.

First, transactional domains tend to have some mail that fails DMARC because of forwarding. The highest estimate of that I've seen is about 1.5% (that's not data I ran); on the domains I've looked at so far I saw rates well below 0.1%. Still, that's people not getting their mail.

For transactional mail, this is, by a strong majority, plain forwarding (as in I send mail to a...@b.com and it delivers to a...@c.com), and it's dominated by educational institutions forwarding to their students/alumni, followed by hosting services and ISPs. The reasons the forwarding breaks the signatures clearly vary; there are popular mail systems that, for instance, re-MIME-encode bodies, or add signature files of various sorts, ranging from notifications that the message has been virus scanned to ads for the ISP forwarding the mail. Re-encoding can result in intermittent breakage, where some messages survive forwarding (because the forwarder doesn't feel the need to "fix" them) and others don't.

A minority is not straightforward "my old address routes to my new one" forwarding, but comes from services that specialize in handling mail -- third-party spam filtering or mail classification products that then redeliver mail. They don't make up a big percentage of the cases, but they are notable as cases that presumably would be able to change their handling if we provided them with options.

Then we get to end-user domains.
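(Backing up to the forwarding breakage for a moment: a minimal sketch of why a forwarder that appends a footer or re-encodes the body invalidates the signature. It assumes SHA-256 and the "simple" body canonicalization from the DKIM spec; the function names and sample bodies are made up for illustration, and none of this is taken from the Yahoo data.)

```python
import base64
import hashlib


def simple_body_canon(body: bytes) -> bytes:
    """'simple' body canonicalization: drop trailing empty lines and
    end the body with exactly one CRLF."""
    lines = body.split(b"\r\n")
    while lines and lines[-1] == b"":
        lines.pop()
    return b"\r\n".join(lines) + b"\r\n"


def body_hash(body: bytes) -> str:
    """Value a signer would place in bh= (assuming rsa-sha256 or similar)."""
    digest = hashlib.sha256(simple_body_canon(body)).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical message bodies, for illustration only.
original = b"Caf\xc3\xa9 receipt: your statement is ready.\r\n"

# Forwarder tolerates this: only trailing blank lines were added.
extra_blank_lines = original + b"\r\n\r\n"

# Forwarder appends a "virus scanned" footer.
with_footer = original + b"-- \r\nThis message was scanned for viruses.\r\n"

# Forwarder re-encodes the 8-bit body as quoted-printable.
reencoded = b"Caf=C3=A9 receipt: your statement is ready.\r\n"

for name, body in [
    ("original", original),
    ("extra blank lines", extra_blank_lines),
    ("with footer", with_footer),
    ("re-encoded", reencoded),
]:
    status = "matches" if body_hash(body) == body_hash(original) else "BROKEN"
    print(f"{name:18s} {status}")
```

The re-encoded case is the intermittent one: a pure-ASCII body would come through a re-encoding forwarder unchanged, so only some messages from the same sender break.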
Although I've heard from corporate domains where DMARC breakage is actually lower for the humans than for the transactional mail, for the big mailbox holders, as far as I can tell, the expected ratios hold; more mail fails for end-users than for transactional mail.

How much more? That gets interesting. For the domains at p=reject, the mail that arrives with an aligned DKIM signature still on it, but not working -- which is the common case for both forwarding and mailing lists -- is significantly under 1%. For the domains at p=none, that comes closer to 2%. That difference is partly in mailing lists, but, sadly, a noticeable amount of the difference between p=reject and p=none domains when it comes to mailing list mail is spam from a few, mostly commercial, mailing list providers. It is clear that a number of valid, happy mailing lists you might like have chosen to move subscribers from p=reject to p=none providers, with mixed success; the high-volume ones I've checked still have p=reject posters, at low rates.

It is also clear that a lot of educational institutions *both* run popular, DKIM-breaking forwarders *and* run popular, DKIM-breaking mailing lists (and, unsurprisingly, have alumni who go through both at once), which is one way that these things rapidly become intractable for automated processing.

For everybody, way more mail shows up with no aligned DKIM signature on it than with a broken aligned DKIM signature on it, and no noticeable amount of that mail had a DKIM signature stripped off. In fact, for every domain I looked at, the single largest cause of DMARC fail is purely forged mail, mostly spam. The rate of messages with no aligned DKIM signature ranged from 88% (for a mailbox domain with p=none) to 2% (for a transactional domain with p=reject). For transactional domains, that mail is not readily distinguishable from pure trash. There must be a DKIM-stripping forwarder out there somewhere, but I haven't found it.
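For anyone who wants to reproduce the split between "broken aligned signature" and "no aligned signature", here is a minimal sketch of the bucketing. It is not the tooling used for the numbers above. The `DkimResult` type, the bucket names, and the `is_aligned` helper are hypothetical, and the alignment check is deliberately crude: real relaxed alignment compares organizational domains, which needs a public suffix list. It also only looks at the DKIM side and ignores SPF.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DkimResult:
    """One DKIM signature as reported by the receiving verifier (hypothetical shape)."""
    domain: str   # the d= value of the signature
    passed: bool  # True if the verifier reported dkim=pass


def is_aligned(sig_domain: str, from_domain: str) -> bool:
    """ASSUMPTION: crude stand-in for relaxed alignment. Treats the domains as
    aligned if they are equal or one is a subdomain of the other. Sibling
    subdomains (a.example.com vs b.example.com) are missed by this shortcut."""
    a, b = sig_domain.lower(), from_domain.lower()
    return a == b or a.endswith("." + b) or b.endswith("." + a)


def classify(from_domain: str, signatures: List[DkimResult]) -> str:
    """Sort a message into one of the three DKIM buckets."""
    aligned = [s for s in signatures if is_aligned(s.domain, from_domain)]
    if any(s.passed for s in aligned):
        return "aligned-pass"
    if aligned:
        # Signature from the right domain is present but no longer verifies:
        # the typical forwarder or mailing list case.
        return "aligned-broken"
    # Nothing from the author's domain at all: forged mail, or a third party
    # sending with the user's address in From:.
    return "no-aligned-signature"


# Hypothetical examples.
print(classify("example.com", [DkimResult("example.com", False)]))
print(classify("example.com", [DkimResult("lists.example.org", True)]))
```

The bucketing is the easy part. It says nothing about *why* a message landed in a bucket, which is the part that needed hand coding.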
For end-user domains, once you ignore forged spam, the major volume contributors are hosting sites, but again, the use-cases are mixed. The highest volume from hosting sites is spam. The next highest is email to site owners "From:" themselves. There are a lot of people out there not getting mail from really frequent cron jobs. Then we get bulletin boards and blog software letting you know somebody has responded to your comment, and e-commerce solutions letting you know that somebody wants to order something -- all of that shows up in one big jumble from the hosting providers.

Next we get "parental control" software and other monitoring uses that send From: and To: the same address and are verbose. And large service providers for small businesses.

After that, it's the land of a million different indirect uses. More e-commerce sites. The people who send you happy birthday messages from your dentist, who uses a third-party account. Or your tee-time from your golf course, ditto. Or your tanning bed schedule, because there's a service out there that does nothing but email handling for tanning salons. Printers. Security equipment. A surprising number of government agencies, including one national nuclear agency, sending mail To: and From: the same person when replying to document requests, or using a free email account to do government business. Services for multi-level marketing plans and other sell-to-your-friends plans. Oh, so many services for realtors. Lots of mail-to-friend features, some of which (the ones with the most volume) are being abused for spam.

Elizabeth
zwi...@yahoo-inc.com
_______________________________________________
dmarc mailing list
dmarc@ietf.org
https://www.ietf.org/mailman/listinfo/dmarc