[dmarc-ietf] Indirect email flows

Elizabeth Zwicky Mon, 10 Nov 2014 11:28:32 -0800

OK, so I've dived into Yahoo's incoming metadata to look at what fails DMARC
and why. Conclusion 1: I cannot automatically tell the cases apart with any
accuracy, and hand-coding them is so time-consuming that I can't do it at
scale.
So, not many numbers, but I have developed some very educated opinions, which 
unfortunately take a small novel to explain.
First, transactional domains tend to have some mail that fails DMARC because of
forwarding. The highest estimate of that I've seen is about 1.5% (that's not
data I ran); on the domains I've looked at so far I saw rates well below 0.1%.
Still, that's people not getting their mail. For transactional mail, the strong
majority of this is plain forwarding (as in I send mail to a...@b.com and it is
delivered to a...@c.com), and it's dominated by educational institutions
forwarding to their students/alumni, followed by hosting services and ISPs. The
reasons the forwarding breaks the signatures clearly vary; there are popular
mail systems that, for instance, re-MIME-encode bodies or add signature files
of various sorts, ranging from notices that the message has been virus-scanned
to ads for the ISP forwarding the mail. Re-encoding can cause intermittent
breakage: some forwarded messages survive (because the forwarder doesn't feel
the need to "fix" them) and others don't. A minority isn't straightforward
"my old address routes to my new one" forwarding at all, but comes from
services that specialize in handling mail: third-party spam filtering or mail
classification products that then redeliver it. They don't make up a big
percentage of the cases, but they are notable because they presumably could
change their handling if we provided them with options.
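To make the breakage mechanism concrete, here is a minimal Python sketch of
how a DKIM verifier recomputes the body hash that the signer placed in the bh=
tag, assuming "simple" body canonicalization and SHA-256 as defined in RFC 6376.
The message text, the footer, and the "ExampleISP" name are hypothetical, and
the sketch skips the header signature check that real verification also does.
Any forwarder that appends a footer or re-encodes the body changes the bytes,
so the recomputed hash stops matching.

```python
import base64
import hashlib


def simple_body_canon(body: bytes) -> bytes:
    """DKIM "simple" body canonicalization (RFC 6376, section 3.4.3):
    drop trailing empty lines and make sure the body ends in one CRLF."""
    while body.endswith(b"\r\n"):
        body = body[:-2]
    return body + b"\r\n"


def body_hash(body: bytes) -> str:
    """Base64 SHA-256 of the canonicalized body, as carried in the bh= tag."""
    digest = hashlib.sha256(simple_body_canon(body)).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical transactional message, signed by the original sender.
original = "Your order has shipped. Merci, café Example.\r\n".encode("utf-8")
signed_bh = body_hash(original)

# Hypothetical forwarder 1: appends a virus-scan notice.
with_footer = original + b"-- \r\nScanned for viruses by ExampleISP\r\n"

# Hypothetical forwarder 2: re-encodes the 8-bit body as quoted-printable
# (same text to the reader, different bytes on the wire).
reencoded = b"Your order has shipped. Merci, caf=C3=A9 Example.\r\n"

for label, body in [("unchanged", original), ("footer", with_footer),
                    ("re-encoded", reencoded)]:
    print(label, "verifies" if body_hash(body) == signed_bh else "breaks")
```

Only the unchanged body verifies. Switching to "relaxed" canonicalization would
not rescue the other two, since it only normalizes whitespace.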
Then we get to end-user domains. Although I've heard from corporate domains
where DMARC breakage is actually lower for mail from humans than for
transactional mail, for the big mailbox providers, as far as I can tell, the
expected ratios hold: more mail fails for end users than for transactional
mail. How much more? That gets interesting.
For the domains at p=reject, the mail that arrives with an aligned DKIM
signature still on it but no longer verifying (the common case for both
forwarding and mailing lists) is significantly under 1%. For the domains at
p=none, it comes closer to 2%. That difference is partly in mailing lists, but,
sadly, a noticeable amount of the mailing-list difference between p=reject and
p=none domains is spam from a few, mostly commercial, mailing list providers.
It is clear that a number of valid, happy mailing lists you might like have
chosen to move subscribers from p=reject to p=none providers, with mixed
success; the high-volume ones I've checked still have p=reject posters, at low
rates. It is also clear that a lot of educational institutions *both* run
popular, DKIM-breaking forwarders *and* run popular, DKIM-breaking mailing
lists (and, unsurprisingly, have alumni who go through both at once), which is
one way these things rapidly become intractable for automated processing.
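The two buckets used here and in the following paragraph (an aligned DKIM
signature that is present but broken, versus no aligned DKIM signature at all)
can be sketched in Python. This assumes each message has already been reduced
to its From: domain plus a list of DKIM results (d= domain and pass/fail), and
it ignores SPF. The org_domain helper is a deliberately crude stand-in that
keeps the last two labels; DMARC (RFC 7489) finds the organizational domain
with a public suffix list, so this gets names like example.co.uk wrong. The
names DkimResult, classify, and failure_breakdown are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class DkimResult:
    domain: str   # the signature's d= domain
    passed: bool  # True if the signature verified


def org_domain(domain: str) -> str:
    # ASSUMPTION: crude organizational-domain guess (last two labels).
    # Real DMARC uses a public suffix list; this is wrong for e.g. co.uk.
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def relaxed_aligned(dkim_domain: str, from_domain: str) -> bool:
    # Relaxed alignment: both names share an organizational domain.
    return org_domain(dkim_domain) == org_domain(from_domain)


def classify(from_domain: str, sigs: list[DkimResult]) -> str:
    aligned = [s for s in sigs if relaxed_aligned(s.domain, from_domain)]
    if any(s.passed for s in aligned):
        return "aligned_pass"        # aligned DKIM alone satisfies DMARC
    if aligned:
        return "aligned_sig_broken"  # typical of forwarding and mailing lists
    return "no_aligned_sig"          # forged mail, or third-party senders


def failure_breakdown(messages):
    """Share of each failure bucket among messages that did not pass."""
    counts = Counter(classify(frm, sigs) for frm, sigs in messages)
    failing = counts["aligned_sig_broken"] + counts["no_aligned_sig"]
    if failing == 0:
        return {}
    return {k: counts[k] / failing
            for k in ("aligned_sig_broken", "no_aligned_sig")}


# Hypothetical sample: one message per situation.
sample = [
    ("example.com", [DkimResult("mail.example.com", False)]),  # broken in transit
    ("example.com", [DkimResult("lists.example.org", True)]),  # re-signed by a list
    ("example.com", []),                                       # unsigned forgery
    ("example.com", [DkimResult("example.com", True)]),        # clean pass
]
print(failure_breakdown(sample))
```

Note that a mailing list's own valid signature lands in "no_aligned_sig",
which is part of why the buckets alone cannot separate lists from forgeries.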
For everybody, far more mail shows up with no aligned DKIM signature than with
a broken aligned DKIM signature, and no noticeable amount of that mail had a
DKIM signature stripped off. In fact, for every domain I looked at, the single
largest cause of DMARC failure is purely forged mail, mostly spam. The rate of
messages with no aligned DKIM signature ranged from 88% (for a mailbox domain
with p=none) to 2% (for a transactional domain with p=reject). For
transactional domains, that mail is not readily distinguishable from pure
trash. There must be a DKIM-stripping forwarder out there somewhere, but I
haven't found it. For end-user domains, once you ignore forged spam, the major
volume contributors are hosting sites, but again, the use cases are mixed. The
highest volume from hosting sites is spam. The next highest is email sent to
site owners with their own address in From:. There are a lot of people out
there not getting mail from really frequent cron jobs. Then come bulletin
boards and blog software letting you know somebody has responded to your
comment, and e-commerce solutions letting you know that somebody wants to order
something; all of that shows up in one big jumble from the hosting providers.
Next we get "parental control" software and other monitoring uses that send
verbose mail From: and To: the same address. Then there are large service
providers for small businesses.
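The From:-equals-To: pattern behind the cron jobs and monitoring software is
one of the few surface features of this traffic that is easy to test for. A
minimal Python sketch using only the standard library email package; it flags
the pattern and nothing more, so on its own it cannot tell wanted mail from
spam. The function name and sample message are hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def is_self_addressed(raw_message: str) -> bool:
    """True if the From: address also appears among the To: recipients."""
    msg = message_from_string(raw_message)
    _, sender = parseaddr(msg.get("From", ""))
    recipients = {addr.lower()
                  for _, addr in getaddresses(msg.get_all("To", []))}
    return bool(sender) and sender.lower() in recipients


# Hypothetical cron report a hosting site sends on a site owner's behalf.
cron_mail = (
    "From: Site Owner <owner@example.com>\n"
    "To: owner@example.com\n"
    "Subject: Cron <owner@host> backup.sh\n"
    "\n"
    "Backup finished.\n"
)
print(is_self_addressed(cron_mail))
```

Addresses are compared case-insensitively here as a simplification; strictly,
only the domain part of an address is case-insensitive.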
After that, it's the land of a million different indirect uses. More e-commerce
sites. The people who send you happy birthday messages from your dentist, who
uses a third-party account. Or your tee time from your golf course, ditto. Or
your tanning bed schedule, because there's a service out there that does
nothing but email handling for tanning salons. Printers. Security equipment. A
surprising number of government agencies, including one national nuclear
agency sending mail To: and From: the same person when replying to document
requests, or using a free email account to do government business. Services
for multi-level marketing plans and other sell-to-your-friends plans. Oh, so
many services for realtors. Lots of mail-to-friend features, some of which
(the ones with the most volume) are being abused for spam.
    Elizabeth
    zwi...@yahoo-inc.com

_______________________________________________
dmarc mailing list
dmarc@ietf.org
https://www.ietf.org/mailman/listinfo/dmarc
