Re: [Ietf-dkim] DKIM Replay Problem Statement and Scenarios -01 draft posted
On Wed, Feb 15, 2023 at 8:16 PM Murray S. Kucherawy wrote:
> On Wed, Feb 15, 2023 at 2:21 PM Michael Thomas wrote:
>>
>> There's also the question of whether "x=" is properly enforced. RFC 6376
>> says verifiers "MAY" choose to enforce it. I think I asked about this at
>> a conference recently and was told that it's not universally supported
>> by implementations.
>>
>> Others have said that the enforcement is pretty good. But I have no way
>> to evaluate if that's true.
>
> I don't think we're saying different things. I remember the point of the
> answer I got in that session being that most, but not all, implementations
> check or enforce signature expiration. But that means if "x=" is the
> solution we land on, we have to accept that a possibly-significant part of
> the ecosystem won't be able to use that solution.
>
> Then again, anything new we roll out is going to take a while to become
> universal anyway.

The short version is that x= works where it matters at the moment. As far
as I've seen and heard from others, DKIM replay spam currently focuses
heavily on replaying to recipients at just a few of the top 10 global
mailbox providers. This is for reasons of economies of scale - roughly
speaking, it might be viable to spend 1000 hours finding a way through the
filters of a provider operating 200 million mailboxes, where it is not for
a provider hosting 20 million. This is part of why I don't think we'll see
replay attacks expand significantly to more domains; replay is just one
ingredient in a larger spam recipe that takes a lot of other fine-tuning to
achieve its intended effect.

This has implications for rollout as well. I think the ideal solution would
enable affected signers/verifiers to deal with the problem, while everyone
else can ignore it entirely (until/unless they do see a problem). I think a
count-based approach could do exactly that.

___
Ietf-dkim mailing list
Ietf-dkim@ietf.org
https://www.ietf.org/mailman/listinfo/ietf-dkim
Re: [Ietf-dkim] DKIM Replay Problem Statement and Scenarios -01 draft posted
On Wed, Feb 15, 2023 at 2:21 PM Michael Thomas wrote:
>
> There's also the question of whether "x=" is properly enforced. RFC 6376
> says verifiers "MAY" choose to enforce it. I think I asked about this at a
> conference recently and was told that it's not universally supported by
> implementations.
>
> Others have said that the enforcement is pretty good. But I have no way to
> evaluate if that's true.

I don't think we're saying different things. I remember the point of the
answer I got in that session being that most, but not all, implementations
check or enforce signature expiration. But that means if "x=" is the
solution we land on, we have to accept that a possibly-significant part of
the ecosystem won't be able to use that solution.

Then again, anything new we roll out is going to take a while to become
universal anyway.

> Going the route of some kind of duplicate signature detection alleviates
> the risk of that approach, but also sort of inverts it: If you assume each
> signature will only appear once, there's a window during which the first
> signature works, and then a second window during which duplicates will be
> blocked, but then that process recycles when the cache expires. That could
> mean replays work if I just out-wait your cache. You also introduce the
> risk of false positives, where a legitimate message tries to arrive in
> separate envelopes with the same signature, and all but the first one get
> blocked.
>
> I would imagine that the cache should be valid for a small x= expiry.
> That's really a tuning problem on the sending domain.

Possibly. I get the impression that a good chunk of the industry would like
something more from us than "you have to tune this" (i.e., something laying
out specific values), but maybe we can't do anything beyond general advice
because there are too many other variables at play. RFC 6647 avoided laying
out specific values for greylisting, for example.
> There are *tons* of external dependencies on the filtering MTA. I really
> can't imagine that this would be the straw that breaks the camel's back.

Depends (as I think Scott said) on the size and resources available to the
operator, and how much they're a target of this attack. We've generally
shied away in the past from solutions of the form "you have to be at least
this tall to play".

-MSK
Re: [Ietf-dkim] DKIM Replay Problem Statement and Scenarios -01 draft posted
On February 15, 2023 10:18:50 PM UTC, "Murray S. Kucherawy" wrote:
> On Wed, Feb 15, 2023 at 5:39 AM Scott Kitterman wrote:
>
>> Any reputation based solution does have down scale limits. Small mail
>> sources (such as your random Nebraska forwarder) generally will have no
>> reputation vice a negative one and so wouldn't get penalized in a scheme
>> like the one I suggested. This does, however, highlight where the
>> performance challenge is. We've moved it from duplicate detection to
>> rapid assessment of reputation for hosts that have sudden volume
>> increases.
>
> I wonder if this could be separated into "reputation" and "hosts that
> have sudden volume increases".
>
> Reputation is hard. Large operators spend a lot of R&D time coming up
> with algorithms that accurately (for some value thereof) compute the
> reputation they should associate with an identity. That investment means
> they're not inclined to share that secret sauce. Small operators without
> those resources long for an open source solution, or a cheap or free
> service from which they can reliably get reputation data. Companies that
> offer reputation data for public consumption have been sued out of
> existence by people that get marked as suspect, so really good ones don't
> seem to abound last I checked.
>
> There's a lot less secret sauce involved in the latter. It would be
> interesting to see if some simple recordkeeping of this nature could make
> a dent in the problem space we're discussing. But that might just
> encourage further distribution of the attack to avoid detection.

I think it could, but it has its own scaling problems.

Further distribution has two sides: If I have multiple hosts (for any of
the many reasons one does) and the attacker hits all of them with some
fraction of the attack volume, that doesn't materially increase the cost of
the attack.
If I can rapidly share rate data among my hosts so that distributing volume
among them doesn't help avoid volume detection, then that either raises the
cost of the attack (need more IP addresses to send from) or reduces its
effectiveness (messages blocked due to being over rate). Either of those
results is a good thing, but whatever the process is, it's no longer
simple.

This is the flip side of reputation in a way. Technically easy for small
domains, but hugely harder at any significant scale.

Scott K
Re: [Ietf-dkim] DKIM Replay Problem Statement and Scenarios -01 draft posted
On 2/15/23 2:12 PM, Murray S. Kucherawy wrote:
> On Tue, Feb 14, 2023 at 11:44 AM Michael Thomas wrote:
>
>> At maximum, isn't it just the x= value? It seems to me that if you don't
>> specify an x= value, or it's essentially infinite, they are saying they
>> don't care about "replays". Which is fine in most cases and you can just
>> ignore it. Something that really throttles down x= should be a tractable
>> problem, right?
>
> Remember that the threat model is:
>
> 1) send a message through A to B, acquiring A's signature
> 2) collect the message from B
> 3) re-post the message to C, D, E, ...
>
> These days, this attack is complete within seconds. If you select an "x="
> small enough to thwart this, you have to expect that all legitimate
> deliveries will happen even faster. But email delivery can be slow for
> lots of legitimate reasons. So I would argue that "x=" alone can't really
> solve this problem without introducing other constraints that we don't
> really want.

I'm not saying that it solves the problem, only that it bounds how much
you'd need to store.

> There's also the question of whether "x=" is properly enforced. RFC 6376
> says verifiers "MAY" choose to enforce it. I think I asked about this at
> a conference recently and was told that it's not universally supported by
> implementations.

Others have said that the enforcement is pretty good. But I have no way to
evaluate if that's true.

> Going the route of some kind of duplicate signature detection alleviates
> the risk of that approach, but also sort of inverts it: If you assume
> each signature will only appear once, there's a window during which the
> first signature works, and then a second window during which duplicates
> will be blocked, but then that process recycles when the cache expires.
> That could mean replays work if I just out-wait your cache. You also
> introduce the risk of false positives, where a legitimate message tries
> to arrive in separate envelopes with the same signature, and all but the
> first one get blocked.
I would imagine that the cache should be valid for a small x= expiry.
That's really a tuning problem on the sending domain. But I mentioned in
another response that if you detect lots of replays and could turn up the
dial on your spam filters, that may well thwart a sizable amount of spam
*and* have the ability to be retroactive with spam that has made it past
the filter.

>> But even at scale it seems like a pretty small database in comparison to
>> the overall volume. It would be easy for a receiver to just prune it
>> after a day or so, say.
>
> It creates an additional external dependency on the SMTP server. I guess
> you have to evaluate the cost of the database versus the cost of the
> protection it provides, and include reasonable advice about what to do
> when the database is not available.

There are *tons* of external dependencies on the filtering MTA. I really
can't imagine that this would be the straw that breaks the camel's back.

Mike
Re: [Ietf-dkim] DKIM Replay Problem Statement and Scenarios -01 draft posted
On Wed, Feb 15, 2023 at 5:39 AM Scott Kitterman wrote:

> Any reputation based solution does have down scale limits. Small mail
> sources (such as your random Nebraska forwarder) generally will have no
> reputation vice a negative one and so wouldn't get penalized in a scheme
> like the one I suggested. This does, however, highlight where the
> performance challenge is. We've moved it from duplicate detection to
> rapid assessment of reputation for hosts that have sudden volume
> increases.

I wonder if this could be separated into "reputation" and "hosts that have
sudden volume increases".

Reputation is hard. Large operators spend a lot of R&D time coming up with
algorithms that accurately (for some value thereof) compute the reputation
they should associate with an identity. That investment means they're not
inclined to share that secret sauce. Small operators without those
resources long for an open source solution, or a cheap or free service from
which they can reliably get reputation data. Companies that offer
reputation data for public consumption have been sued out of existence by
people that get marked as suspect, so really good ones don't seem to abound
last I checked.

There's a lot less secret sauce involved in the latter. It would be
interesting to see if some simple recordkeeping of this nature could make a
dent in the problem space we're discussing. But that might just encourage
further distribution of the attack to avoid detection.

-MSK
Re: [Ietf-dkim] DKIM Replay Problem Statement and Scenarios -01 draft posted
On Tue, Feb 14, 2023 at 11:44 AM Michael Thomas wrote:

> At maximum, isn't it just the x= value? It seems to me that if you don't
> specify an x= value, or it's essentially infinite, they are saying they
> don't care about "replays". Which is fine in most cases and you can just
> ignore it. Something that really throttles down x= should be a tractable
> problem, right?

Remember that the threat model is:

1) send a message through A to B, acquiring A's signature
2) collect the message from B
3) re-post the message to C, D, E, ...

These days, this attack is complete within seconds. If you select an "x="
small enough to thwart this, you have to expect that all legitimate
deliveries will happen even faster. But email delivery can be slow for lots
of legitimate reasons. So I would argue that "x=" alone can't really solve
this problem without introducing other constraints that we don't really
want.

There's also the question of whether "x=" is properly enforced. RFC 6376
says verifiers "MAY" choose to enforce it. I think I asked about this at a
conference recently and was told that it's not universally supported by
implementations.

Going the route of some kind of duplicate signature detection alleviates
the risk of that approach, but also sort of inverts it: If you assume each
signature will only appear once, there's a window during which the first
signature works, and then a second window during which duplicates will be
blocked, but then that process recycles when the cache expires. That could
mean replays work if I just out-wait your cache. You also introduce the
risk of false positives, where a legitimate message tries to arrive in
separate envelopes with the same signature, and all but the first one get
blocked.

> But even at scale it seems like a pretty small database in comparison to
> the overall volume. It would be easy for a receiver to just prune it
> after a day or so, say.

It creates an additional external dependency on the SMTP server.
I guess you have to evaluate the cost of the database versus the cost of
the protection it provides, and include reasonable advice about what to do
when the database is not available.

-MSK
Re: [Ietf-dkim] DKIM Replay Problem Statement and Scenarios -01 draft posted
On Wednesday, February 15, 2023 5:23:34 AM EST Alessandro Vesely wrote:
> On Tue 14/Feb/2023 23:42:36 +0100 Scott Kitterman wrote:
>> On Tuesday, February 14, 2023 4:16:00 PM EST Evan Burke wrote:
>>> On Tue, Feb 14, 2023 at 11:44 AM Michael Thomas wrote:
>>>> On Tue, Feb 14, 2023 at 11:18 AM Michael Thomas wrote:
>>>>> Have you considered something like rate limiting on the receiver
>>>>> side for things with duplicate msg-id's? Aka, a tar pit, iirc?
>>>
>>> I believe Yahoo does currently use some sort of count-based approach to
>>> detect replay, though I'm not clear on the details.
>>>
>>>> As I recall that technique is sometimes not suggested because (a) we
>>>> can't come up with good advice about how long you need to cache
>>>> message IDs to watch for duplicates, and (b) the longer that cache
>>>> needs to live, the larger a resource burden the technique imposes,
>>>> and small operators might not be able to do it well.
>>>>
>>>> At maximum, isn't it just the x= value? It seems to me that if you
>>>> don't specify an x= value, or it's essentially infinite, they are
>>>> saying they don't care about "replays". Which is fine in most cases
>>>> and you can just ignore it. Something that really throttles down x=
>>>> should be a tractable problem, right?
>
> The ratio between duplicate count and x= is the spamming speed.
>
>>>> But even at scale it seems like a pretty small database in comparison
>>>> to the overall volume. It would be easy for a receiver to just prune
>>>> it after a day or so, say.
>>>
>>> I think count-based approaches can be made even simpler than that, in
>>> fact. I'm halfway inclined to submit a draft using that approach, as
>>> time permits.
>>
>> I suppose if the thresholds are high enough, it won't hit much in the
>> way of legitimate mail (as an example, I anticipate this message will
>> hit at least hundreds of mail boxes at Gmail, but not millions), but of
>> course letting the first X through isn't ideal.
> Scott's message hit my server exactly once. Counting is a no-op for small
> operators.
>
>> If I had access to a database of numerically scored IP reputation values
>> (I don't currently, but I have in the past, so I can imagine this at
>> least), I think I'd be more inclined to look at the reputation of the
>> domain as a whole (something like average score of messages from an SPF
>> validated Mail From, DKIM validated d=, or DMARC pass domain) and the
>> reputation of the IP for a message from that domain and then if there
>> was sufficient statistical confidence that the reputation of the IP was
>> "bad" compared to the domain's reputation I would infer it was likely
>> being replayed and ignore the signature.
>
> Some random forwarder in Nebraska can be easily mistaken for a spammer
> that way. Reputation is affected by email volume. Even large operators
> have little knowledge of almost silent MTAs.
>
> Having senders' signatures transmit the perceived risk of an author would
> contribute an additional evaluation factor here. Rather than discard
> validated signatures, have an indication to weight them. (In that
> respect, let me note the usage of ARC as a sort of second class DKIM,
> when the signer knows nothing about the author.)

Any reputation based solution does have down scale limits. Small mail
sources (such as your random Nebraska forwarder) generally will have no
reputation vice a negative one and so wouldn't get penalized in a scheme
like the one I suggested. This does, however, highlight where the
performance challenge is. We've moved it from duplicate detection to rapid
assessment of reputation for hosts that have sudden volume increases.

I think that's fine, as that's not at all a problem that's unique to this
challenge, and ultimately I think if replay attacks end up more complicated
because instead of blasting 1,000,000 messages from one host they have to
trickle 1,000 messages from 1,000 hosts, it's a win.
I don't think this is a problem that's going to have a singular mechanical
solution that makes it go away. This is substantially about making this
particular technique less effective, so maybe they move on to something
else or at least less bad stuff gets delivered.

>> I think that approaches the same effect as a "too many dupes" approach
>> without the threshold problem. It does require reputation data, but I
>> assume any entity of a non-trivial size either has access to their own
>> or can buy it from someone else.
>
> DNSWLs exist.

I'm not sure how that's relevant. Please expand on this if you think it's
important.

Scott K
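[Editorial note: a crude, illustrative version of the heuristic Scott
describes - compare an IP's per-message reputation scores against the
signing domain's overall scores, and only treat the signature as replayed
when there is a clear statistical gap. The score scale (higher is better),
sample-size floor, and z threshold are all assumptions, not
recommendations; note how the sample-size floor is what spares the
low-volume Nebraska forwarder Ale worries about.]

```python
from statistics import mean, stdev

def likely_replay(domain_scores: list[float], ip_scores: list[float],
                  min_samples: int = 30, z_threshold: float = 2.0) -> bool:
    """Infer replay when an IP scores statistically worse than its domain.

    Returns False (no action) when there is too little data for any
    confidence, which protects small, almost-silent sources.
    """
    if len(ip_scores) < min_samples or len(domain_scores) < 2:
        return False  # not enough data; don't penalize low-volume hosts
    spread = stdev(domain_scores)
    if spread == 0:
        return mean(ip_scores) < mean(domain_scores)
    # How many domain-score standard deviations below the domain mean
    # does this IP sit?
    z = (mean(domain_scores) - mean(ip_scores)) / spread
    return z > z_threshold
```

A real implementation would weight by recency and volume; this only shows
the "sufficient statistical confidence" step as a concrete comparison.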
Re: [Ietf-dkim] DKIM Replay Problem Statement and Scenarios -01 draft posted
On Tue 14/Feb/2023 23:42:36 +0100 Scott Kitterman wrote:
> On Tuesday, February 14, 2023 4:16:00 PM EST Evan Burke wrote:
>> On Tue, Feb 14, 2023 at 11:44 AM Michael Thomas wrote:
>>> On Tue, Feb 14, 2023 at 11:18 AM Michael Thomas wrote:
>>>> Have you considered something like rate limiting on the receiver side
>>>> for things with duplicate msg-id's? Aka, a tar pit, iirc?
>>
>> I believe Yahoo does currently use some sort of count-based approach to
>> detect replay, though I'm not clear on the details.
>>
>>> As I recall that technique is sometimes not suggested because (a) we
>>> can't come up with good advice about how long you need to cache message
>>> IDs to watch for duplicates, and (b) the longer that cache needs to
>>> live, the larger a resource burden the technique imposes, and small
>>> operators might not be able to do it well.
>>>
>>> At maximum, isn't it just the x= value? It seems to me that if you
>>> don't specify an x= value, or it's essentially infinite, they are
>>> saying they don't care about "replays". Which is fine in most cases and
>>> you can just ignore it. Something that really throttles down x= should
>>> be a tractable problem, right?

The ratio between duplicate count and x= is the spamming speed.

>>> But even at scale it seems like a pretty small database in comparison
>>> to the overall volume. It would be easy for a receiver to just prune it
>>> after a day or so, say.
>>
>> I think count-based approaches can be made even simpler than that, in
>> fact. I'm halfway inclined to submit a draft using that approach, as
>> time permits.
>
> I suppose if the thresholds are high enough, it won't hit much in the way
> of legitimate mail (as an example, I anticipate this message will hit at
> least hundreds of mail boxes at Gmail, but not millions), but of course
> letting the first X through isn't ideal.

Scott's message hit my server exactly once. Counting is a no-op for small
operators.
> If I had access to a database of numerically scored IP reputation values
> (I don't currently, but I have in the past, so I can imagine this at
> least), I think I'd be more inclined to look at the reputation of the
> domain as a whole (something like average score of messages from an SPF
> validated Mail From, DKIM validated d=, or DMARC pass domain) and the
> reputation of the IP for a message from that domain and then if there was
> sufficient statistical confidence that the reputation of the IP was "bad"
> compared to the domain's reputation I would infer it was likely being
> replayed and ignore the signature.

Some random forwarder in Nebraska can be easily mistaken for a spammer that
way. Reputation is affected by email volume. Even large operators have
little knowledge of almost silent MTAs.

Having senders' signatures transmit the perceived risk of an author would
contribute an additional evaluation factor here. Rather than discard
validated signatures, have an indication to weight them. (In that respect,
let me note the usage of ARC as a sort of second class DKIM, when the
signer knows nothing about the author.)

> I think that approaches the same effect as a "too many dupes" approach
> without the threshold problem. It does require reputation data, but I
> assume any entity of a non-trivial size either has access to their own or
> can buy it from someone else.

DNSWLs exist.

Best
Ale
--
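[Editorial note: to make the count-based idea in this subthread concrete,
a per-signature counter that lets the first X copies through (Scott's
caveat about "letting the first X through") and only flags beyond the
threshold. Combined with an x= lifetime bounding how long copies verify,
threshold over lifetime caps the replay rate - Ale's point that the ratio
between duplicate count and x= is the spamming speed. Names and the
default threshold are illustrative.]

```python
from collections import Counter

class DuplicateThreshold:
    """Count-based replay detector: flag only above a copy threshold.

    The first `threshold` copies of a signature are delivered normally;
    mass replay beyond that is flagged. With signature lifetime L (from
    x=), a replayer is limited to roughly threshold / L messages per
    second per signature.
    """
    def __init__(self, threshold: int = 100):
        self.threshold = threshold
        self.counts = Counter()  # signature digest -> copies seen

    def flag(self, signature: str) -> bool:
        """Record one copy; return True once the threshold is exceeded."""
        self.counts[signature] += 1
        return self.counts[signature] > self.threshold
```

Unlike strict first-copy-wins duplicate detection, this avoids false
positives on legitimately re-sent envelopes at the cost of admitting the
first X replays.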