[admin] RE: the O(N^2) problem
Folks, Same request as the Yahoo! Mail thread, can we go ahead and wrap this up? Excellent points, intelligent positions, but definitely not operational. This one might be great for ASRG, which has been a little more active lately. Best Regards, Marty -- Martin Hannigan http://www.verneglobal.com/ Verne Global Datacenters e: [EMAIL PROTECTED] Keflavik, Icelandp: +16178216079
Re: the O(N^2) problem
On Mon, 14 Apr 2008, Edward B. DREGER wrote: > > When it comes to establishing trust: > > * The current SMTP model is O(N^2); In practice it's O(N): small-to-medium-sized email systems rely on external reputation providers (blacklists or anti-spam service providers) rather than creating their own reputation databases. > * PKI models are pretty much O(1). PKI gives you authentication but not reputation. The cryptographic notion of "trust" is not useful in the context of spam. > Here's what I propose: > > Establish a "distrust protocol". Let path weight be "distrust". The > "trust path" is of secondary importance to "path weight", although not > completely irrelevant. SMTP endpoint not in graph? Fine; have some > default behavior. > > Let _trust_ be semi-transitive, a la BGP -- a technology that we know, > understand, and at least sort of trust to run this crazy, giant network > that dwarfs even a 50M-user provider. Sadly this doesn't work. BGP's transitive trust does not prevent spammers and hackers and other bad actors from getting IP connectivity. NNTP's transitive trust doesn't prevent spammers from getting usenet connectivity. You can't go much further than a hop or two before measures of reputation become too diluted to be useful. This is partly because the diameter of the network is quite small, so you rapidly end up with a huge population that only has a reasonable reputation on average. (Much like large providers are only reasonably non-spammy - they're too big to be squeaky clean.) Note that the model of anti-spam service providers and reputation providers is effectively a two-hop model. Tony. -- f.anthony.n.finch <[EMAIL PROTECTED]> http://dotat.at/ HEBRIDES: SOUTHWEST VEERING NORTHWEST 5 OR 6, THEN EAST 4 OR 5. MODERATE. RAIN OR SHOWERS. MODERATE OR GOOD.
Re: the O(N^2) problem
On Mon, Apr 14, 2008 at 01:41:50PM +, Edward B. DREGER wrote: > When one accepts an email[*], one wishes for some sort of _a priori_ > information regarding message trustworthiness. DKIM can vouch for > message authenticity, but not trust. At the moment, this problem can't be solved on an Internet scale, because there are on the order of 10e8 fully-compromised systems out there. Many different estimates have been proferred over the years; the most recent I've seen is from Rick Wesson at Support Intelligence, who offered 40% as his guesstimate; if there are 800M systems on the 'net, that'd be about 320M. But the exact number is unknowable and in some sense unimportant: the difference between 128M and 172M doesn't matter for the purpose of this discussion. And I believe there is widespread concurrence that whatever the number is, it's going up. The new owners of those systems can do anything with them they want, including forging (and cryptographically signing) outbound mail messages using any SMTP authorization credentials present on it, or any SMTP access implied by its network location(s). (They can also, if they wish, arrange to conceal incoming replies to this traffic from the former owners.) Until that problem's solved (and I don't see any solution for it on the horizon) then it will undercut any number of interesting approaches worthy of significant discussion, not just this one. It's the elephant in the room, and until it's banished, it will keep getting in the way. ---Rsk
Re: the O(N^2) problem
> The risk in a reputation system is collusion. /One/ risk in a reputation system is collusion. Reputation is a method to try to divine legitimacy of mail based on factors other than whether or not a recipient authorized a sender to send mail. To a large extent, the majority of the focus on fighting spam has been to try to do this sort of divination by coding clever things into machines, but it should be clear to anyone who has ever had legitimate mail mysteriously go missing, undelivered, or delayed that the process isn't without the occasional falsing. There are both positive (whitelist) and negative (DNSBL, local This-Is-Spam, etc) reputation lists, and there are pros and cons to each. Consider, for example, Kevin Day's example of the Group-B-Objectionable scenario. This is a nonobvious issue that can subvert the reputation of a legitimate mailer. On the flip side, what about someone who actually wants to receive mail that an organization such as Spamhaus has deemed to be hosted on a spammy IP? (And, Steve and the Spamhaus guys, this is in no way a criticism of the job you guys do, the Internet owes you a debt of gratitude for doing a nearly impossible job in such a professional manner) There are risks inherent with having any third party, specifically including the ISP or mailbox provider, trying to determine the nature of the communications, and filtering on that basis. This is why I've been talking about paradigms that eliminate the need for third parties to do analysis of e-mail, and rely on the third parties to simply implement systems that allow the recipient to control mail. There are a number of such systems that are possible. However, the current systems of divining legitimacy (reputation, filtering, whatever) generate results that loosely approximate the typical mail that the average user would wish to receive. Users have been trained to consider errors in the process as acceptable, and even unavoidable. It's ridiculous when systems like Hotmail silently bitbucket e-mail from a sender (and IP) that has never spammed, and have ONLY sent transactional e-mail and customer support correspondence, and the individually composed non-HTML REPLIES to customer inquiries are eaten by Hotmail, or tossed in the spam folder. Nice. (I know, we all have our stories) ... JG -- Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net "We call it the 'one bite at the apple' rule. Give me one chance [and] then I won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN) With 24 million small businesses in the US alone, that's way too many apples.
Re: the O(N^2) problem
I received an off-list request: "Could you clarify what precisely you are trying to secure?" I fear that perhaps I am still too vague. When one accepts an email[*], one wishes for some sort of _a priori_ information regarding message trustworthiness. DKIM can vouch for message authenticity, but not trust. (A valid DKIM signature shows that selected headers/content have not been forged, but does not vouch for content.) If I receive email from someone I trust, there's a good chance it's something I want. If from someone who someone I trust trusts, there's still a good chance. As the chain lengthens, trust becomes a bit dicier. What I propose is orthogonal to DKIM. I've also been asked to set up a separate mailing list. I'll do that, and stop pollu^H^H^H^H^Htrying to elaborate on NANOG. [*] Discussion limited to one example, but could be expanded. Eddy -- Everquick Internet - http://www.everquick.net/ A division of Brotsman & Dreger, Inc. - http://www.brotsman.com/ Bandwidth, consulting, e-commerce, hosting, and network building Phone: +1 785 865 5885 Lawrence and [inter]national Phone: +1 316 794 8922 Wichita DO NOT send mail to the following addresses: [EMAIL PROTECTED] -*- [EMAIL PROTECTED] -*- [EMAIL PROTECTED] Sending mail to spambait addresses is a great way to get blocked. Ditto for broken OOO autoresponders and foolish AV software backscatter.
Re: the O(N^2) problem
On Mon, Apr 14, 2008 at 11:50 AM, Steven M. Bellovin <[EMAIL PROTECTED]> wrote: > The risk in a reputation system is collusion. Multiple reputation systems, each with their own reputation .. Sed quis custodiet ipsos custodes and all that .. A lot of the "reputation" (aka "positive reputation") shall we say work is heavily sender / ESP / bulk mailer etc driven. And the negative reputation stuff (blocklists like spamhaus etc) have been around rather a long time. So quite a few ISPs tend to rely on trusted negative reputation systems (aka they'd use spamhaus) and build positive reputation (whitelists) on their own, possibly tying this to auth systems such as dkim. --srs -- Suresh Ramasubramanian ([EMAIL PROTECTED])
Re: the O(N^2) problem
The risk in a reputation system is collusion.
Re: the O(N^2) problem
On Mon, Apr 14, 2008 at 11:27 AM, Edward B. DREGER <[EMAIL PROTECTED]> wrote: > For such a system to scale, it would need to avoid OSPF-style > convergence. Similarly, I would not want to query, for the sake of > example, 15k different "trust peers" each time I needed to validate a > new tuple. (Hence the interdomain routing and d-v calc > references.) And dkim layered with some kind of reputation (if only a locally built whitelist) wont scale for this?
Re: the O(N^2) problem
Stardate Mon, 14 Apr 2008, Suresh Ramasubramanian's log: SR> From: Suresh Ramasubramanian SR> Looks like what various people in the industry call a "reputation SR> system" I started responding; Suresh's reply came as I was doing so, and put it very succinctly. Reputation system, but inter-"network". Perhaps an example would work better than my vague descriptions. :-) Let's say I receive email from: From: <[EMAIL PROTECTED]> Received: from ... (owen.delong.sj.ca.us [192.159.10.2]) Should I trust the message? I don't "know" you. However, I _do_ know From: <[EMAIL PROTECTED]> Received: from ... (trapdoor.merit.edu [198.108.1.26]) and trapdoor.merit.edu vouches for you. Elaborating, using "trust paths", *not* SMTP or routing paths: <[EMAIL PROTECTED]> # distrust metric: initially 0 owen.delong.sj.ca.us # distrust metric: still 0 trapdoor.merit.edu # dm: 1 (it mostly believes your MX) mail.everquick.net # dm: 2 (more or less trust NANOG) versus <[EMAIL PROTECTED]> # dm: 0 malicious.host.domain.tld # dm: 0 (trying to impersonate) trapdoor.merit.edu # dm: 10 (doesn't yet trust above host) mail.everquick.net # dm: 16 (after whatever local mods) or <[EMAIL PROTECTED]> # dm: 0 owen.delong.sj.ca.us # dm: 0 (your MX can vouch) trapdoor.merit.edu # dm: 1 (more or less believes your MX) mail.everquick.net # dm: 2 (more or less trust NANOG) IOW, I receive email from an unrecognized address from your MX. Do I trust it? I mostly trust trapdoor.merit.edu, which mostly trusts your MX, which totally trusts <[EMAIL PROTECTED]>. Therefore, I conclude that I largely trust the message. For such a system to scale, it would need to avoid OSPF-style convergence. Similarly, I would not want to query, for the sake of example, 15k different "trust peers" each time I needed to validate a new tuple. (Hence the interdomain routing and d-v calc references.) Therefore, one would find the locally-optimal solution at each "trust hop", a la interdomain routing. Perhaps PGP/GPG would be the best analogy. (When I said "PKI", I should have stated CA-based PKI; my original wording was excessively broad, and should have explicitly excluded PGP.) Eddy -- Everquick Internet - http://www.everquick.net/ A division of Brotsman & Dreger, Inc. - http://www.brotsman.com/ Bandwidth, consulting, e-commerce, hosting, and network building Phone: +1 785 865 5885 Lawrence and [inter]national Phone: +1 316 794 8922 Wichita DO NOT send mail to the following addresses: [EMAIL PROTECTED] -*- [EMAIL PROTECTED] -*- [EMAIL PROTECTED] Sending mail to spambait addresses is a great way to get blocked. Ditto for broken OOO autoresponders and foolish AV software backscatter.
Re: the O(N^2) problem
On Mon, Apr 14, 2008 at 10:34 AM, Owen DeLong <[EMAIL PROTECTED]> wrote: > Now I'm lost again. You've mixed so many different metaphors from > interdomain routing to distance-vector computaton to store-and-forward > that I simply don't understand what you are proposing or how one > could begin to approach implementing it or what problem you seem > to think it solves (although it sort of seems like you're wanting to attack > the trustworthiness of email to battle SPAM through some mechanism > that depends only on the level of trust for the (source, arrival path) > tuple from whence it came. Looks like what various people in the industry call a "reputation system"
Re: the O(N^2) problem
On Apr 13, 2008, at 5:36 PM, Edward B. DREGER wrote: Bottom line first: We need OOB metadata ("trust/distrust") information exchange that scales better than the current O(N^2) nonsense, yet is not PKI. Not sure why PKI should be excluded, but, so far, this is too abstract to know what the question is... And now, the details... which ended up longer reading than I intended. My apologies. As Mark Twain said, "I didn't have time to write a short letter, so I wrote a long one instead." :-) When it comes to establishing trust: * The current SMTP model is O(N^2); I don't see SMTP as even a "trust" model since there's pretty much nothing trustworthy in SMTP. * I posit that the current IP networking model is sub-O(N); Again, I'm not seeing IP as a trust model, but, YMMV. * PKI models are pretty much O(1). Polynomial-order just doesn't scale well. It's mathematical fact, and particularly painful when the independent variable is still increasing quickly. Sure. Many operators seem to reject PKI as "power in too few hands". I'll not disagree with that. Depends on the PKI. For example, the PGP/GPG Web of Trust concept pretty much lets each individual build their own trust model to whatever O(x) they choose where greater values of x require more effort and also provide greater security/trust granularity and lower values of x involve greater trust of others that you claim you can trust and less direct effort on your part. Let's also draw upon operational lessons from a couple old-timers. I recall using a critter known as "NNTP". And once upon a time, before my days on the Internet, lived a funny little beast called "UUCP". I remember UUCP. It was pretty much O(n^2). We track email quality from all mailservers that hit us. I can whip up a list of MXes/organizations that I'm willing to "trust" -- and let's leave that term imprecisely-defined for now. Uh, OK. Starting to understand what the question might be aiming towards. Here's what I propose: Establish a "distrust protocol". Let path weight be "distrust". The "trust path" is of secondary importance to "path weight", although not completely irrelevant. SMTP endpoint not in graph? Fine; have some default behavior. Let _trust_ be semi-transitive, a la BGP -- a technology that we know, understand, and at least sort of trust to run this crazy, giant network that dwarfs even a 50M-user provider. Let actual _content_ still be end-to-end, so that we do not simply reincarnate NNTP or UUCP. Now I'm lost again. You've mixed so many different metaphors from interdomain routing to distance-vector computaton to store-and-forward that I simply don't understand what you are proposing or how one could begin to approach implementing it or what problem you seem to think it solves (although it sort of seems like you're wanting to attack the trustworthiness of email to battle SPAM through some mechanism that depends only on the level of trust for the (source, arrival path) tuple from whence it came. What am I missing? Owen
Re: the O(N^2) problem
Another alternative is something we've been working on that we call Perspectives: http://www.cs.cmu.edu/~dwendlan/perspectives/ Warning: This is a work in progress. The Mozilla plugin is a little flaky and the paper is still being revised for the final revision for USENIX. The SSH code works pretty well. We haven't written an SMTP version yet. The basic idea is pretty simple: Use multiple paths to a destination to figure out if you're likely getting to the right place. If you _and_ your friends all observe the same public key from a server, preferably for a long period of time, then trust it. Else scream bloody murder. Perspectives provides these "friends" in the form of notary servers who you can ask about the past and present keys supplied by an SSL or SSH server. (An alternate way of viewing this is to think of Perspectives as a low- overhead, low-cost PKI.) It's an interesting thought exercise to wonder if we could extend the model to "trust not to send spam" instead of simply "trust to be the owner of the key", but that same exercise applies with a PKI, too. -Dave On Apr 13, 2008, at 8:36 PM, Edward B. DREGER wrote: Bottom line first: We need OOB metadata ("trust/distrust") information exchange that scales better than the current O(N^2) nonsense, yet is not PKI. And now, the details... which ended up longer reading than I intended. My apologies. As Mark Twain said, "I didn't have time to write a short letter, so I wrote a long one instead." :-) When it comes to establishing trust: * The current SMTP model is O(N^2); * I posit that the current IP networking model is sub-O(N); * PKI models are pretty much O(1). Polynomial-order just doesn't scale well. It's mathematical fact, and particularly painful when the independent variable is still increasing quickly. Many operators seem to reject PKI as "power in too few hands". I'll not disagree with that. Conclusion: What we need is something that scales better than O(N^2), but that is not as "few trusted keepers of the world" as PKI. Let's look to one of the current hot tickets: social networking. Who is whose friend, who is in whose network, blah blah blah. (The junior high students seem to grok the concept of trust being semi-transitive!) Let's also draw upon operational lessons from a couple old-timers. I recall using a critter known as "NNTP". And once upon a time, before my days on the Internet, lived a funny little beast called "UUCP". We track email quality from all mailservers that hit us. I can whip up a list of MXes/organizations that I'm willing to "trust" -- and let's leave that term imprecisely-defined for now. Here's what I propose: Establish a "distrust protocol". Let path weight be "distrust". The "trust path" is of secondary importance to "path weight", although not completely irrelevant. SMTP endpoint not in graph? Fine; have some default behavior. Let _trust_ be semi-transitive, a la BGP -- a technology that we know, understand, and at least sort of trust to run this crazy, giant network that dwarfs even a 50M-user provider. Let actual _content_ still be end-to-end, so that we do not simply reincarnate NNTP or UUCP. Alternatively: I'm open to other suggestions. Or, there's plan "C": We continue to argue, banter, carp, fuss, grumble, moan, swear, whine, et cetera (I decided against running the alphabet) over the problem. Hey, it's worked/working great so far, right?
the O(N^2) problem
Bottom line first: We need OOB metadata ("trust/distrust") information exchange that scales better than the current O(N^2) nonsense, yet is not PKI. And now, the details... which ended up longer reading than I intended. My apologies. As Mark Twain said, "I didn't have time to write a short letter, so I wrote a long one instead." :-) When it comes to establishing trust: * The current SMTP model is O(N^2); * I posit that the current IP networking model is sub-O(N); * PKI models are pretty much O(1). Polynomial-order just doesn't scale well. It's mathematical fact, and particularly painful when the independent variable is still increasing quickly. Many operators seem to reject PKI as "power in too few hands". I'll not disagree with that. Conclusion: What we need is something that scales better than O(N^2), but that is not as "few trusted keepers of the world" as PKI. Let's look to one of the current hot tickets: social networking. Who is whose friend, who is in whose network, blah blah blah. (The junior high students seem to grok the concept of trust being semi-transitive!) Let's also draw upon operational lessons from a couple old-timers. I recall using a critter known as "NNTP". And once upon a time, before my days on the Internet, lived a funny little beast called "UUCP". We track email quality from all mailservers that hit us. I can whip up a list of MXes/organizations that I'm willing to "trust" -- and let's leave that term imprecisely-defined for now. Here's what I propose: Establish a "distrust protocol". Let path weight be "distrust". The "trust path" is of secondary importance to "path weight", although not completely irrelevant. SMTP endpoint not in graph? Fine; have some default behavior. Let _trust_ be semi-transitive, a la BGP -- a technology that we know, understand, and at least sort of trust to run this crazy, giant network that dwarfs even a 50M-user provider. Let actual _content_ still be end-to-end, so that we do not simply reincarnate NNTP or UUCP. Alternatively: I'm open to other suggestions. Or, there's plan "C": We continue to argue, banter, carp, fuss, grumble, moan, swear, whine, et cetera (I decided against running the alphabet) over the problem. Hey, it's worked/working great so far, right? Eddy -- Everquick Internet - http://www.everquick.net/ A division of Brotsman & Dreger, Inc. - http://www.brotsman.com/ Bandwidth, consulting, e-commerce, hosting, and network building Phone: +1 785 865 5885 Lawrence and [inter]national Phone: +1 316 794 8922 Wichita DO NOT send mail to the following addresses: [EMAIL PROTECTED] -*- [EMAIL PROTECTED] -*- [EMAIL PROTECTED] Sending mail to spambait addresses is a great way to get blocked. Ditto for broken OOO autoresponders and foolish AV software backscatter.