[admin] RE: the O(N^2) problem

2008-04-14 Thread Martin Hannigan



Folks,

Same request as the Yahoo! Mail thread, can we go ahead and wrap this
up? Excellent points, intelligent positions, but definitely not
operational. This one might be great for ASRG, which has been a little
more active lately.

Best Regards,

Marty


--
Martin Hannigan  http://www.verneglobal.com/
Verne Global Datacenters e: [EMAIL PROTECTED]
Keflavik, Icelandp: +16178216079


Re: the O(N^2) problem

2008-04-14 Thread Tony Finch

On Mon, 14 Apr 2008, Edward B. DREGER wrote:
>
> When it comes to establishing trust:
>
> * The current SMTP model is O(N^2);

In practice it's O(N): small-to-medium-sized email systems rely on
external reputation providers (blacklists or anti-spam service providers)
rather than creating their own reputation databases.

> * PKI models are pretty much O(1).

PKI gives you authentication but not reputation. The cryptographic notion
of "trust" is not useful in the context of spam.

> Here's what I propose:
>
> Establish a "distrust protocol".  Let path weight be "distrust".  The
> "trust path" is of secondary importance to "path weight", although not
> completely irrelevant.  SMTP endpoint not in graph?  Fine; have some
> default behavior.
>
> Let _trust_ be semi-transitive, a la BGP -- a technology that we know,
> understand, and at least sort of trust to run this crazy, giant network
> that dwarfs even a 50M-user provider.

Sadly this doesn't work. BGP's transitive trust does not prevent spammers
and hackers and other bad actors from getting IP connectivity. NNTP's
transitive trust doesn't prevent spammers from getting usenet
connectivity.

You can't go much further than a hop or two before measures of reputation
become too diluted to be useful. This is partly because the diameter of
the network is quite small, so you rapidly end up with a huge population
that only has a reasonable reputation on average. (Much like large
providers are only reasonably non-spammy - they're too big to be squeaky
clean.) Note that the model of anti-spam service providers and reputation
providers is effectively a two-hop model.

Tony.
-- 
f.anthony.n.finch  <[EMAIL PROTECTED]>  http://dotat.at/
HEBRIDES: SOUTHWEST VEERING NORTHWEST 5 OR 6, THEN EAST 4 OR 5. MODERATE. RAIN
OR SHOWERS. MODERATE OR GOOD.


Re: the O(N^2) problem

2008-04-14 Thread Rich Kulawiec

On Mon, Apr 14, 2008 at 01:41:50PM +, Edward B. DREGER wrote:
> When one accepts an email[*], one wishes for some sort of _a priori_
> information regarding message trustworthiness.  DKIM can vouch for
> message authenticity, but not trust.  

At the moment, this problem can't be solved on an Internet scale, because
there are on the order of 10e8 fully-compromised systems out there.  Many
different estimates have been proferred over the years; the most recent
I've seen is from Rick Wesson at Support Intelligence, who offered 40%
as his guesstimate; if there are 800M systems on the 'net, that'd be about
320M.  But the exact number is unknowable and in some sense unimportant:
the difference between 128M and 172M doesn't matter for the purpose of
this discussion.  And I believe there is widespread concurrence that
whatever the number is, it's going up.

The new owners of those systems can do anything with them they want,
including forging (and cryptographically signing) outbound mail messages
using any SMTP authorization credentials present on it, or any SMTP access
implied by its network location(s).  (They can also, if they wish, arrange
to conceal incoming replies to this traffic from the former owners.)

Until that problem's solved (and I don't see any solution for it on
the horizon) then it will undercut any number of interesting approaches
worthy of significant discussion, not just this one.  It's the elephant
in the room, and until it's banished, it will keep getting in the way.

---Rsk



Re: the O(N^2) problem

2008-04-14 Thread Joe Greco

> The risk in a reputation system is collusion.

/One/ risk in a reputation system is collusion.

Reputation is a method to try to divine legitimacy of mail based on factors
other than whether or not a recipient authorized a sender to send mail.  To
a large extent, the majority of the focus on fighting spam has been to try
to do this sort of divination by coding clever things into machines, but it 
should be clear to anyone who has ever had legitimate mail mysteriously go 
missing, undelivered, or delayed that the process isn't without the
occasional falsing.

There are both positive (whitelist) and negative (DNSBL, local This-Is-Spam,
etc) reputation lists, and there are pros and cons to each.

Consider, for example, Kevin Day's example of the Group-B-Objectionable
scenario.  This is a nonobvious issue that can subvert the reputation of
a legitimate mailer.

On the flip side, what about someone who actually wants to receive mail
that an organization such as Spamhaus has deemed to be hosted on a spammy
IP?  (And, Steve and the Spamhaus guys, this is in no way a criticism of
the job you guys do, the Internet owes you a debt of gratitude for doing
a nearly impossible job in such a professional manner)

There are risks inherent with having any third party, specifically
including the ISP or mailbox provider, trying to determine the nature of
the communications, and filtering on that basis.

This is why I've been talking about paradigms that eliminate the need for
third parties to do analysis of e-mail, and rely on the third parties to
simply implement systems that allow the recipient to control mail.  There
are a number of such systems that are possible.

However, the current systems of divining legitimacy (reputation, filtering,
whatever) generate results that loosely approximate the typical mail that
the average user would wish to receive.  Users have been trained to consider
errors in the process as acceptable, and even unavoidable.

It's ridiculous when systems like Hotmail silently bitbucket e-mail from
a sender (and IP) that has never spammed, and have ONLY sent transactional
e-mail and customer support correspondence, and the individually composed
non-HTML REPLIES to customer inquiries are eaten by Hotmail, or tossed in
the spam folder.  Nice.  (I know, we all have our stories)

... JG
-- 
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.


Re: the O(N^2) problem

2008-04-14 Thread Edward B. DREGER

I received an off-list request: "Could you clarify what precisely you
are trying to secure?"  I fear that perhaps I am still too vague.

When one accepts an email[*], one wishes for some sort of _a priori_
information regarding message trustworthiness.  DKIM can vouch for
message authenticity, but not trust.  (A valid DKIM signature shows that
selected headers/content have not been forged, but does not vouch for
content.)

If I receive email from someone I trust, there's a good chance it's
something I want.  If from someone who someone I trust trusts, there's
still a good chance.  As the chain lengthens, trust becomes a bit
dicier.

What I propose is orthogonal to DKIM.

I've also been asked to set up a separate mailing list.  I'll do that,
and stop pollu^H^H^H^H^Htrying to elaborate on NANOG.

[*] Discussion limited to one example, but could be expanded.


Eddy
--
Everquick Internet - http://www.everquick.net/
A division of Brotsman & Dreger, Inc. - http://www.brotsman.com/
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 785 865 5885 Lawrence and [inter]national
Phone: +1 316 794 8922 Wichita

DO NOT send mail to the following addresses:
[EMAIL PROTECTED] -*- [EMAIL PROTECTED] -*- [EMAIL PROTECTED]
Sending mail to spambait addresses is a great way to get blocked.
Ditto for broken OOO autoresponders and foolish AV software backscatter.


Re: the O(N^2) problem

2008-04-14 Thread Suresh Ramasubramanian

On Mon, Apr 14, 2008 at 11:50 AM, Steven M. Bellovin
<[EMAIL PROTECTED]> wrote:
> The risk in a reputation system is collusion.

Multiple reputation systems, each with their own reputation ..  Sed
quis custodiet ipsos custodes and all that ..

A lot of the "reputation" (aka "positive reputation") shall we say
work is heavily sender / ESP / bulk mailer etc driven.  And the
negative reputation stuff (blocklists like spamhaus etc) have been
around rather a long time.

So quite a few ISPs tend to rely on trusted negative reputation
systems (aka they'd use spamhaus) and build positive reputation
(whitelists) on their own, possibly tying this to auth systems such as
dkim.

--srs
-- 
Suresh Ramasubramanian ([EMAIL PROTECTED])


Re: the O(N^2) problem

2008-04-13 Thread Steven M. Bellovin

The risk in a reputation system is collusion.


Re: the O(N^2) problem

2008-04-13 Thread Suresh Ramasubramanian

On Mon, Apr 14, 2008 at 11:27 AM, Edward B. DREGER
<[EMAIL PROTECTED]> wrote:
>  For such a system to scale, it would need to avoid OSPF-style
>  convergence.  Similarly, I would not want to query, for the sake of
>  example, 15k different "trust peers" each time I needed to validate a
>  new  tuple.  (Hence the interdomain routing and d-v calc
>  references.)

And dkim layered with some kind of reputation (if only a locally built
whitelist) wont scale for this?


Re: the O(N^2) problem

2008-04-13 Thread Edward B. DREGER

Stardate Mon, 14 Apr 2008, Suresh Ramasubramanian's log:
SR> From: Suresh Ramasubramanian

SR> Looks like what various people in the industry call a "reputation
SR> system"

I started responding; Suresh's reply came as I was doing so, and put it
very succinctly.  Reputation system, but inter-"network".  Perhaps an
example would work better than my vague descriptions. :-)

Let's say I receive email from:

From: <[EMAIL PROTECTED]>
Received: from ... (owen.delong.sj.ca.us [192.159.10.2])

Should I trust the message?  I don't "know" you.  However, I _do_ know

From: <[EMAIL PROTECTED]>
Received: from ... (trapdoor.merit.edu [198.108.1.26])

and trapdoor.merit.edu vouches for you.  Elaborating, using "trust
paths", *not* SMTP or routing paths:

<[EMAIL PROTECTED]> # distrust metric: initially 0
owen.delong.sj.ca.us # distrust metric: still 0
trapdoor.merit.edu # dm: 1 (it mostly believes your MX)
mail.everquick.net # dm: 2 (more or less trust NANOG)

versus

<[EMAIL PROTECTED]> # dm: 0
malicious.host.domain.tld # dm: 0 (trying to impersonate)
trapdoor.merit.edu # dm: 10 (doesn't yet trust above host)
mail.everquick.net # dm: 16 (after whatever local mods)

or

<[EMAIL PROTECTED]> # dm: 0
owen.delong.sj.ca.us # dm: 0 (your MX can vouch)
trapdoor.merit.edu # dm: 1 (more or less believes your MX)
mail.everquick.net # dm: 2 (more or less trust NANOG)

IOW, I receive email from an unrecognized address from your MX.  Do I
trust it?  I mostly trust trapdoor.merit.edu, which mostly trusts your
MX, which totally trusts <[EMAIL PROTECTED]>.  Therefore, I conclude
that I largely trust the message.

For such a system to scale, it would need to avoid OSPF-style
convergence.  Similarly, I would not want to query, for the sake of
example, 15k different "trust peers" each time I needed to validate a
new  tuple.  (Hence the interdomain routing and d-v calc
references.)

Therefore, one would find the locally-optimal solution at each "trust
hop", a la interdomain routing.

Perhaps PGP/GPG would be the best analogy.  (When I said "PKI", I should
have stated CA-based PKI; my original wording was excessively broad, and
should have explicitly excluded PGP.)


Eddy
--
Everquick Internet - http://www.everquick.net/
A division of Brotsman & Dreger, Inc. - http://www.brotsman.com/
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 785 865 5885 Lawrence and [inter]national
Phone: +1 316 794 8922 Wichita

DO NOT send mail to the following addresses:
[EMAIL PROTECTED] -*- [EMAIL PROTECTED] -*- [EMAIL PROTECTED]
Sending mail to spambait addresses is a great way to get blocked.
Ditto for broken OOO autoresponders and foolish AV software backscatter.


Re: the O(N^2) problem

2008-04-13 Thread Suresh Ramasubramanian

On Mon, Apr 14, 2008 at 10:34 AM, Owen DeLong <[EMAIL PROTECTED]> wrote:

> Now I'm lost again.  You've mixed so many different metaphors from
> interdomain routing to distance-vector computaton to store-and-forward
> that I simply don't understand what you are proposing or how one
> could begin to approach implementing it or what problem you seem
> to think it solves (although it sort of seems like you're wanting to attack
> the trustworthiness of email to battle SPAM through some mechanism
> that depends only on the level of trust for the (source, arrival path)
> tuple from whence it came.

Looks like what various people in the industry call a "reputation system"


Re: the O(N^2) problem

2008-04-13 Thread Owen DeLong


On Apr 13, 2008, at 5:36 PM, Edward B. DREGER wrote:



Bottom line first:

We need OOB metadata ("trust/distrust") information exchange that  
scales

better than the current O(N^2) nonsense, yet is not PKI.


Not sure why PKI should be excluded, but, so far, this is too abstract
to know what the question is...


And now, the details... which ended up longer reading than I intended.
My apologies.  As Mark Twain said, "I didn't have time to write a  
short

letter, so I wrote a long one instead." :-)

When it comes to establishing trust:

* The current SMTP model is O(N^2);


I don't see SMTP as even a "trust" model since there's pretty much
nothing trustworthy in SMTP.


* I posit that the current IP networking model is sub-O(N);


Again, I'm not seeing IP as a trust model, but, YMMV.


* PKI models are pretty much O(1).

Polynomial-order just doesn't scale well.  It's mathematical fact, and
particularly painful when the independent variable is still increasing
quickly.


Sure.

Many operators seem to reject PKI as "power in too few hands".  I'll  
not

disagree with that.


Depends on the PKI.  For example, the PGP/GPG Web of Trust concept
pretty much lets each individual build their own trust model to whatever
O(x) they choose where greater values of x require more effort and also
provide greater security/trust granularity and lower values of x involve
greater trust of others that you claim you can trust and less direct  
effort

on your part.


Let's also draw upon operational lessons from a couple old-timers.  I
recall using a critter known as "NNTP".  And once upon a time,  
before my

days on the Internet, lived a funny little beast called "UUCP".


I remember UUCP.  It was pretty much O(n^2).

We track email quality from all mailservers that hit us.  I can whip  
up

a list of MXes/organizations that I'm willing to "trust" -- and let's
leave that term imprecisely-defined for now.


Uh, OK.  Starting to understand what the question might be aiming
towards.


Here's what I propose:

Establish a "distrust protocol".  Let path weight be "distrust".  The
"trust path" is of secondary importance to "path weight", although not
completely irrelevant.  SMTP endpoint not in graph?  Fine; have some
default behavior.

Let _trust_ be semi-transitive, a la BGP -- a technology that we know,
understand, and at least sort of trust to run this crazy, giant  
network

that dwarfs even a 50M-user provider.

Let actual _content_ still be end-to-end, so that we do not simply
reincarnate NNTP or UUCP.


Now I'm lost again.  You've mixed so many different metaphors from
interdomain routing to distance-vector computaton to store-and-forward
that I simply don't understand what you are proposing or how one
could begin to approach implementing it or what problem you seem
to think it solves (although it sort of seems like you're wanting to  
attack

the trustworthiness of email to battle SPAM through some mechanism
that depends only on the level of trust for the (source, arrival path)
tuple from whence it came.

What am I missing?

Owen



Re: the O(N^2) problem

2008-04-13 Thread David Andersen


Another alternative is something we've been working on that we call  
Perspectives:


http://www.cs.cmu.edu/~dwendlan/perspectives/

Warning:  This is a work in progress.  The Mozilla plugin is a little  
flaky and the paper is still being revised for the final revision for  
USENIX.  The SSH code works pretty well.  We haven't written an SMTP  
version yet.


The basic idea is pretty simple:  Use multiple paths to a destination  
to figure out if you're likely getting to the right place.  If you  
_and_ your friends all observe the same public key from a server,  
preferably for a long period of time, then trust it.  Else scream  
bloody murder.  Perspectives provides these "friends" in the form of  
notary servers who you can ask about the past and present keys  
supplied by an SSL or SSH server.


(An alternate way of viewing this is to think of Perspectives as a low- 
overhead, low-cost PKI.)


It's an interesting thought exercise to wonder if we could extend the  
model to "trust not to send spam" instead of simply "trust to be the  
owner of the key", but that same exercise applies with a PKI, too.


  -Dave

On Apr 13, 2008, at 8:36 PM, Edward B. DREGER wrote:



Bottom line first:

We need OOB metadata ("trust/distrust") information exchange that  
scales

better than the current O(N^2) nonsense, yet is not PKI.

And now, the details... which ended up longer reading than I intended.
My apologies.  As Mark Twain said, "I didn't have time to write a  
short

letter, so I wrote a long one instead." :-)

When it comes to establishing trust:

* The current SMTP model is O(N^2);

* I posit that the current IP networking model is sub-O(N);

* PKI models are pretty much O(1).

Polynomial-order just doesn't scale well.  It's mathematical fact, and
particularly painful when the independent variable is still increasing
quickly.

Many operators seem to reject PKI as "power in too few hands".  I'll  
not

disagree with that.

Conclusion:  What we need is something that scales better than O(N^2),
but that is not as "few trusted keepers of the world" as PKI.

Let's look to one of the current hot tickets: social networking.   
Who is
whose friend, who is in whose network, blah blah blah.  (The junior  
high

students seem to grok the concept of trust being semi-transitive!)

Let's also draw upon operational lessons from a couple old-timers.  I
recall using a critter known as "NNTP".  And once upon a time,  
before my

days on the Internet, lived a funny little beast called "UUCP".

We track email quality from all mailservers that hit us.  I can whip  
up

a list of MXes/organizations that I'm willing to "trust" -- and let's
leave that term imprecisely-defined for now.

Here's what I propose:

Establish a "distrust protocol".  Let path weight be "distrust".  The
"trust path" is of secondary importance to "path weight", although not
completely irrelevant.  SMTP endpoint not in graph?  Fine; have some
default behavior.

Let _trust_ be semi-transitive, a la BGP -- a technology that we know,
understand, and at least sort of trust to run this crazy, giant  
network

that dwarfs even a 50M-user provider.

Let actual _content_ still be end-to-end, so that we do not simply
reincarnate NNTP or UUCP.

Alternatively:

I'm open to other suggestions.

Or, there's plan "C":

We continue to argue, banter, carp, fuss, grumble, moan, swear, whine,
et cetera (I decided against running the alphabet) over the problem.
Hey, it's worked/working great so far, right?





the O(N^2) problem

2008-04-13 Thread Edward B. DREGER

Bottom line first:

We need OOB metadata ("trust/distrust") information exchange that scales
better than the current O(N^2) nonsense, yet is not PKI.

And now, the details... which ended up longer reading than I intended.
My apologies.  As Mark Twain said, "I didn't have time to write a short
letter, so I wrote a long one instead." :-)

When it comes to establishing trust:

* The current SMTP model is O(N^2);

* I posit that the current IP networking model is sub-O(N);

* PKI models are pretty much O(1).

Polynomial-order just doesn't scale well.  It's mathematical fact, and
particularly painful when the independent variable is still increasing
quickly.

Many operators seem to reject PKI as "power in too few hands".  I'll not
disagree with that.

Conclusion:  What we need is something that scales better than O(N^2),
but that is not as "few trusted keepers of the world" as PKI.

Let's look to one of the current hot tickets: social networking.  Who is
whose friend, who is in whose network, blah blah blah.  (The junior high
students seem to grok the concept of trust being semi-transitive!)

Let's also draw upon operational lessons from a couple old-timers.  I
recall using a critter known as "NNTP".  And once upon a time, before my
days on the Internet, lived a funny little beast called "UUCP".

We track email quality from all mailservers that hit us.  I can whip up
a list of MXes/organizations that I'm willing to "trust" -- and let's
leave that term imprecisely-defined for now.

Here's what I propose:

Establish a "distrust protocol".  Let path weight be "distrust".  The
"trust path" is of secondary importance to "path weight", although not
completely irrelevant.  SMTP endpoint not in graph?  Fine; have some
default behavior.

Let _trust_ be semi-transitive, a la BGP -- a technology that we know,
understand, and at least sort of trust to run this crazy, giant network
that dwarfs even a 50M-user provider.

Let actual _content_ still be end-to-end, so that we do not simply
reincarnate NNTP or UUCP.

Alternatively:

I'm open to other suggestions.

Or, there's plan "C":

We continue to argue, banter, carp, fuss, grumble, moan, swear, whine,
et cetera (I decided against running the alphabet) over the problem.
Hey, it's worked/working great so far, right?


Eddy
--
Everquick Internet - http://www.everquick.net/
A division of Brotsman & Dreger, Inc. - http://www.brotsman.com/
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 785 865 5885 Lawrence and [inter]national
Phone: +1 316 794 8922 Wichita

DO NOT send mail to the following addresses:
[EMAIL PROTECTED] -*- [EMAIL PROTECTED] -*- [EMAIL PROTECTED]
Sending mail to spambait addresses is a great way to get blocked.
Ditto for broken OOO autoresponders and foolish AV software backscatter.