Re: Fighting Spam with Python

2005-08-26 Thread David MacQuigg
On Thu, 25 Aug 2005 13:22:53 -0400, François Pinard wrote:
[David MacQuigg]

 The key new features needed in a spam filter are the ability to
 extract the sender's identity (not that of the latest forwarder), and
 to factor into the spam score the reputation of that identity.

This will only work if your system is immune to forgeries and is also
widely deployed.

Stopping forgery is what the new authentication methods are all about.
Getting these methods widely and effectively used is our big
challenge, and one that I hope to accomplish with my efforts.  There
are a bunch of pieces that need to work together more smoothly.
That's where Python comes in.  There are some challenging constraints;
for example, the system has to work without government regulation.  I've
got a first draft of a website for open-mail.org, temporarily at
http://purl.net/macquigg/email/registry  Suggestions are welcome.

 In the flow we envision, the spam filter is the final process, used
 only on the 5% that is hard to classify.  80% will get an immediate
 reject.  15% will get an immediate accept without filtering, because
 the sender is authenticated and has a good reputation.  Eventually,
 all reputable senders will join the 15%, and the 5% will shrink to
 where we can ignore it.

It's fun to read statistics about a vision! :-)

The 80% is real (see http://messagelabs.com/emailthreats).  How the
remaining 20% will split is a guess, but one that I think is
realistic.  See http://www.spamhaus.org/effective_filtering.html for
comparable numbers using only IP blacklists and spam filtering.

The 5% still needing filtering will be those senders that don't offer
any authentication or that authenticate with an identity that has not
yet acquired a reputation.
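
The flow described above can be sketched in a few lines of Python.
This is only an illustration, not the Open-Mail code: the function
name `triage`, the result strings, the 0.0 to 1.0 reputation scale and
both thresholds are assumptions made for the example, as is the rule
that a failed authentication or a bad reputation means an immediate
reject.

```python
def triage(auth_result, reputation, good=0.8, bad=0.2):
    """Decide what to do with a message before any content filtering.

    auth_result: "pass", "fail", or None when the sender offers no
                 authentication (hypothetical values).
    reputation:  0.0 (bad) to 1.0 (good), or None when the identity has
                 not yet acquired a reputation (hypothetical scale).
    """
    if auth_result == "fail":
        # Forged or unauthorized sender: reject without filtering.
        return "reject"
    if auth_result == "pass" and reputation is not None:
        if reputation >= good:
            return "accept"   # authenticated and reputable
        if reputation <= bad:
            return "reject"   # authenticated, but a known bad sender
    # No authentication, no reputation yet, or a middling reputation:
    # hand the message to the statistical spam filter.
    return "filter"
```

With these assumptions, `triage("pass", None)` and `triage(None, None)`
both return "filter", which corresponds to the 5% case.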

 You might find www.spambayes.org of interest, in several ways.

Spambayes is surprisingly good as it already stands.

I haven't used Spambayes, but my experience with Spamnix (an offshoot
of SpamAssassin) is that statistical filters always have a few false
rejects.  In my case, that's about two per week.

The solution to this problem is a reliable system allowing receivers
to determine the identity and reputation of an unknown sender.  Then
we can safely ignore the spam.

-- Dave

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fighting Spam with Python

2005-08-26 Thread François Pinard
[David MacQuigg]

 Getting these methods widely and effectively used is our big
 challenge, and one that I hope to accomplish with my efforts.

I hope one of these methods, either yours or one of the few others
developed and proposed in recent years, will succeed.  It might be
useful for someone as involved as you are (thanks on behalf of all of
us!) to survey those others and try to understand why they failed to
gain popularity, so as not to repeat the same errors, if any.

-- 
François Pinard   http://pinard.progiciels-bpi.ca


Re: Fighting Spam with Python

2005-08-26 Thread Christos Georgiou
On Wed, 24 Aug 2005 22:46:28 -0700, rumours say that David MacQuigg dmq
at pobox.com might have written:

I'm writing some scripts to check incoming mail against a registry of
reputable senders, using the new authentication methods.  Python is
ideal for this because it will give mail-system admins the ability to
experiment with the different methods, and provide some real-world
feedback sorely needed by the advocates of each method.  So far, we
have SPF and CSV.  See http://purl.net/macquigg/email/python for the
latest project status.

I am on the side of advocating SPF records, and I was one of the first
four postmasters in my country's TLD to set up SPF records, for two of
the email domains I administer.  SPF is an Internet-Draft now.[1]
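
For readers who have not seen one, an SPF record is a DNS TXT record
such as "v=spf1 ip4:192.0.2.0/24 -all", listing the hosts allowed to
send mail for a domain.  The sketch below shows how a receiver might
evaluate such a record.  It is deliberately simplified: it handles only
the `ip4` and `all` mechanisms and performs no DNS lookups, and the
function names and the example record are assumptions for illustration,
not part of any particular SPF library.

```python
import ipaddress

# Qualifier prefixes and the results they produce; "+" is the default.
RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def parse_spf(record):
    """Split an SPF record into (qualifier, mechanism, value) tuples."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    mechanisms = []
    for term in terms[1:]:
        qualifier = "+"
        if term[0] in RESULTS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        mechanisms.append((qualifier, name.lower(), value))
    return mechanisms


def evaluate_ip4(record, client_ip):
    """Evaluate only the ip4 and all mechanisms, in order of appearance."""
    ip = ipaddress.ip_address(client_ip)
    for qualifier, name, value in parse_spf(record):
        if name == "ip4" and ip in ipaddress.ip_network(value, strict=False):
            return RESULTS[qualifier]
        if name == "all":
            return RESULTS[qualifier]
    return "neutral"  # nothing matched
```

A real checker also has to handle `a`, `mx`, `include`, `redirect` and
the other terms, which require DNS queries.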

Your method is not, or will not be, free (as in beer), as hinted in
http://www.ece.arizona.edu/~edatools/home/email/registry/Form-Sender01.htm
.  *That* is a drawback similar to the licensing of Microsoft's
Sender/Caller-ID scheme.  Why not support open, free standards?

I have developed scripts of my own to perform various consistency
checks (including SPF lookup) and to maintain my own blacklist (I
consult three RBLs which I have found to be close to my standards,
but I want to avoid excessive usage of their bandwidth).  Although it
takes some time almost every day to oversee things, I would be very
hesitant to support such a free (as in jazz :) scheme.  I mean, the
reputation idea is nice, but paying for this reputation won't help it
spread.
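
A blacklist check of the kind described here usually boils down to a
DNS lookup: the IPv4 octets are reversed and prepended to the list's
zone, and any answer means the address is listed.  The sketch below is
not the script mentioned above; the zone name `dnsbl.example.org`, the
function names, and the idea of passing the resolver in as a parameter
are assumptions for illustration.  It consults a local blacklist first,
so the remote lists are queried only for addresses not already known.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name used to ask a blacklist zone about an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)          # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return reversed_octets + "." + zone


def dns_resolves(name):
    """Return True if the name resolves to an address (needs network access)."""
    try:
        socket.gethostbyname(name)
        return True
    except socket.gaierror:
        return False


def is_listed(ip, local_blacklist, zones, resolve=dns_resolves):
    """Check the local blacklist first, then each remote zone in turn."""
    if ip in local_blacklist:
        return True
    for zone in zones:
        if resolve(dnsbl_query_name(ip, zone)):
            local_blacklist.add(ip)           # remember it; no repeat queries
            return True
    return False
```

For example, `dnsbl_query_name("192.0.2.99", "dnsbl.example.org")`
gives "99.2.0.192.dnsbl.example.org".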

Good luck with it as a business, though.


[1]
http://www.ietf.org/internet-drafts/draft-schlitt-spf-classic-02.txt
http://www.ietf.org/internet-drafts/draft-newton-maawg-spf-considerations-00.txt
-- 
TZOTZIOY, I speak England very best.
Dear Paul,
please stop spamming us.
The Corinthians


Re: Fighting Spam with Python

2005-08-26 Thread John J. Lee
David MacQuigg dmq at pobox.com writes:
[...]
 I haven't used Spambayes, but my experience with Spamnix (an offshoot
 of SpamAssassin) is that statistical filters always have a few false
 rejects.  In my case, that's about two per week.
[...]

That is precisely the problem that Bayesian filtering was designed to
solve.

AFAIK, SpamAssassin is a non-Bayesian filter.  (Though I think I
heard they were thinking of grafting on Bayesian filtering to their
existing algorithms, I'm not sure if they did it, or even if that's
actually a sane thing to do.)

[David, in an earlier email]
 reject.  15% will get an immediate accept without filtering, because
 the sender is authenticated and has a good reputation.  Eventually,
 all reputable senders will join the 15%, and the 5% will shrink to
 where we can ignore it.

Two questions you seem to be implicitly assuming particular answers
to: Is widespread authentication a good thing?  Does it solve any
problem not solved by Bayesian filtering plus good mail client
support?  My first reaction is to answer no to both questions, and so
to regard your effort as harmful.  It might be interesting to hear why
you think it's a good thing, though.


John


Re: Fighting Spam with Python

2005-08-26 Thread David MacQuigg
On Fri, 26 Aug 2005 10:36:28 -0400, François Pinard wrote:

[David MacQuigg]

 Getting these methods widely and effectively used is our big
 challenge, and one that I hope to accomplish with my efforts.

I hope one of these methods, either yours or one of the few others
developed and proposed in recent years, will succeed.

I don't have a method, and that is a key part of the strategy.  The
Registry is intended to support all methods.  My main technical
contribution, if you can call it that, is to figure out how we can tie
these methods into a system where not all participants are using the
same method.  (An interoperability protocol, if you need a fancy
name.)

It might be useful for someone as involved as you are (thanks on behalf
of all of us!) to survey those others and try to understand why they
failed to gain popularity, so as not to repeat the same errors, if any.

The main reason for the current failure is that the effort to achieve
a common authentication standard has degenerated into a war.

I did try to find information on other attempts at setting up a
Registry/Clearinghouse of reputation information.  There has been an
effort by Spamhaus to establish such a registry, but they were
counting on senders to support it.  That seems to me a fatal flaw.

Our plans are to have *receivers* support the registry via
subscription fees.  Senders will need an incentive, and that will be
provided by receivers who use the Registry to clear reputable mail,
and send the rest to a spam filter.

There are also some successful proprietary systems, like IronPort
Senderbase, that I think are similar, but I don't know the details.
You have to pay them big bucks for a spam appliance.

--
Dave



Re: Fighting Spam with Python

2005-08-25 Thread Larry Bates
Before you do too much work you should probably check out:

http://spambayes.sourceforge.net/

There has already been a lot of work done on this project.

FYI, Larry

David MacQuigg wrote:
 Are you as mad about spam as I am?  Are you frustrated with the
 pessimism and lack of progress these last two years?  Do you have
 faith that an open-source project can do better than the big companies
 competing for a lock-in solution?  If so, you might be interested in
 the Open-Mail project.
 
 I'm writing some scripts to check incoming mail against a registry of
 reputable senders, using the new authentication methods.  Python is
 ideal for this because it will give mail-system admins the ability to
 experiment with the different methods, and provide some real-world
 feedback sorely needed by the advocates of each method.  So far, we
 have SPF and CSV.  See http://purl.net/macquigg/email/python for the
 latest project status.
 
 I welcome anyone who is interested in helping, especially if you have
 some experience with mail transfer programs, like Sendmail or Postfix,
 or spam filtering programs, like SpamAssassin.  My Python may not be
 the best, so I welcome suggestions there also.  We need to make these
 scripts a model of clarity.
 
 --
 Dave
 


Re: Fighting Spam with Python

2005-08-25 Thread Peter Hansen
David MacQuigg wrote:
 Are you as mad about spam as I am?  Are you frustrated with the
 pessimism and lack of progress these last two years?  Do you have
 faith that an open-source project can do better than the big companies
 competing for a lock-in solution?  If so, you might be interested in
 the Open-Mail project.
 
 I'm writing some scripts to check incoming mail against a registry of
 reputable senders, using the new authentication methods.  Python is
 ideal for this because it will give mail-system admins the ability to
 experiment with the different methods, and provide some real-world
 feedback sorely needed by the advocates of each method.  So far, we
 have SPF and CSV.  See http://purl.net/macquigg/email/python for the
 latest project status.

You might find www.spambayes.org of interest, in several ways.

-Peter


Re: Fighting Spam with Python

2005-08-25 Thread David MacQuigg
On Thu, 25 Aug 2005 10:18:37 -0400, Peter Hansen wrote:

David MacQuigg wrote:
 Are you as mad about spam as I am?  Are you frustrated with the
 pessimism and lack of progress these last two years?  Do you have
 faith that an open-source project can do better than the big companies
 competing for a lock-in solution?  If so, you might be interested in
 the Open-Mail project.
 
 I'm writing some scripts to check incoming mail against a registry of
 reputable senders, using the new authentication methods.  Python is
 ideal for this because it will give mail-system admins the ability to
 experiment with the different methods, and provide some real-world
 feedback sorely needed by the advocates of each method.  So far, we
 have SPF and CSV.  See http://purl.net/macquigg/email/python for the
 latest project status.

You might find www.spambayes.org of interest, in several ways.

Integration of a good spam filter is one of our top priorities.
Spambayes looks like a good candidate.  The key new features needed in
a spam filter are the ability to extract the sender's identity (not
that of the latest forwarder), and to factor into the spam score the
reputation of that identity.  We could use some help on this
integration.
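
One way to picture "factoring the reputation into the spam score" is
a weighted blend of the filter's output and the sender's reputation.
The sketch below is an assumption about how that could look, not how
Spambayes or any existing filter does it: the function name, the
0.0 to 1.0 scales and the `weight` parameter are all hypothetical, and
extracting the true sender identity is left out entirely.

```python
def adjusted_spam_score(filter_score, reputation, weight=0.5):
    """Blend a content filter's spam score with a sender reputation.

    filter_score: 0.0 (ham) to 1.0 (spam), as a statistical filter
                  might report it (hypothetical scale).
    reputation:   0.0 (bad) to 1.0 (good), or None if the identity
                  is unknown (hypothetical scale).
    weight:       how much the reputation counts, 0.0 to 1.0
                  (hypothetical tuning knob).
    """
    if reputation is None:
        # Unknown sender: the content filter is all we have.
        return filter_score
    reputation_score = 1.0 - reputation   # good reputation, low score
    return (1.0 - weight) * filter_score + weight * reputation_score
```

With the default weight, a message the filter scores at 0.8 from a
sender with a perfect reputation comes out at 0.4, while the same
message from an unknown sender stays at 0.8.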

I guess I should have said a little more about the Open-Mail project.
We are not focused on developing new authentication or filtering
methods, but rather, providing a platform that will bring these pieces
together and allow the mail admin to choose which methods are used and
in what order.  Interoperability has been the main barrier to
wide-scale use of authentication.  Python is superb at gluing these
pieces together.
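
As a rough illustration of what "choose which methods are used and in
what order" could mean in code, the sketch below runs a configurable
list of checks and stops at the first definitive answer.  Everything
in it is an assumption for the example: the checker functions are
placeholders that read a precomputed result from a dict rather than
doing real SPF or CSV evaluation, and the names `METHODS` and
`authenticate` are hypothetical.

```python
def check_spf(msg):
    # Placeholder: a real check would evaluate the domain's SPF record.
    return msg.get("spf")


def check_csv(msg):
    # Placeholder: a real check would do the CSV lookup on the HELO name.
    return msg.get("csv")


# Registry of available methods; new ones can be added without
# touching the loop below.
METHODS = {"spf": check_spf, "csv": check_csv}


def authenticate(msg, order):
    """Run the methods in the admin's chosen order.

    Returns (method_name, result) for the first method that gives a
    definitive "pass" or "fail", or (None, "none") if no method does.
    """
    for name in order:
        result = METHODS[name](msg)
        if result in ("pass", "fail"):
            return name, result
    return None, "none"
```

The admin's configuration is then just a list such as ["csv", "spf"],
and changing the order changes which method gets the first say.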

In the flow we envision, the spam filter is the final process, used
only on the 5% that is hard to classify.  80% will get an immediate
reject.  15% will get an immediate accept without filtering, because
the sender is authenticated and has a good reputation.  Eventually,
all reputable senders will join the 15%, and the 5% will shrink to
where we can ignore it.

--
Dave




Re: Fighting Spam with Python

2005-08-25 Thread François Pinard
[David MacQuigg]

 The key new features needed in a spam filter are the ability to
 extract the sender's identity (not that of the latest forwarder), and
 to factor into the spam score the reputation of that identity.

This will only work if your system is immune to forgeries and is also
widely deployed.

 In the flow we envision, the spam filter is the final process, used
 only on the 5% that is hard to classify.  80% will get an immediate
 reject.  15% will get an immediate accept without filtering, because
 the sender is authenticated and has a good reputation.  Eventually,
 all reputable senders will join the 15%, and the 5% will shrink to
 where we can ignore it.

It's fun to read statistics about a vision! :-)

 You might find www.spambayes.org of interest, in several ways.

Spambayes is surprisingly good as it already stands.

-- 
François Pinard   http://pinard.progiciels-bpi.ca


Fighting Spam with Python

2005-08-24 Thread David MacQuigg
Are you as mad about spam as I am?  Are you frustrated with the
pessimism and lack of progress these last two years?  Do you have
faith that an open-source project can do better than the big companies
competing for a lock-in solution?  If so, you might be interested in
the Open-Mail project.

I'm writing some scripts to check incoming mail against a registry of
reputable senders, using the new authentication methods.  Python is
ideal for this because it will give mail-system admins the ability to
experiment with the different methods, and provide some real-world
feedback sorely needed by the advocates of each method.  So far, we
have SPF and CSV.  See http://purl.net/macquigg/email/python for the
latest project status.

I welcome anyone who is interested in helping, especially if you have
some experience with mail transfer programs, like Sendmail or Postfix,
or spam filtering programs, like SpamAssassin.  My Python may not be
the best, so I welcome suggestions there also.  We need to make these
scripts a model of clarity.

--
Dave
