Re: [SLUG] spam filters not working

2004-05-07 Thread Angus Lees
At Fri, 7 May 2004 11:04:55 +1000, Mary Gardiner wrote:
 I train it on all my spam and non-spam, and I train it every week on
 mail received during that week. (With a cronjob, I just need to make
 sure false negatives and positives are moved into an appropriate
 folder.) I don't delete the existing token database ever.

.. so with all that manual spam/ham classification/archiving, is there
actually any point running an automatic spam filter anymore?

From what I can see any spam filter that needs training is missing the
point - but I've never actually run any of the Bayesian filters so it's
purely ignorant prejudice ;)

-- 
 - Gus
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html


Re: [SLUG] spam filters not working

2004-05-07 Thread Mary Gardiner
On Sat, May 08, 2004, Angus Lees wrote:
 .. so with all that manual spam/ham classification/archiving, is there
 actually any point running an automatic spam filter anymore?

Well, depends on what you mean by all that. About three times a week,
a mail ends up in the wrong folder. (That's an error rate of about
0.15%.) I move those three mails to the right folder so that they get
learned correctly. Once a week a cronjob fires and learns whatever
happens to be in my mail folders at the time. I'm happy with manually
moving three mails a week.
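
As a rough check on what that error rate implies, here is a small sketch. The three-mails-a-week and 0.15% figures come from the message above; the variable names are made up for illustration, and the result is only as good as those two rounded figures.

```python
from fractions import Fraction

# Figures quoted in the message: about 3 misfiled mails a week,
# which is described as an error rate of about 0.15%.
misfiled_per_week = Fraction(3)
error_rate = Fraction(15, 10000)  # 0.15% written as an exact fraction

# If errors = rate * volume, then volume = errors / rate.
mails_per_week = misfiled_per_week / error_rate

print(int(mails_per_week))  # 2000
```

So the two figures together suggest a mailbox receiving on the order of two thousand mails a week, spam and non-spam combined.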

I spend more time 'training' procmail than I do training my Bayesian
filter. (Please do not wave the magical procmail rule at me, because the
Linguist List don't put the right headers in their mails and therefore
it is not the solution to the problem I'm thinking of.)

The time investment is considerably less than all that manual spam
deleting, for example.

 From what I can see any spam filter that needs training is missing the
 point - but I've never actually run any of the Bayesian filters so it's
 purely ignorant prejudice ;)

Well, it depends on what the point is.

If the point is "it is easy to tell spam from non-spam with rules that
already exist", then contribute your rules to the SpamAssassin project,
because many people are finding that their rules degrade in
effectiveness over time. SA, untrained, would miss about 15% of the spam
I currently receive.

If the point is "it should be possible to tell spam from non-spam with
rules that have an acceptable error rate and will not degrade for a long
period of time", you're probably right, but my suspicion is that coming
up with those rules is like a lot of natural language problems: hard.

If the point is "spam just doesn't annoy me that much, and I'd rather
just delete the stuff than spend more than 1 minute setting up a filter",
then we're different.

-Mary


Re: [SLUG] spam filters not working

2004-05-07 Thread Jamie Wilkinson
This one time, at band camp, Angus Lees wrote:
 At Fri, 7 May 2004 11:04:55 +1000, Mary Gardiner wrote:
  I train it on all my spam and non-spam, and I train it every week on
  mail received during that week. (With a cronjob, I just need to make
  sure false negatives and positives are moved into an appropriate
  folder.) I don't delete the existing token database ever.

 .. so with all that manual spam/ham classification/archiving, is there
 actually any point running an automatic spam filter anymore?

 From what I can see any spam filter that needs training is missing the
 point - but I've never actually run any of the Bayesian filters so it's
 purely ignorant prejudice ;)

I occasionally hit S in mutt, which trains bogofilter and saves the
message to my spam corpus.  The reply, list-reply, and group-reply
commands are bound to train bogofilter that the message I'm replying to
is not spam.  So I only half-manually train my bogofilter, and that's
the only filter I'm using.

I rarely see spam get past my filters nowadays, and I rarely see false
positives in my spambox on the few occasions that I check it.

The time spent training my bogomonster is much less time than it takes
to open the debian-devel folder and mark it all as read.

-- 
[EMAIL PROTECTED]   http://spacepants.org/jaq.gpg


Re: [SLUG] spam filters not working

2004-05-06 Thread Patrick Lesslie
On Fri, May 07, 2004 at 11:52:30AM +1000, James Gregory wrote:
 On Thu, 2004-05-06 at 23:36 +1000, Nicholas Tomlin wrote:
  How can we get a spam filter to check for misspelt words and reject the mail
  on that basis?
 
 I thought about this a while ago. It would be relatively easy to
 implement -- just hook aspell into a procmail rule. I eventually came to

I thought you were going to suggest using a spell checker to 
auto-correct some of the spelling, and then filter for spam ...


Re: [SLUG] spam filters not working

2004-05-06 Thread Mary Gardiner
On Fri, May 07, 2004, James Gregory wrote:
 2. For any significant misspellings of words, bogofilter will already
 look for them.

The Language Log says there are 1,300,925,111,156,286,160,896 possible
creative misspellings of Viagra alone.

http://itre.cis.upenn.edu/~myl/languagelog/archives/000773.html

Even given this, I think Bayesian filtering is still worth its while.
In one of Paul Graham's articles on it, he points out that many of the
highly indicative words it finds are not the glaringly obvious ones that
a human would guess as spammy.
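
To make the idea concrete, here is a minimal naive-Bayes-style sketch. It is not Paul Graham's exact formula, nor what bogofilter or SpamAssassin actually implement: the tokeniser, the add-one smoothing, the equal priors and the tiny training sets are all assumptions chosen to keep the example short. The point is only that every token seen in training gets a weight, not just the ones a human would think to write a rule for.

```python
import math
from collections import Counter


def tokenize(text):
    """Very crude tokeniser: lowercase and split on whitespace."""
    return text.lower().split()


def train(messages):
    """Count, for each token, how many messages it appears in."""
    counts = Counter()
    for msg in messages:
        counts.update(set(tokenize(msg)))
    return counts


def spam_score(text, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | tokens) with add-one smoothing and equal priors."""
    log_odds = 0.0
    for tok in set(tokenize(text)):
        p_tok_spam = (spam_counts[tok] + 1) / (n_spam + 2)
        p_tok_ham = (ham_counts[tok] + 1) / (n_ham + 2)
        log_odds += math.log(p_tok_spam / p_tok_ham)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical training data, just for illustration.
spam = ["buy cheap pills now", "cheap pills online"]
ham = ["meeting notes attached", "see you at the meeting"]
spam_counts, ham_counts = train(spam), train(ham)

score = spam_score("cheap pills", spam_counts, ham_counts, len(spam), len(ham))
print(score > 0.5)  # True: both tokens were only ever seen in spam
```

Retraining on misfiled mail simply updates the two counters, which is why the filter can follow changes in spam without anyone writing new rules.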

-Mary


[SLUG] spam filters not working

2004-05-06 Thread Nicholas Tomlin
Hello sluggers,

I've noticed the amount of mail bypassing the filters seems to be increasing 
and would like to venture an idea...

Most of the mails that get through are misspelt to put the filter off the 
track.

eg,

Some that got through:
{
Ur Diicky Is So Smaall horsemeat digenesis
Darlin how good to see you! :)
}
How can we get a spam filter to check for misspelt words and reject the mail 
on that basis?

Your assistance is appreciated.

Thank you,

Nicholas Tomlin.



Re: [SLUG] spam filters not working

2004-05-06 Thread David

Since changing to bogofilter (plus stringent training), my spam filtering
is near on perfect. I've had only one slip through this week. I've written
a crude script to make training easier.

You didn't tell us which spam filter, but spamassassin was catching less
than 50% of spam when I stopped using it.

On Thu, 6 May 2004, Nicholas Tomlin wrote:

 Hello sluggers,

 I've noticed the amount of mail bypassing the filters seems to be increasing
 and would like to venture an idea...

 Most of the mails that get through are misspelt to put the filter off the
 track.

 eg,

 Some that got through:
 {
 Ur Diicky Is So Smaall horsemeat digenesis
 Darlin how good to see you! :)
 }
 How can we get a spam filter to check for misspelt words and reject the mail
 on that basis?

 Your assistance is appreciated.

 Thank you,

 Nicholas Tomlin.



Re: [SLUG] spam filters not working

2004-05-06 Thread Terry Collins
David wrote:

...snip..

 You didn't tell us which spam filter, but spamassassin was catching less
 than 50% of spam when I stopped using it.

Is it worthwhile to retrain your spam filters when the nature of spam
changes?
Just wondering.

-- 
   Terry Collins {:-)}}} email: terryc at woa.com.au  www:
http://www.woa.com.au  
   Wombat Outdoor Adventures Bicycles, Computers, GIS, Printing,
Publishing

 People without trees are like fish without clean water


Re: [SLUG] spam filters not working

2004-05-06 Thread John Clarke
On Fri, May 07, 2004 at 09:43:06AM +1000, Terry Collins wrote:

 Is it worthwhile to retrain your spam filters when the nature of spam
 changes?

Yes.  A month or so ago spamassassin's success rate suddenly
plummeted.  After retraining, it's back to normal -- missing only one
or two of the thousand or so spams I get each week.


Cheers,

John
-- 
I'm collecting all those wiry hairs that you find in between keyboard
keys, and saving them up until I have enough for a beard of my own.
-- Kirrily 'Skud' Robert


Re: [SLUG] spam filters not working

2004-05-06 Thread Mary Gardiner
On Fri, May 07, 2004, David wrote:
 You didn't tell us which spam filter, but spamassassin was catching less
 than 50% of spam when I stopped using it.

SpamAssassin has Bayesian filtering too these days. People who are
already using it should probably try its sa-learn utility before
jumping spam filters.

For what it's worth, I use SpamAssassin with the learner, and it catches
about 99% of all spam sent to me (this does mean a false negative about
3 times a week though), and hasn't caught a non-spam in months -- I do
use whitelists for some organisations though.

I train it on all my spam and non-spam, and I train it every week on
mail received during that week. (With a cronjob, I just need to make
sure false negatives and positives are moved into an appropriate
folder.) I don't delete the existing token database ever.
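
A weekly job along these lines could be sketched as below. This is not Mary's actual script: the folder paths are hypothetical, and it assumes the folders are directories of individual message files (mbox folders would also need sa-learn's --mbox option). The only SpamAssassin pieces relied on are sa-learn's --spam and --ham options.

```python
import subprocess


def sa_learn_commands(spam_dir, ham_dir):
    """Build the two sa-learn invocations, one per corpus."""
    return [
        ["sa-learn", "--spam", str(spam_dir)],
        ["sa-learn", "--ham", str(ham_dir)],
    ]


def train(spam_dir, ham_dir, run=subprocess.run):
    """Feed both folders to sa-learn; existing tokens are kept, not reset."""
    for cmd in sa_learn_commands(spam_dir, ham_dir):
        # check=True makes a failing sa-learn raise instead of passing silently.
        run(cmd, check=True)


# Hypothetical usage, e.g. from a script that cron runs once a week:
#   train("/home/me/Maildir/.spam/cur", "/home/me/Maildir/.ham/cur")
```

The only manual step left is the one described above: moving the occasional misfiled mail into the right folder before the job runs.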

-Mary


Re: [SLUG] spam filters not working

2004-05-06 Thread James Gregory
On Thu, 2004-05-06 at 23:36 +1000, Nicholas Tomlin wrote:
 Hello sluggers,
 
 I've noticed the amount of mail bypassing the filters seems to be increasing 
 and would like to venture an idea...
 
 Most of the mails that get through are misspelt to put the filter off the 
 track.
 
 eg,
 
 Some that got through:
 {
 Ur Diicky Is So Smaall horsemeat digenesis
 Darlin how good to see you! :)
 }
 How can we get a spam filter to check for misspelt words and reject the mail 
 on that basis?

I thought about this a while ago. It would be relatively easy to
implement -- just hook aspell into a procmail rule. I eventually came to
two conclusions:

1. I would lose a lot of important (though difficult to read) mail this
way.
2. For any significant misspellings of words, bogofilter will already
look for them.

But I've not actually done the experiment. You should try it out and see
if it actually helps. It'd be a great way of teaching your friends and
family to spell :)
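
For anyone who wants to run the experiment, the scoring part might look something like this sketch. It does not call aspell: a plain Python set of known words stands in for the spell checker, and the word list, function names and 0.5 threshold are all made up for illustration.

```python
import re


def misspelt_ratio(text, dictionary):
    """Fraction of words in text that are not in the given word set."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    unknown = [w for w in words if w not in dictionary]
    return len(unknown) / len(words)


def looks_misspelt(text, dictionary, threshold=0.5):
    """Flag a message when more than `threshold` of its words are unknown."""
    return misspelt_ratio(text, dictionary) > threshold


# Hypothetical stand-in for a real dictionary.
known = {"is", "so", "horsemeat", "digenesis", "how", "good", "to", "see", "you"}

for subject in ("Ur Diicky Is So Smaall horsemeat digenesis",
                "Darlin how good to see you! :)"):
    print(subject, "->", round(misspelt_ratio(subject, known), 2))
```

With this toy word list, neither of the two subjects from the original post gets past a 0.5 threshold, because the padding words are spelt correctly. Any threshold low enough to catch them would also catch a lot of legitimate mail, which is conclusion 1 above.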

If you want to give it a go, I'm sure someone on the list could help you
with procmail magic.

James.
-- 
James Gregory [EMAIL PROTECTED]



Re: [SLUG] spam filters not working

2004-05-06 Thread Jeff Waugh
<quote who="James Gregory">

  How can we get a spam filter to check for misspelt words and reject the
  mail on that basis?
 
 I thought about this a while ago. It would be relatively easy to implement
 -- just hook aspell into a procmail rule. I eventually came to two
 conclusions:
 
 1. I would lose a lot of important (though difficult to read) mail this
 way.
 2. For any significant misspellings of words, bogofilter will already look
 for them.

Mind you, having client-side scoring down of people who can't spell would be
a fantastic feature for spelling-fascists the world over. Tell you what: I'D
LIKE THIS FEATURE SO MUCH, I'D BUY THE COMPANY!

- Jeff

-- 
GVADEC 2004: Kristiansand, Norway    http://2004.guadec.org/
 
 He's not an idiot.
The doctor said so.


Re: [SLUG] spam filters not working

2004-05-06 Thread Dean Hamstead
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jeff Waugh wrote:
| <quote who="James Gregory">
|
|How can we get a spam filter to check for misspelt words and reject the
|mail on that basis?
|
|I thought about this a while ago. It would be relatively easy to implement
|-- just hook aspell into a procmail rule. I eventually came to two
|conclusions:
|
|1. I would lose a lot of important (though difficult to read) mail this
|way.
|2. For any significant misspellings of words, bogofilter will already look
|for them.
|
|
| Mind you, having client-side scoring down of people who can't spell would be
| a fantastic feature for spelling-fascists the world over. Tell you what: I'D
| LIKE THIS FEATURE SO MUCH, I'D BUY THE COMPANY!

English is a ridiculous language anyway.
But restricting spam based on English spelling would be terrible
for those of us who can speak more than one language, more so
for those of us whose second (third and fourth) languages aren't
a European or Asian or Middle Eastern language.
Dean
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFAmvA9I1HDX08lY+ARAkohAJ9Gx6I6oTlXTvn+FQYPbZ4rxI78pgCcDSx5
L2bV/r25O+6xH11B1o0IuN4=
=9V/e
-----END PGP SIGNATURE-----


Re: [SLUG] spam filters not working

2004-05-06 Thread Jeff Waugh
<quote who="Dean Hamstead">

 but restricting spam based on english spelling would be terrible

There are lots of spelling modules and dictionaries out there - who said
anything about English? Spelling fascists come in EVERY LANGUAGE (that you
can write, at least).

- Jeff

-- 
GVADEC 2004: Kristiansand, Norway    http://2004.guadec.org/
 
   In the pre-Internet age, I was like an Internet kid, with a 3D search
 engine, trying to find weird stuff. - John Safran