Re: [SLUG] spam filters not working
At Fri, 7 May 2004 11:04:55 +1000, Mary Gardiner wrote:
> I train it on all my spam and non-spam, and I train it every week on mail received during that week. (With a cronjob, I just need to make sure false negatives and positives are moved into an appropriate folder.) I don't delete the existing token database ever.

.. so with all that manual spam/ham classification/archiving, is there actually any point running an automatic spam filter anymore?

From what I can see any spam filter that needs training is missing the point - but I've never actually run any of the Bayesian filters so it's purely ignorant prejudice ;)

--
 - Gus
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
Re: [SLUG] spam filters not working
On Sat, May 08, 2004, Angus Lees wrote:
> .. so with all that manual spam/ham classification/archiving, is there actually any point running an automatic spam filter anymore?

Well, depends on what you mean by "all that". About three times a week, a mail ends up in the wrong folder. (That's an error rate of about 0.15%.) I move those three mails to the right folder so that they get learned correctly. Once a week a cronjob fires and learns whatever happens to be in my mail folders at the time.

I'm happy with manually moving three mails a week. I spend more time 'training' procmail than I do training my Bayesian filter. (Please do not wave the magical procmail rule at me, because the Linguist List don't put the right headers in their mails and therefore it is not the solution to the problem I'm thinking of.) The time investment is considerably less than all that manual spam deleting, for example.

> From what I can see any spam filter that needs training is missing the point - but I've never actually run any of the Bayesian filters so it's purely ignorant prejudice ;)

Well, it depends on what the point is.

If the point is "it is easy to tell spam from non-spam with rules that are already in existence" then contribute your rules to the SpamAssassin project, because many people are finding that their rules degrade in effectiveness over time. SA, untrained, would miss about 15% of the spam I currently receive.

If the point is "it should be possible to tell spam from non-spam with rules with an acceptable error rate that will not degrade for a long period of time" you're probably right, but my suspicion is that coming up with those rules is like a lot of natural language problems: hard.

If the point is "spam just doesn't annoy me that much, and I'd rather just delete the stuff than spend more than 1 minute setting up a filter" then we're different.

-Mary
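For readers checking the figures in the message above, here is a back-of-envelope sketch in Python. The two inputs are taken from the message; the weekly mail volume it derives is only an inference from them, not a number anyone in the thread states, and the function name is made up for illustration.

```python
def implied_weekly_volume(misfiled_per_week, error_rate):
    """Total weekly mail implied by a misfile count and an error rate.

    Hypothetical helper: it just rearranges error_rate = misfiled / total.
    """
    return misfiled_per_week / error_rate


# Inputs quoted in the message: about three misfiled mails a week,
# described as an error rate of about 0.15%.
misfiled = 3
rate = 0.15 / 100

volume = implied_weekly_volume(misfiled, rate)
print(f"implied mail volume: roughly {round(volume)} messages a week")
```

On those two figures the filter would be sorting somewhere around two thousand messages a week, which gives a sense of scale for "moving three mails".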
Re: [SLUG] spam filters not working
This one time, at band camp, Angus Lees wrote:
> At Fri, 7 May 2004 11:04:55 +1000, Mary Gardiner wrote:
> > I train it on all my spam and non-spam, and I train it every week on mail received during that week. (With a cronjob, I just need to make sure false negatives and positives are moved into an appropriate folder.) I don't delete the existing token database ever.
>
> .. so with all that manual spam/ham classification/archiving, is there actually any point running an automatic spam filter anymore?
>
> From what I can see any spam filter that needs training is missing the point - but I've never actually run any of the Bayesian filters so it's purely ignorant prejudice ;)

I occasionally hit S in mutt, which trains bogofilter and saves the message to my spam corpus. The reply, list-reply, and group-reply commands are bound to train bogofilter that the message I'm replying to is not spam.

So, I only half-manually train my bogofilter, and that's the only filter I'm using. I rarely see spam get past my filters nowadays, and I rarely see false positives in my spambox on the few occasions that I check it.

The time spent training my bogomonster is much less than the time it takes to open the debian-devel folder and mark it all as read.

--
[EMAIL PROTECTED] http://spacepants.org/jaq.gpg
Re: [SLUG] spam filters not working
On Fri, May 07, 2004 at 11:52:30AM +1000, James Gregory wrote:
> On Thu, 2004-05-06 at 23:36 +1000, Nicholas Tomlin wrote:
> > How can we get a spam filter to check for misspelt words and reject the mail on that basis?
>
> I thought about this a while ago. It would be relatively easy to implement -- just hook aspell into a procmail rule. I eventually came to

I thought you were going to suggest using a spell checker to auto-correct some of the spelling, and then filter for spam ...
Re: [SLUG] spam filters not working
On Fri, May 07, 2004, James Gregory wrote:
> 2. For any significant misspellings of words, bogofilter will already look for them.

The Language Log says there are 1,300,925,111,156,286,160,896 possible creative misspellings of Viagra alone.
http://itre.cis.upenn.edu/~myl/languagelog/archives/000773.html

Even given this, I think Bayesian filtering is still worth its while. In one of Paul Graham's articles on it, he points out that many of the highly indicative words it finds are not the glaringly obvious ones that a human would guess as spammy.

-Mary
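To make the point about learned tokens concrete, here is a toy sketch of token-based Bayesian scoring in Python. It is not bogofilter's or SpamAssassin's actual algorithm, nor Paul Graham's exact formula; the class name, the smoothing and the four training messages are all assumptions made up for illustration. The idea it shows is that a misspelt word needs no special handling: once it has been seen in trained spam it is simply another token with its own score.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class ToyBayesFilter:
    """Illustrative naive Bayes over tokens; not any real filter's method."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count each token at most once per message.
        self.counts[label].update(set(tokenize(text)))
        self.messages[label] += 1

    def token_spamminess(self, token):
        """Smoothed estimate of P(spam | token), assuming equal priors."""
        p_spam = (self.counts["spam"][token] + 1) / (self.messages["spam"] + 2)
        p_ham = (self.counts["ham"][token] + 1) / (self.messages["ham"] + 2)
        return p_spam / (p_spam + p_ham)

    def spam_probability(self, text):
        """Combine the per-token estimates in log-odds space."""
        log_odds = 0.0
        for token in set(tokenize(text)):
            p = self.token_spamminess(token)
            log_odds += math.log(p) - math.log(1 - p)
        return 1 / (1 + math.exp(-log_odds))


# Made-up training data; a real corpus would be thousands of messages.
bayes = ToyBayesFilter()
bayes.train("Ur Diicky Is So Smaall", "spam")
bayes.train("cheap meds, click here now", "spam")
bayes.train("minutes from the SLUG meeting are attached", "ham")
bayes.train("can you review the procmail patch", "ham")

print(bayes.token_spamminess("diicky"))
print(bayes.spam_probability("click here for cheap meds"))
print(bayes.spam_probability("procmail patch review"))
```

Tokens the filter has never seen score a neutral 0.5 here, so a brand-new misspelling carries no weight until a message containing it has been trained as spam.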
[SLUG] spam filters not working
Hello sluggers,

I've noticed the amount of mail bypassing the filters seems to be increasing and would like to venture an idea...

Most of the mails that get through are misspelt to put the filter off the track. E.g., some that got through:

{ Ur Diicky Is So Smaall horsemeat digenesis Darlin how good to see you! :) }

How can we get a spam filter to check for misspelt words and reject the mail on that basis?

Your assistance is appreciated.

Thank you,
Nicholas Tomlin.
Re: [SLUG] spam filters not working
Since changing to bogofilter (plus stringent training), my spam filtering is near on perfect. I've had only one slip through this week. I've written a crude script to make training easier.

You didn't tell us which spam filter, but spamassassin was catching less than 50% of spam when I stopped using it.

On Thu, 6 May 2004, Nicholas Tomlin wrote:
> Hello sluggers,
>
> I've noticed the amount of mail bypassing the filters seems to be increasing and would like to venture an idea...
>
> Most of the mails that get through are misspelt to put the filter off the track. E.g., some that got through:
>
> { Ur Diicky Is So Smaall horsemeat digenesis Darlin how good to see you! :) }
>
> How can we get a spam filter to check for misspelt words and reject the mail on that basis?
>
> Your assistance is appreciated.
>
> Thank you,
> Nicholas Tomlin.
Re: [SLUG] spam filters not working
David wrote:
> ...snip..
> You didn't tell us which spam filter, but spamassassin was catching less than 50% of spam when I stopped using it.

Is it worthwhile to retrain your spam filters when the nature of spam changes? Just wondering.

--
Terry Collins {:-)}}} email: terryc at woa.com.au www: http://www.woa.com.au
Wombat Outdoor Adventures
Bicycles, Computers, GIS, Printing, Publishing

People without trees are like fish without clean water
Re: [SLUG] spam filters not working
On Fri, May 07, 2004 at 09:43:06AM +1000, Terry Collins wrote:
> Is it worthwhile to retrain your spam filters when the nature of spam changes?

Yes. A month or so ago spamassassin's success rate suddenly plummeted. After retraining, it's back to normal -- missing only one or two of the thousand or so spams I get each week.

Cheers,
John
--
I'm collecting all those wiry hairs that you find in between keyboard keys, and saving them up until I have enough for a beard of my own. -- Kirrily 'Skud' Robert
Re: [SLUG] spam filters not working
On Fri, May 07, 2004, David wrote:
> You didn't tell us which spam filter, but spamassassin was catching less than 50% of spam when I stopped using it.

SpamAssassin has Bayesian filtering too these days. People who are already using it should probably try its sa-learn utility before jumping spam filters.

For what it's worth, I use SpamAssassin with the learner, and it catches about 99% of all spam sent to me (this does mean a false negative about 3 times a week though), and hasn't caught a non-spam in months -- I do use whitelists for some organisations though.

I train it on all my spam and non-spam, and I train it every week on mail received during that week. (With a cronjob, I just need to make sure false negatives and positives are moved into an appropriate folder.) I don't delete the existing token database ever.

-Mary
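The weekly training run described above could be scripted along these lines. This is only a sketch, not Mary's actual cronjob: the folder paths, the function names and the assumption that the folders are mbox files are all hypothetical. The only external command it uses is sa-learn with its --spam, --ham and --mbox options, and by default it just prints the commands rather than running them.

```python
import subprocess
from pathlib import Path

# Hypothetical folder locations; adjust to wherever the mail actually lives.
SPAM_FOLDER = Path.home() / "mail" / "spam"
HAM_FOLDERS = [Path.home() / "mail" / "inbox"]


def build_training_commands(spam_folder, ham_folders):
    """Return the sa-learn command lines for one training run."""
    commands = [["sa-learn", "--spam", "--mbox", str(spam_folder)]]
    for folder in ham_folders:
        commands.append(["sa-learn", "--ham", "--mbox", str(folder)])
    return commands


def run_training(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Nothing here clears the learned tokens, so each run adds to the
    # existing database instead of starting from scratch.
    run_training(build_training_commands(SPAM_FOLDER, HAM_FOLDERS))
```

Run from a weekly cron entry with dry_run switched off, this leaves the manual step the message mentions: moving misfiled mail into the right folder before the job fires.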
Re: [SLUG] spam filters not working
On Thu, 2004-05-06 at 23:36 +1000, Nicholas Tomlin wrote:
> Hello sluggers,
>
> I've noticed the amount of mail bypassing the filters seems to be increasing and would like to venture an idea...
>
> Most of the mails that get through are misspelt to put the filter off the track. E.g., some that got through:
>
> { Ur Diicky Is So Smaall horsemeat digenesis Darlin how good to see you! :) }
>
> How can we get a spam filter to check for misspelt words and reject the mail on that basis?

I thought about this a while ago. It would be relatively easy to implement -- just hook aspell into a procmail rule. I eventually came to two conclusions:

1. I would lose a lot of important (though difficult to read) mail this way.
2. For any significant misspellings of words, bogofilter will already look for them.

But I've not actually done the experiment. You should try it out and see if it actually helps. It'd be a great way of teaching your friends and family to spell :)

If you want to give it a go, I'm sure someone on the list could help you with procmail magic.

James.
--
James Gregory [EMAIL PROTECTED]
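For anyone who does want to run the experiment, here is a minimal Python sketch of the scoring step. It does not call aspell or procmail; the tiny word list, the function names and the 0.5 threshold are stand-ins made up for illustration, and a real trial would swap in a proper dictionary.

```python
import re

# Tiny stand-in word list (an assumption); a real trial needs a full dictionary.
KNOWN_WORDS = {
    "is", "so", "how", "good", "to", "see", "you",
    "the", "meeting", "small", "your", "tonight",
}


def misspelt_fraction(text, known_words=KNOWN_WORDS):
    """Fraction of words in the text that are not in the word list."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    unknown = [word for word in words if word not in known_words]
    return len(unknown) / len(words)


def looks_misspelt(text, threshold=0.5, known_words=KNOWN_WORDS):
    """Flag the text when at least `threshold` of its words are unknown."""
    return misspelt_fraction(text, known_words) >= threshold


for subject in (
    "Ur Diicky Is So Smaall",          # spam example from the thread
    "Darlin how good to see you! :)",  # spam example from the thread
    "teh meeting is tonite",           # invented: a hasty but legitimate mail
):
    print(subject, "->", looks_misspelt(subject))
```

With this word list the first subject is flagged, the second slips through because only one word in six is unknown, and the hasty legitimate mail is flagged too, which is conclusion 1 in miniature.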
Re: [SLUG] spam filters not working
<quote who="James Gregory">

> > How can we get a spam filter to check for misspelt words and reject the mail on that basis?
>
> I thought about this a while ago. It would be relatively easy to implement -- just hook aspell into a procmail rule. I eventually came to two conclusions:
>
> 1. I would lose a lot of important (though difficult to read) mail this way.
> 2. For any significant misspellings of words, bogofilter will already look for them.

</quote>

Mind you, having client-side scoring down of people who can't spell would be a fantastic feature for spelling-fascists the world over. Tell you what: I'D LIKE THIS FEATURE SO MUCH, I'D BUY THE COMPANY!

- Jeff

--
GVADEC 2004: Kristiansand, Norway    http://2004.guadec.org/
He's not an idiot. The doctor said so.
Re: [SLUG] spam filters not working
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jeff Waugh wrote:
| <quote who="James Gregory">
|
| > > How can we get a spam filter to check for misspelt words and reject the mail on that basis?
| >
| > I thought about this a while ago. It would be relatively easy to implement -- just hook aspell into a procmail rule. I eventually came to two conclusions:
| >
| > 1. I would lose a lot of important (though difficult to read) mail this way.
| > 2. For any significant misspellings of words, bogofilter will already look for them.
|
| </quote>
|
| Mind you, having client-side scoring down of people who can't spell would be a fantastic feature for spelling-fascists the world over. Tell you what: I'D LIKE THIS FEATURE SO MUCH, I'D BUY THE COMPANY!

English is a ridiculous language anyway.

But restricting spam based on English spelling would be terrible for those of us who can speak more than one language. More so for those of us whose second (third and fourth) languages aren't a European or Asian or Middle Eastern language.

Dean

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFAmvA9I1HDX08lY+ARAkohAJ9Gx6I6oTlXTvn+FQYPbZ4rxI78pgCcDSx5
L2bV/r25O+6xH11B1o0IuN4=
=9V/e
-----END PGP SIGNATURE-----
Re: [SLUG] spam filters not working
<quote who="Dean Hamstead">

> but restricting spam based on English spelling would be terrible

</quote>

There are lots of spelling modules and dictionaries out there - who said anything about English? Spelling fascists come in EVERY LANGUAGE (that you can write, at least).

- Jeff

--
GVADEC 2004: Kristiansand, Norway    http://2004.guadec.org/
In the pre-Internet age, I was like an Internet kid, with a 3D search engine, trying to find weird stuff. - John Safran