Re[3]: Bayesian filtering products roundup: WAS Re[3]: Fwd: Re: Spam

2003-09-30 Thread Terry
On Monday, September 29, 2003 at 11:03 PM, David wrote:

> I would be happy to supply a lot of spam messages to folks who need
> some samples for training purposes!  ;-)

Thanks for the offer, but I get enough myself. :)

With all the talk about spam on the list lately, I thought I'd share
some information I came across. Training on other people's messages
may not be the best idea and, in some instances, might decrease the
accuracy of the Bayesian filter, or so I was told on another mailing
list. I was referred to www.paulgraham.com/spam.html and, from what
I've gathered, that seems to be the case. I've been reading Paul
Graham's "A Plan for Spam" and the newer "Better Bayesian Filtering".
They make for interesting reading.

Bayesian filtering is based on the statistical probability of an
e-mail being spam as it relates to *your* e-mail and not anyone
else's. The probability that a particular word used in an e-mail will
identify it as spam may be different for me than for you. Take the
word "click", for example. Suppose I'm on a list where people are
always using the word "click", so the word is present in both spam and
non-spam for me. You never receive any e-mail with the word "click" in
it unless it's spam. The spam probability score for "click" will then
be much higher for you than it would be for me.
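
To put rough numbers on the "click" example, here is a minimal sketch
that loosely follows the per-word formula as I read it in "A Plan for
Spam". The function name and all of the message counts are made up for
illustration; they are not taken from anyone's real mailbox or from any
particular filter.

```python
def word_spam_probability(bad_count, good_count, n_bad, n_good):
    """Estimate how strongly one word indicates spam in one user's mail.

    bad_count / good_count: occurrences of the word in that user's spam / ham.
    n_bad / n_good: number of spam / ham messages in that user's corpus.
    """
    g = 2 * good_count  # ham counts are doubled to bias against false positives
    b = bad_count
    if g + b < 5:
        return None  # too little evidence to score this word at all
    bad_freq = min(1.0, b / n_bad)
    good_freq = min(1.0, g / n_good)
    p = bad_freq / (good_freq + bad_freq)
    return max(0.01, min(0.99, p))  # never let a single word be fully certain


# Hypothetical corpora: 1000 spam and 1000 ham messages for each of us.
# I see "click" in plenty of legitimate list mail; you only see it in spam.
mine = word_spam_probability(bad_count=300, good_count=400, n_bad=1000, n_good=1000)
yours = word_spam_probability(bad_count=300, good_count=0, n_bad=1000, n_good=1000)

print(f"click, my mail:   {mine:.2f}")
print(f"click, your mail: {yours:.2f}")
```

With these invented counts the same word is weak evidence in my mail
and close to conclusive in yours, which is the whole point of training
on your own messages.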

You need both spam and ham to train a filter. It follows that if I
use your e-mail to train my filter, I may generate false positives
because my e-mail will be different from yours. As a side benefit of
this personalization, Graham explains that it makes it difficult for
spammers to fine-tune messages, since what would be fine-tuning for me
wouldn't necessarily be fine-tuning for you. This limits the avenues
that spammers have to alter their messages to get through the
filters. Spammers can do it for rules-based programs, such as
SpamAssassin (without Bayes), by looking at the rules and then coming
up with methods to counteract them. In that situation, it's a
game of tag. Rule -- Way around the rule -- New Rule -- Way around
the new rule -- ad infinitum.
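
As a rough illustration of the false-positive risk, the sketch below
scores one legitimate message against two word tables, using the
combining formula as I understand it from Graham's essay. Both tables,
the message, and the helper names are hypothetical, and the sketch
skips his step of keeping only the most interesting words.

```python
from math import prod


def combined_spam_probability(probs):
    """Combine per-word spam probabilities into one score for the message."""
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


def score(words, table, unknown=0.4):
    """Score a message; words missing from the table get a mildly innocent 0.4."""
    return combined_spam_probability([table.get(w, unknown) for w in words])


# Hypothetical per-word probabilities learned from two different mailboxes.
my_table = {"click": 0.27, "list": 0.05, "update": 0.20}
your_table = {"click": 0.99, "list": 0.40, "update": 0.60}

# An ordinary message from one of my mailing lists.
message = ["click", "list", "update"]

print(f"scored with my table:   {score(message, my_table):.3f}")
print(f"scored with your table: {score(message, your_table):.3f}")
```

With these made-up numbers the message is clearly ham under my table
but lands above a 0.9 spam cutoff under yours, so a filter trained on
your mail would have thrown away my list traffic.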

Of course, I could be wrong about this as it is my interpretation of
Graham's writings. If I am, I hope someone will let me know.

-- 
Regards,
Terry

Using The Bat! v1.62r on Windows 2000 5.0 Build 2195 Service Pack 3



Current version is 2.00.6 | Using TBUDL information:
http://www.silverstones.com/thebat/TBUDLInfo.html


Re[3]: Bayesian filtering products roundup: WAS Re[3]: Fwd: Re: Spam

2003-09-29 Thread Granville Cousins
Hello David,

Monday, September 29, 2003, 11:03:19 PM, you wrote:

DRA> Hello, Keith:

DRA> Monday, September 29, 2003, 5:33:27 PM, you wrote:

DRA> (snip)

KR> Just switched from PopFile to SpamBayes (snip)
KR> (http://spambayes.sourceforge.net/). It's been kind of (snip)
KR> a pain to set up--for me, at least--but I finally got it
KR> working today and it has great possibilities.

I tried out a few spam filters (MailWasher, POPFile) and finally
settled on K9. It was a little tricky to set up the filters in The Bat!,
but once that was sorted it does a superb job.

-- 
Love and Light,
 Granville  mailto:[EMAIL PROTECTED]


