Re: Bayes training question

2007-02-16 Thread Steven Stern
yossim wrote:
> Hi folks, how can I learn misidentified junk mail that is stored on
> Exchange or in the Outlook clients? Can I simply copy those mails to a
> folder on my Linux server and run sa-learn with the required parameters?
> Kind regards, Yossi Mor

see http://sstern.ccim.com/2006/07/14/training-sitewide-spam-filters/
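
For what it's worth, here is a minimal sketch of the "copy to a folder and run sa-learn" idea from the question. It only builds the command lines; the folder paths are made up for illustration, and it assumes the mail has been exported as raw messages with full headers (one message per file, or an mbox file), which is an assumption about your Exchange/Outlook export and not something the question confirms.

```python
import shlex


def sa_learn_command(kind, target, mbox=False):
    """Build an sa-learn command line for one folder or mbox file.

    kind   -- "spam" for missed junk, "ham" for wrongly flagged good mail
    target -- directory of raw messages or an mbox file (hypothetical paths below)
    mbox   -- pass True when target is a single mbox file
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    cmd = ["sa-learn", "--" + kind]
    if mbox:
        cmd.append("--mbox")
    cmd.append(target)
    return cmd


# Hypothetical folders on the Linux server; adjust to wherever the mail was copied.
missed_spam = sa_learn_command("spam", "/srv/train/missed-spam")
false_positives = sa_learn_command("ham", "/srv/train/false-positives")

print(shlex.join(missed_spam))
print(shlex.join(false_positives))
```

Run the resulting commands as the user that owns the Bayes database SpamAssassin uses at scan time, otherwise the training lands in a different database.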

--

  Steve


Re: bayes training question

2005-05-24 Thread jdow
He's on a machine I administer. I have a "thing" about autolearn.
I don't do it. So this is not a problem for Loren. (It's too big
a pain to repair a self-mis-trained bayes database. So the neat
selectively trained bayes databases we have work quite nicely as
a result.) I cannot see the long term utility of autolearn whereas
I can see its long term futility.

{^_-}

From: "mizzio" <[EMAIL PROTECTED]>

> Loren,
> 
> it works:
> 
> X-Spam-Status: No, hits=-56.2 required=4.5
> X-Spam-Report: SA TESTS -100 USER_IN_WHITELIST_SA   SA List 2.4 BAYES_50
> BODY: Bayesian spam probability is 40 to 60% [score: 0.4439]
> 
> 
> One more question: I understand that this way the mails are never
> marked as spam, but they are still autolearned by the system.
> Is this correct?
> 
> Thank you,
> Maurizio
> 
> 
> On Tue, 24-05-2005 at 05:21 -0700, Loren Wilton wrote:
> > header  WHITELIST_SA   List-Id =~ /(?:dev|users)\.spamassassin\.apache\.org/i
> > describe WHITELIST_SA   SA List
> > score  WHITELIST_SA   -100



Re: bayes training question

2005-05-24 Thread Roman Volf

Jim Maul wrote:
> It fixes my problem of list messages being autolearned incorrectly,
> but I'd rather not scan them at all.  Someone made a suggestion (and
> patch) on the qmail-scanner mailing list where you can optionally turn
> SA scanning off using tcp.smtp for certain IPs.  I may use this to
> not pass messages coming from the apache mail server through SA.  You
> may want to check that list out as well.
>
> -Jim


I should have read your whole email before replying, but anyway here is 
the link to the patch I posted to the qmail-scanner list:


http://www.thevolf.com/qmail/qmail-scanner-skip-sa.patch


--
Roman Volf
Keystreams Internet Solutions
[EMAIL PROTECTED]



Re: bayes training question

2005-05-24 Thread mizzio
Very nice solution for my needs.

Thank you !
maurizio

On Tue, 24-05-2005 at 13:08 -0400, Jim Maul wrote:
> mizzio wrote:
> > Hello guys,
> > sorry to bother you again: I didn't find a way to exclude this mailing
> > list from SA scanning in my setup.
> > I'm using qmail + qmail-scanner + spamassassin on my mailserver, the
> > only posts I found are about excluding the scanning with procmail (which
> > I'm not using).
> > 
> > I did not find a way of doing this through qmail-scanner: is there a way
> > of doing this directly with spamassassin ?
> > 
> > Any idea is greatly appreciated.
> > 
> > Thank you and regards,
> > Maurizio
> > 
> >
> 
> Until I can come up with a way to not scan some emails selectively using
> qmail-scanner (without procmail), I have settled on using the following
> statements in my local.cf:
>
> bayes_ignore_to users@spamassassin.apache.org
> whitelist_to users@spamassassin.apache.org
>
> This causes (most) list messages to not be marked as spam, and all list
> messages are ignored by Bayes.
>
> It fixes my problem of list messages being autolearned incorrectly, but
> I'd rather not scan them at all.  Someone made a suggestion (and patch)
> on the qmail-scanner mailing list where you can optionally turn SA
> scanning off using tcp.smtp for certain IPs.  I may use this to not
> pass messages coming from the apache mail server through SA.  You may
> want to check that list out as well.
> 
> -Jim
> 



Re: bayes training question

2005-05-24 Thread Roman Volf


Jim Maul wrote:
> Until I can come up with a way to not scan some emails selectively
> using qmail-scanner (without procmail), I have settled on using the
> following statements in my local.cf:
>
> bayes_ignore_to users@spamassassin.apache.org
> whitelist_to users@spamassassin.apache.org
>
> This causes (most) list messages to not be marked as spam, and all list
> messages are ignored by Bayes.
>
> It fixes my problem of list messages being autolearned incorrectly,
> but I'd rather not scan them at all.  Someone made a suggestion (and
> patch) on the qmail-scanner mailing list where you can optionally turn
> SA scanning off using tcp.smtp for certain IPs.  I may use this to
> not pass messages coming from the apache mail server through SA.  You
> may want to check that list out as well.
>
> -Jim


Jim,

You can use a patch I wrote yesterday for qmail-scanner-queue.pl:
http://www.thevolf.com/qmail/qmail-scanner-skip-sa.patch

Then add:

209.237.227.199:allow,IGNORE_SA="yes"

to your tcp.smtp and rebuild the tcp.smtpd.cdb file.

This will cause qmail-scanner to skip SA tests for email originating 
from the spamassassin mailing list (hermes.apache.org).
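
To make the shape of that tcp.smtp line concrete, here is a rough sketch that splits such a rule into address, action, and environment variables. It is a simplified reading of the rule format (IPv4 address, double-quoted values, no commas inside values) and is not taken from the patch itself; the `should_skip_sa` helper is purely hypothetical and only illustrates the idea that the variable set by the rule is what a scanner could check.

```python
import re


def parse_tcp_smtp_rule(line):
    """Split a rule like  addr:allow,VAR="value"  into its parts.

    Simplified on purpose: assumes an IPv4 address, double-quoted values,
    and no commas inside values.
    """
    address, _, instruction = line.strip().partition(":")
    action, *assignments = instruction.split(",")
    env = {}
    for item in assignments:
        match = re.fullmatch(r'(\w+)="(.*)"', item)
        if match is None:
            raise ValueError("unsupported assignment: " + item)
        env[match.group(1)] = match.group(2)
    return address, action, env


def should_skip_sa(env):
    """Hypothetical check; the real patch may test the variable differently."""
    return env.get("IGNORE_SA") == "yes"


address, action, env = parse_tcp_smtp_rule('209.237.227.199:allow,IGNORE_SA="yes"')
print(address, action, env, should_skip_sa(env))
```

Because the rule is keyed on the connecting IP address, it only helps while the list keeps sending from that address.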


--
Roman Volf
Keystreams Internet Solutions
[EMAIL PROTECTED]



Re: bayes training question

2005-05-24 Thread Jim Maul

mizzio wrote:
> Hello guys,
> sorry to bother you again: I didn't find a way to exclude this mailing
> list from SA scanning in my setup.
> I'm using qmail + qmail-scanner + spamassassin on my mailserver, the
> only posts I found are about excluding the scanning with procmail (which
> I'm not using).
>
> I did not find a way of doing this through qmail-scanner: is there a way
> of doing this directly with spamassassin ?
>
> Any idea is greatly appreciated.
>
> Thank you and regards,
> Maurizio




Until I can come up with a way to not scan some emails selectively using
qmail-scanner (without procmail), I have settled on using the following
statements in my local.cf:

bayes_ignore_to users@spamassassin.apache.org
whitelist_to users@spamassassin.apache.org

This causes (most) list messages to not be marked as spam, and all list
messages are ignored by Bayes.

It fixes my problem of list messages being autolearned incorrectly, but
I'd rather not scan them at all.  Someone made a suggestion (and patch)
on the qmail-scanner mailing list where you can optionally turn SA
scanning off using tcp.smtp for certain IPs.  I may use this to not
pass messages coming from the apache mail server through SA.  You may
want to check that list out as well.


-Jim


Re: bayes training question

2005-05-24 Thread mizzio
Loren,

it works:

X-Spam-Status: No, hits=-56.2 required=4.5
X-Spam-Report: SA TESTS -100 USER_IN_WHITELIST_SA   SA List 2.4 BAYES_50
BODY: Bayesian spam probability is 40 to 60% [score: 0.4439]


One more question: I understand that this way the mails are never
marked as spam, but they are still autolearned by the system.
Is this correct?

Thank you,
Maurizio


On Tue, 24-05-2005 at 05:21 -0700, Loren Wilton wrote:
> header  WHITELIST_SA   List-Id =~ /(?:dev|users)\.spamassassin\.apache\.org/i
> describe WHITELIST_SA   SA List
> score  WHITELIST_SA   -100



Re: bayes training question

2005-05-24 Thread Loren Wilton
> I did not find a way of doing this through qmail-scanner: is there a way
> of doing this directly with spamassassin ?

Possibly someone else knows of a way with qmail-scanner.  If not, you can't
"exclude" it with SA, but you *can* whitelist the list with SA.  That will
probably be sufficient.

This will do the trick for you:

header  WHITELIST_SA   List-Id =~ /(?:dev|users)\.spamassassin\.apache\.org/i
describe WHITELIST_SA   SA List
score  WHITELIST_SA   -100

Someone will doubtless point out that this test is forgeable, and potentially
will let real spam into your system.  I haven't had it happen yet.  But the
possibility exists.
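
To see what the rule does and does not check, here is a small sketch that applies the same pattern to the List-Id header of a few made-up messages. The Python regex is my translation of the Perl one in the rule, and the sample messages are invented for illustration; it is not how SpamAssassin itself evaluates rules.

```python
import re
from email import message_from_string

# Same pattern as the WHITELIST_SA rule; the trailing /i becomes re.IGNORECASE.
WHITELIST_SA = re.compile(r"(?:dev|users)\.spamassassin\.apache\.org", re.IGNORECASE)


def hits_whitelist_sa(raw_message):
    """Return True when the List-Id header matches the whitelist pattern."""
    msg = message_from_string(raw_message)
    list_id = msg.get("List-Id", "")
    return WHITELIST_SA.search(list_id) is not None


# Invented sample messages.
list_post = "List-Id: <users.spamassassin.apache.org>\nSubject: Re: bayes training question\n\nbody\n"
forged = "List-Id: <USERS.SpamAssassin.apache.org>\nSubject: cheap pills\n\nbody\n"
unrelated = "List-Id: <announce.example.org>\nSubject: hello\n\nbody\n"

for name, raw in [("list_post", list_post), ("forged", forged), ("unrelated", unrelated)]:
    print(name, hits_whitelist_sa(raw))
```

The forged sample matches just as well as the real list post, which is the forgeability concern: the rule trusts whatever the sender wrote in the header.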

Loren



Re: bayes training question

2005-05-24 Thread mizzio
Hello guys,
sorry to bother you again: I didn't find a way to exclude this mailing
list from SA scanning in my setup.
I'm using qmail + qmail-scanner + spamassassin on my mailserver, the
only posts I found are about excluding the scanning with procmail (which
I'm not using).

I did not find a way of doing this through qmail-scanner: is there a way
of doing this directly with spamassassin ?

Any idea is greatly appreciated.

Thank you and regards,
Maurizio

> The best thing is to avoid having the mail from this list go through SA.
> There are various ways to do this, depending on your mail setup.





Re: bayes training question

2005-05-23 Thread mizzio
Thank you very much, Loren.

regards,
mizzio

On Mon, 23-05-2005 at 04:51 -0700, Loren Wilton wrote:
> > - I get some messages marked as SPAM coming from this mailing list,
> > since the body contains URLs and text from real spam messages: do I have
> > to feed them into my DB as ham, or can this cause some kind of Bayes
> > poisoning?
> 
> The best thing is to avoid having the mail from this list go through SA.
> There are various ways to do this, depending on your mail setup.
> 
> 
> > - I assume that the training is more important for the messages marked
> > with BAYES_50 BODY: Bayesian spam probability is 40 to 60% [score:
> > 0.5998]; is this correct ?
> 
> Probably most important are cases where Bayes guessed wrong, rather than
> simply not being real sure.  Always train as ham or spam anything you see
> that Bayes decided to lean the other way.  This way it will get to know what
> is what for you.
> 
> Second most important would be training stuff that scores close to 50%.
> Personally I tend to dump most spam that scores less than about 80% into the
> spam training bucket.  Now and then I'll throw a handful of known ham in the
> ham bucket, to try to keep the number of learned ham/spam somewhat balanced.
> 
> 
> > - Shall I train as ham also the messages not marked as SPAM but having a
> > score close between 1/2 and 3/4 ? I mean, feeding also "normal" messages
> > into the system helps to have a good bayes filtering ?
> 
> I'm not absolutely sure what you are saying here.  If you are asking if you
> should train known ham as ham, the answer is yes.  Bayes needs to be able to
> decide which tokens are ham and which are spam.  It can only do this if it
> sees both ham and spam.  If you have ham that is hitting more than 20 or 30%
> you should certainly train it as ham.  However, even throwing ham that
> scores near 0 into training every so often is a good idea.
> 
> Loren
> 
> 



Re: bayes training question

2005-05-23 Thread Loren Wilton
> - I get some messages marked as SPAM coming from this mailing list,
> since the body contains URLs and text from real spam messages: do I have
> to feed them into my DB as ham, or can this cause some kind of Bayes
> poisoning?

The best thing is to avoid having the mail from this list go through SA.
There are various ways to do this, depending on your mail setup.


> - I assume that the training is more important for the messages marked
> with BAYES_50 BODY: Bayesian spam probability is 40 to 60% [score:
> 0.5998]; is this correct ?

Probably most important are cases where Bayes guessed wrong, rather than
simply not being real sure.  Always train as ham or spam anything you see
that Bayes decided to lean the other way.  This way it will get to know what
is what for you.

Second most important would be training stuff that scores close to 50%.
Personally I tend to dump most spam that scores less than about 80% into the
spam training bucket.  Now and then I'll throw a handful of known ham in the
ham bucket, to try to keep the number of learned ham/spam somewhat balanced.


> - Shall I train as ham also the messages not marked as SPAM but having a
> score close between 1/2 and 3/4 ? I mean, feeding also "normal" messages
> into the system helps to have a good bayes filtering ?

I'm not absolutely sure what you are saying here.  If you are asking if you
should train known ham as ham, the answer is yes.  Bayes needs to be able to
decide which tokens are ham and which are spam.  It can only do this if it
sees both ham and spam.  If you have ham that is hitting more than 20 or 30%
you should certainly train it as ham.  However, even throwing ham that
scores near 0 into training every so often is a good idea.
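
If it helps, the ordering above can be sketched roughly like this. The 0.1 band around 0.5 is my reading of the "40 to 60%" BAYES_50 range, and the message names and scores are invented; treat it as an illustration of the priorities, not as anything SpamAssassin does for you.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    name: str
    is_spam: bool   # what a human decided the message really is
    bayes: float    # Bayes probability from the report, 0.0 to 1.0


def training_priority(msg):
    """Lower number means train it sooner."""
    leaned_wrong = (msg.bayes > 0.5 and not msg.is_spam) or (msg.bayes < 0.5 and msg.is_spam)
    if leaned_wrong:
        return 0  # Bayes guessed wrong: most important
    if abs(msg.bayes - 0.5) <= 0.1:
        return 1  # Bayes was unsure (the BAYES_50 band): second most important
    return 2      # Bayes was right and confident: occasional top-up only


# Invented examples.
mailbox = [
    Scored("clear-ham", is_spam=False, bayes=0.01),
    Scored("unsure-ham", is_spam=False, bayes=0.4439),
    Scored("missed-spam", is_spam=True, bayes=0.12),
    Scored("flagged-list-post", is_spam=False, bayes=0.93),
]

ordered = sorted(mailbox, key=training_priority)
for msg in ordered:
    print(training_priority(msg), "spam" if msg.is_spam else "ham", msg.name)
```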

Loren