Re: Bayes FP/FN Training Procedures

2005-01-07 Thread Pierre-Yves Bonnetain
Hi Jeff,
Jeff Koch wrote:
Has anyone come up with a script or method that would allow users to 
forward their false positive and false negative emails back to an 
address on the mailserver where they can be used to train the Bayes 
database? I understand that Bayes needs the email in its original format 
so the script has to strip off the forwarding enclosure.
On our IMAP server, each user may create and use two specific mail folders, 
named Bayes and SpamErrors (the names are _not_ important). The first one 
is for false negatives, the other for false positives. A script runs 
daily on the server and feeds those folders' contents to sa-learn. All 
the user has to do is move or copy his false (positives|negatives) to the 
proper folder.
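
A minimal sketch of such a daily job, in Python. It is not the script 
described above. It assumes Maildir++ storage under /var/mail/<user>/Maildir 
(a hypothetical layout), a single site-wide Bayes database, and the 
folder names from this message. build_commands and main are 
illustrative names.

```python
import subprocess
from pathlib import Path

# Folder names follow the message above: "Bayes" holds false negatives
# (missed spam), "SpamErrors" holds false positives (misfiled ham).
FOLDER_MODES = {"Bayes": "--spam", "SpamErrors": "--ham"}


def build_commands(mail_root):
    """Return one sa-learn argv list per existing training directory.

    Assumed layout: <mail_root>/<user>/Maildir/.<folder>/{cur,new}
    """
    commands = []
    for user_dir in sorted(Path(mail_root).iterdir()):
        for folder, mode in FOLDER_MODES.items():
            for sub in ("cur", "new"):
                target = user_dir / "Maildir" / f".{folder}" / sub
                if target.is_dir():
                    # sa-learn accepts a directory of message files
                    commands.append(["sa-learn", mode, str(target)])
    return commands


def main(mail_root="/var/mail"):
    """Run every command; intended to be called from a daily cron job."""
    for argv in build_commands(mail_root):
        subprocess.run(argv, check=True)
```

Keeping the command construction separate from the execution makes it 
easy to print the commands first and check them before letting cron run 
main().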

Hth,
--
Pierre-Yves Bonnetain
BA Consultants - Sécurité informatique - www.ba-cst.com
Tel. : +33 (0) 563 277 241 - Fax : +33 (0) 563 277 245


Bayes FP/FN Training Procedures

2005-01-06 Thread Jeff Koch
Has anyone come up with a script or method that would allow users to 
forward their false positive and false negative emails back to an address 
on the mailserver where they can be used to train the Bayes database? I 
understand that Bayes needs the email in its original format so the script 
has to strip off the forwarding enclosure.

Thanks in advance.

Jeff Koch 




RE: Bayes FP/FN Training Procedures

2005-01-06 Thread Jason Gauthier
Neat! I was just thinking about how to do that myself.
But I use Exchange, so I'm not sure how to do it yet.
 



RE: Bayes FP/FN Training Procedures

2005-01-06 Thread Pierre Thomson
I have a script that I use with Exchange/Outlook for Bayes training, but it's 
not simple.  You can't just forward a message back to the SA box, since Outlook 
deletes most of the original headers.  You have to cut-n-paste the whole 
email into a new email and send THAT to the SA box.  There the script 
unencapsulates the email and feeds it to sa-learn.
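
The unencapsulation step might look roughly like the Python sketch below. 
This is not the script described above. It assumes the user pasted the 
complete original message, headers first, as the plain-text body of the 
new email, and that the mail client did not rewrap or otherwise alter it. 
unwrap_pasted_message is a hypothetical name.

```python
import re
from email import message_from_bytes, policy

# A line such as "Received: from ..." or "From: someone@example.com"
HEADER_LINE = re.compile(r"^[A-Za-z][A-Za-z0-9-]*:[ \t]")


def unwrap_pasted_message(raw_wrapper):
    """Return the pasted original message (as text) from a wrapper email."""
    wrapper = message_from_bytes(raw_wrapper, policy=policy.default)
    body = wrapper.get_body(preferencelist=("plain",))
    if body is None:
        raise ValueError("no text/plain part found")
    # Drop blank lines the user may have left above the pasted headers.
    text = body.get_content().lstrip("\r\n")
    if not HEADER_LINE.match(text):
        raise ValueError("body does not start with a mail header")
    return text
```

The returned text can then be piped to sa-learn --spam or sa-learn --ham 
on standard input. Rejecting bodies that do not start with a header line 
avoids training on a bare forward that has lost the original headers.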

I don't consider my script to be distributable, and it's only part of a larger 
scheme involving DNS, sendmail redirection, and other variables, but at least 
this might give you some ideas to play with.

Pierre Thomson
BIC




Re: Bayes FP/FN Training Procedures

2005-01-06 Thread Carinus Carelse
If you have an IMAP server, what I have done is set up two public folders,
and I use a script that I found on the internet to read them and rebuild
the Bayes database.  The users copy spam messages into a SPAM folder and
ham into a NOT SPAM folder; this keeps the messages intact.  I subscribe
them to the folders and then let the script run once a day.  I am sure you
could do this with Exchange's public folders and then use the IMAP server
port to teach Bayes.
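
For anyone who wants to try the IMAP route, here is a rough Python sketch
using the standard imaplib module.  It is not the script mentioned above.
The folder names come from this message; the host, user, and password
arguments are placeholders, and the sketch assumes sa-learn is on the PATH
and reads a single message from standard input.

```python
import imaplib
import subprocess

# Folder name -> sa-learn mode (folder names taken from the message above)
FOLDERS = {"SPAM": "--spam", "NOT SPAM": "--ham"}


def fetch_raw_messages(conn, folder):
    """Yield the raw RFC 822 bytes of every message in an IMAP folder."""
    # Quote the name so folders containing spaces are accepted.
    status, _ = conn.select(f'"{folder}"', readonly=True)
    if status != "OK":
        return
    status, data = conn.search(None, "ALL")
    if status != "OK" or not data or data[0] is None:
        return
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        if status == "OK" and parts and isinstance(parts[0], tuple):
            yield parts[0][1]


def train_from_imap(host, user, password):
    """Feed every message in the training folders to sa-learn."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        for folder, mode in FOLDERS.items():
            for raw in fetch_raw_messages(conn, folder):
                subprocess.run(["sa-learn", mode], input=raw, check=True)
    finally:
        conn.logout()
```

Opening the folders read-only leaves the users' copies untouched; whether
to delete messages after training is a separate decision.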

Carinus



RE: Bayes FP/FN Training Procedures

2005-01-06 Thread Kang, Joseph S.
 -Original Message-
 From: Carinus Carelse [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, January 06, 2005 8:27 AM
 To: users@spamassassin.apache.org
 Subject: Re: Bayes FP/FN Training Procedures
 
 
 If you have an IMAP server, what I have done is set up two public 
 folders, and I use a script that I found on the internet to read 
 them and rebuild the Bayes database.  The users copy spam messages 
 into a SPAM folder and ham into a NOT SPAM folder; this keeps the 
 messages intact.  I subscribe them to the folders and then let the 
 script run once a day.  I am sure you could do this with Exchange's 
 public folders and then use the IMAP server port to teach Bayes.
 

This is what I have done as well.  It's much easier this way.  

The script I'm using was found through a search of the SA list archives on
GMANE.  

Best of luck.

-Joe K.


Re: Bayes FP/FN Training Procedures

2005-01-06 Thread Louis LeBlanc
On 01/06/05 08:41 AM, Jeff Koch sat at the `puter and typed:
 
 Has anyone come up with a script or method that would allow users to 
 forward their false positive and false negative emails back to an address 
 on the mailserver where they can be used to train the Bayes database? I 
 understand that Bayes needs the email in its original format so the script 
 has to strip off the forwarding enclosure.
 
 Thanks in advance.

Cool idea.  I have one that allows a user to send an email with a list
of addresses to whitelist or blacklist.  They send it to their own
address with a +whitelist or +blacklist extension.  Frinstance, I
could send to [EMAIL PROTECTED] and whitelist an
address.  Naturally, it requires a password in there as well, but it
works.  This really only boils down to a procmail recipe at the server
end, but I did write a quick mutt macro that uses formail to parse the
From address out of the message and autosend it using a script with
about 20 lines of Perl code.  It also assumes your MTA can handle
plussed folders, but this can be worked around with a subject scan or
something similar.
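
As a small illustration of the plussed-address idea (a Python sketch, not
the procmail recipe or Perl script described above), the extension can be
split off the recipient address like this.  split_plus_address is a
hypothetical helper and assumes a syntactically valid address.

```python
def split_plus_address(address):
    """Split 'user+ext@domain' into (user, ext, domain).

    ext is None when the local part carries no '+' extension.
    """
    local, _, domain = address.rpartition("@")
    user, sep, extension = local.partition("+")
    return user, (extension if sep else None), domain
```

The server side would then act only on known extensions such as
whitelist or blacklist, after checking the password.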

I wonder if the same thing could work with this idea.  One would have
to be careful what was passed into bayes.  Anyone know exactly what
and how this would need to be encapsulated?  I'm guessing it would
require some perlish at the server end to be called from procmail, but
it would have to be encapsulated carefully at the client end to avoid
piping the encapsulation headers through the learner.

Just because it's remotely relevant, I use maildir now with my mail
server.  This allows easy confirmation of spam by providing a
different subdirectory for new and read email.  So anything in the
.../cur directory is marked as read, and in the spam folder that
should be confirmed spam.  Autolearned spam goes into a different
folder altogether.  In my years with SA, this has a 0% FP rate, so I
don't feel I even have to bother with it anymore.

I wrote a script that uses Mail::SpamAssassin to parse the confirmed
spam, then move it to a spamdump folder.  I did some shameless
borrowing from sa-learn, giving credit in the script, of course.  By
default, the spamdump is recreated each month, leaving the old one to be
purged at the user's will.  I made my script extremely flexible, with
some powerful configuration methods, so you can configure pretty much
anything of consequence.

The reason I did this is that I wanted to be able to confirm spam and
have it learned as spam, then moved away.  The configuration uses a
list of directories expected to contain confirmed spam.
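
To make the learn-then-move flow concrete, here is a rough Python sketch.
It is not the script itself, which is Perl built on Mail::SpamAssassin.
The learn argument is a hypothetical callback, and the spamdump-YYYY-MM
folder naming is an assumption for illustration only.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_confirmed_spam(spam_folder, dump_root, learn, today=None):
    """Learn every read message in spam_folder/cur, then move it away.

    `learn` takes a file path; in practice it might wrap an
    `sa-learn --spam <file>` call.  Returns the moved file names.
    """
    today = today or date.today()
    # One dump folder per month (assumed naming).  A complete maildir
    # would also need new/ and tmp/ next to cur/.
    dump_cur = Path(dump_root) / f"spamdump-{today:%Y-%m}" / "cur"
    dump_cur.mkdir(parents=True, exist_ok=True)
    moved = []
    for msg in sorted((Path(spam_folder) / "cur").iterdir()):
        if msg.is_file():
            learn(msg)
            shutil.move(str(msg), str(dump_cur / msg.name))
            moved.append(msg.name)
    return moved
```

Only cur/ is scanned, so unread messages in new/ stay where they are
until the user has looked at them.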

I also wanted to have autolearned spam moved out without trying to
relearn it.  This is done with another list of directories, containing
autolearned spam.  I wanted to include both read and unread
autolearned spam - remember, I'm getting 100% accuracy in this set -
so I simply included both directories in the list.

Naturally, it will also use a list of directories that contain
confirmed ham, and learn them as such, but these will be left where
they are.  No good hiding the user's real mail, right?  At some point I
hope to keep track of the last time the script was run and use that
here to parse only files with a last mod or create time since the last
run.  Whether that approach is better than just rechecking all of them
may be debatable.
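
The "only files changed since the last run" idea could be sketched like
this in Python.  It is an illustration, not part of the script; storing
the timestamp between runs (for example in a small state file) is left
out.

```python
from pathlib import Path


def files_changed_since(directory, last_run):
    """Return files in `directory` modified after `last_run`.

    `last_run` is a POSIX timestamp in seconds.
    """
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.stat().st_mtime > last_run
    )
```

One caveat with this approach: moving a message between folders may keep
its old modification time, so a file that is newly present can still look
old.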

There is a configuration switch to autoreport all learned spam.  This
is off by default, and I haven't used it yet.

Once a month (when the new spamdump is created) the script will force
a sync and expire.  This can be done every time the script runs by
turning on a config switch.
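
A command-line equivalent of that monthly step, sketched in Python.  The
script itself does this through Mail::SpamAssassin, so the sa-learn call
below is a stand-in, and maintenance_command is a hypothetical name.

```python
from datetime import date


def maintenance_command(today, last_run, every_run=False):
    """Return the sa-learn argv for a sync and expire, or None if not due.

    Due when the month has changed since `last_run`, or on every run
    when `every_run` is set (mirroring the config switch).
    """
    new_month = (today.year, today.month) != (last_run.year, last_run.month)
    if every_run or new_month:
        return ["sa-learn", "--sync", "--force-expire"]
    return None
```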

Anyone interested in checking it out to provide feedback?  There are a
couple things that might be considered downsides or TODO items:

* The configuration method is a bit technical (has to be valid perl),
  but it's pretty powerful if you use your imagination.  At some
  point, I hope to find a way to do configuration through the
  Mail::SpamAssassin::Conf module for consistency, but I'm not sure how
  it will handle list definition, or even if that module was written
  to be used by other scripts.

* It is limited to directory based mail, no mbox or mbx files - it was
  written solely with maildir in mind.

* New spam archive folders are created with a system call - to
  maildirmake by default, but that can be changed to a mkdir -p
  command if necessary.  I've done a quick scan for a perl module to
  create the maildir, but haven't found one yet.  Courier IMAP doesn't
  have one, it uses a C/C++ utility to do it.

* Just because a file winds up in the confirmed spam directory doesn't
  guarantee it will be learned, but it will be scanned.  It isn't
  uncommon to see a message come through that has enough in common
  with a message already learned as spam to be skipped.  The script
  doesn't forget and relearn by default, so it might not catch the
  case of an autolearned FN.  To do this, I may need to duplicate the
  

Re: Bayes FP/FN Training Procedures

2005-01-06 Thread Vermyndax

On Thu, January 6, 2005 9:13 am, Louis LeBlanc said:
 On 01/06/05 08:41 AM, Jeff Koch sat at the `puter and typed:

 Has anyone come up with a script or method that would allow users to
 forward their false positive and false negative emails back to an
 address on the mailserver where they can be used to train the Bayes
 database? I understand that Bayes needs the email in its original
 format so the script has to strip off the forwarding enclosure.

 Thanks in advance.

FWIW, if you can educate your users to attach the email to a forward
rather than just clicking Forward, Outlook will preserve the headers in
the message that is sent for inspection.  Unfortunately however, this means
that any scripts doing the processing on the server side will need to be
able to parse out those MIME headers.

If someone figures THAT out, they could be a savior.
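
One possible way to do that parsing, sketched in Python with the standard
email package.  It assumes the client attaches the original as a
message/rfc822 part, which is how "forward as attachment" is normally
encoded; clients that use a proprietary format would need different
handling.  extract_attached_messages is a hypothetical name.

```python
from email import message_from_bytes


def extract_attached_messages(raw_wrapper):
    """Return the raw bytes of each message attached to a forward."""
    wrapper = message_from_bytes(raw_wrapper)
    found = []

    def visit(part):
        if part.get_content_type() == "message/rfc822":
            # The payload is a one-element list holding the original
            # message; do not descend into it any further.
            found.append(part.get_payload(0).as_bytes())
        elif part.is_multipart():
            for sub in part.get_payload():
                visit(sub)

    visit(wrapper)
    return found
```

Each returned message carries its own headers rather than those of the
forwarding email, so it can be piped to sa-learn --spam or sa-learn --ham
on standard input.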

--JM

[EMAIL PROTECTED]
http://blogs.galaxycow.com/vermyndax

Because this E mail address is transmission exclusive use, message it does
not reply, fish prayer it is to call it does.



RE: Bayes FP/FN Training Procedures

2005-01-06 Thread Aaron Grewell
If you're using Exchange/Outlook, just use a public folder.  Give the
users write-only access and let them drag and drop it in.  Works great.
