Re: Bayes FP/FN Training Procedures
Hi Jeff,

Jeff Koch wrote:
> Has anyone come up with a script or method that would allow users to forward their false positive and false negative emails back to an address on the mailserver where they can be used to train the Bayes database? I understand that Bayes needs the email in its original format, so the script has to strip off the forwarding enclosure.

On our IMAP server, each user may create and use two specific mail folders, named Bayes and SpamErrors (the names are _not_ important). The first one is for false negatives, the other for false positives. A script runs daily on the server and feeds those folders' contents to sa-learn. All the user has to do is move or copy his false positives or false negatives to the proper folder.

Hth,
--
Pierre-Yves Bonnetain
BA Consultants - Sécurité informatique - www.ba-cst.com
Tel. : +33 (0) 563 277 241 - Fax : +33 (0) 563 277 245
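The daily job described above could look roughly like the following Python sketch. The Maildir layout (folders stored as `.Bayes` and `.SpamErrors` under a per-user root, with messages in `cur/` and `new/`) and the function names are assumptions for illustration, not the script actually used on that server. Only the `sa-learn --spam` and `sa-learn --ham` options come from SpamAssassin itself.

```python
import subprocess
from pathlib import Path

# Folder names from the message above. "Bayes" holds false negatives
# (missed spam), "SpamErrors" holds false positives (real mail).
# The leading-dot Maildir folder naming is an assumption.
FOLDERS = {"Bayes": "--spam", "SpamErrors": "--ham"}


def build_commands(maildir_root):
    """Return one sa-learn command per training directory that exists."""
    commands = []
    for folder, flag in FOLDERS.items():
        for subdir in ("cur", "new"):
            path = Path(maildir_root) / f".{folder}" / subdir
            if path.is_dir():
                commands.append(["sa-learn", flag, str(path)])
    return commands


def train(maildir_root, dry_run=True):
    """Print the sa-learn commands, or run them when dry_run is False."""
    for cmd in build_commands(maildir_root):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `train("/home/someuser/Maildir")` from a daily cron job (the path is hypothetical) prints the commands; passing `dry_run=False` runs them.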
Bayes FP/FN Training Procedures
Has anyone come up with a script or method that would allow users to forward their false positive and false negative emails back to an address on the mailserver where they can be used to train the Bayes database? I understand that Bayes needs the email in its original format, so the script has to strip off the forwarding enclosure.

Thanks in advance.

Jeff Koch
RE: Bayes FP/FN Training Procedures
Neat! I was just thinking about how to do that myself. But I use Exchange, so I'm not sure how to do it yet.

-----Original Message-----
From: Jeff Koch [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 06, 2005 8:42 AM
To: users@spamassassin.apache.org
Subject: Bayes FP/FN Training Procedures

Has anyone come up with a script or method that would allow users to forward their false positive and false negative emails back to an address on the mailserver where they can be used to train the Bayes database? I understand that Bayes needs the email in its original format, so the script has to strip off the forwarding enclosure.

Thanks in advance.

Jeff Koch
RE: Bayes FP/FN Training Procedures
I have a script that I use with Exchange/Outlook for Bayes training, but it's not simple. You can't just forward a message back to the SA box, since Outlook deletes most of the original headers. You have to cut-n-paste the whole email into a new email and send THAT to the SA box. There the script unencapsulates the email and feeds it to sa-learn.

I don't consider my script to be distributable, and it's only part of a larger scheme involving DNS, sendmail redirection, and other variables, but at least this might give you some ideas to play with.

Pierre Thomson
BIC

-----Original Message-----
From: Jason Gauthier [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 06, 2005 8:44 AM
To: Jeff Koch; users@spamassassin.apache.org
Subject: RE: Bayes FP/FN Training Procedures

Neat! I was just thinking about how to do that myself. But I use Exchange, so I'm not sure how to do it yet.
Re: Bayes FP/FN Training Procedures
If you have an IMAP server, what I have done is set up two public folders, and I use a script that I found on the internet to read them and rebuild the Bayes database. The users copy spam messages into a SPAM folder and ham into a NOT SPAM folder; this keeps the messages intact. I subscribe them to the folders and then let the script run once a day. I am sure you could do this with Exchange's public folders and then use the IMAP server port to teach Bayes.

Carinus
RE: Bayes FP/FN Training Procedures
-----Original Message-----
From: Carinus Carelse [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 06, 2005 8:27 AM
To: users@spamassassin.apache.org
Subject: Re: Bayes FP/FN Training Procedures

> If you have an IMAP server, what I have done is set up two public folders, and I use a script that I found on the internet to read them and rebuild the Bayes database. The users copy spam messages into a SPAM folder and ham into a NOT SPAM folder; this keeps the messages intact. I subscribe them to the folders and then let the script run once a day. I am sure you could do this with Exchange's public folders and then use the IMAP server port to teach Bayes.

This is what I have done as well. It's much easier this way. The script I'm using was found through a search of the SA list archives on GMANE.

Best of luck.
-Joe K.
Re: Bayes FP/FN Training Procedures
On 01/06/05 08:41 AM, Jeff Koch sat at the `puter and typed:
> Has anyone come up with a script or method that would allow users to forward their false positive and false negative emails back to an address on the mailserver where they can be used to train the Bayes database? I understand that Bayes needs the email in its original format, so the script has to strip off the forwarding enclosure.
>
> Thanks in advance.

Cool idea. I have one that allows a user to send an email with a list of addresses to whitelist or blacklist. They send it to their own address with a +whitelist or +blacklist extension. Frinstance, I could send to [EMAIL PROTECTED] and whitelist an address. Naturally, it requires a password in there as well, but it works.

This really only boils down to a procmail recipe at the server end, but I did write a quick mutt macro that uses formail to parse the From address out of the message and autosend it using a script with about 20 lines of Perl code. It also assumes your MTA can handle plussed folders, but this can be worked around with a subject scan or something similar.

I wonder if the same thing could work with this idea. One would have to be careful what was passed into Bayes. Anyone know exactly what and how this would need to be encapsulated? I'm guessing it would require some perlish at the server end to be called from procmail, but it would have to be encapsulated carefully at the client end to avoid piping the encapsulation headers through the learner.

Just because it's remotely relevant, I use maildir now with my mail server. This allows easy confirmation of spam by providing a different subdirectory for new and read email. So anything in the .../cur directory is marked as read, and in the spam folder that should be confirmed spam. Autolearned spam goes into a different folder altogether. In my years with SA, this has a 0% FP rate, so I don't feel I even have to bother with it anymore.
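As a rough illustration of treating the cur subdirectory of a spam folder as confirmed spam, here is a minimal Python sketch. The function names are hypothetical and this is not the script discussed in this message. One caveat worth labeling: in the Maildir format a file in cur/ has been noticed by a mail client, while the `S` flag after `:2,` in the file name is what records that it was actually read, so the sketch offers an optional stricter check.

```python
from pathlib import Path


def has_seen_flag(filename):
    """True if the Maildir file name carries the S (seen) flag after ':2,'."""
    _, sep, flags = filename.partition(":2,")
    return bool(sep) and "S" in flags


def confirmed_messages(maildir_folder, require_seen_flag=False):
    """List message files under cur/, optionally only those flagged as seen."""
    cur = Path(maildir_folder) / "cur"
    if not cur.is_dir():
        return []
    files = sorted(p for p in cur.iterdir() if p.is_file())
    if require_seen_flag:
        files = [p for p in files if has_seen_flag(p.name)]
    return files
```

The returned paths could then be handed to whatever learner is in use; messages still sitting in new/ are left alone.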
I wrote a script that uses Mail::SpamAssassin to parse the confirmed spam, then move it to a spamdump folder. I did some shameless borrowing from sa-learn, giving credit in the script, of course. By default, the spamdump is recreated each month, leaving the old one to be purged at the user's will. I made my script extremely flexible, with some powerful configuration methods, so you can pretty much configure anything of consequence.

The reason I did this is that I wanted to be able to confirm spam and have it learned as spam, then moved away. The configuration uses a list of directories expected to contain confirmed spam. I also wanted to have autolearned spam moved out without trying to relearn it. This is done with another list of directories, containing autolearned spam. I wanted to include both read and unread autolearned spam - remember, I'm getting 100% accuracy in this set - so I simply included both directories in the list.

Naturally, it will also use a list of directories that contain confirmed ham, and learn them as such, but these will be left where they are. No good hiding the user's real mail, right? At some point I hope to keep track of the last time the script was run and use that here to parse only files with a last mod or create time since the last run. Whether that approach is better than just rechecking all of them may be debatable.

There is a configuration switch to autoreport all learned spam. This is off by default, and I haven't used it yet.

Once a month (when the new spamdump is created) the script will force a sync and expire. This can be done every time the script runs by turning on a config switch.

Anyone interested in checking it out to provide feedback? There are a couple of things that might be considered downsides or TODO items:

* The configuration method is a bit technical (has to be valid perl), but it's pretty powerful if you use your imagination. At some point, I hope to find a way to do configuration through the Mail::SpamAssassin::Conf module for consistency, but I'm not sure how it will handle list definition, or even if that module was written to be used by other scripts.
* It is limited to directory based mail, no mbox or mbx files - it was written solely with maildir in mind.
* New spam archive folders are created with a system call - to maildirmake by default, but that can be changed to a mkdir -p command if necessary. I've done a quick scan for a perl module to create the maildir, but haven't found one yet. Courier IMAP doesn't have one; it uses a C/C++ utility to do it.
* Just because a file winds up in the confirmed spam directory doesn't guarantee it will be learned, but it will be scanned. It isn't uncommon to see a message come through that has enough in common with a message already learned as spam to be skipped. The script doesn't forget and relearn by default, so it might not catch the case of an autolearned FN. To do this, I may need to duplicate the
Re: Bayes FP/FN Training Procedures
On Thu, January 6, 2005 9:13 am, Louis LeBlanc said:
> On 01/06/05 08:41 AM, Jeff Koch sat at the `puter and typed:
> > Has anyone come up with a script or method that would allow users to forward their false positive and false negative emails back to an address on the mailserver where they can be used to train the Bayes database? I understand that Bayes needs the email in its original format, so the script has to strip off the forwarding enclosure.
> >
> > Thanks in advance.

FWIW, if you can educate your users to attach the email to a forward rather than just clicking Forward, Outlook will preserve the headers in the message that is sent for inspection.

Unfortunately, however, this means that any scripts doing the processing on the server side will need to be able to parse out those MIME headers. If someone figures THAT out, they could be a savior.

--JM
[EMAIL PROTECTED]
http://blogs.galaxycow.com/vermyndax

Because this E mail address is transmission exclusive use, message it does not reply, fish prayer it is to call it does.
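For the server-side parsing mentioned above, a minimal sketch using Python's standard `email` package is shown here. It assumes the client attaches the original as a `message/rfc822` MIME part (whether a given Outlook version does so is not something this sketch verifies), and the function name is hypothetical.

```python
from email import message_from_bytes


def extract_attached_messages(raw_bytes):
    """Return the raw bytes of every message/rfc822 attachment in a forward."""
    wrapper = message_from_bytes(raw_bytes)
    originals = []
    for part in wrapper.walk():
        if part.get_content_type() == "message/rfc822":
            # The parser stores the attached message as a one-element list.
            payload = part.get_payload()
            if isinstance(payload, list) and payload:
                originals.append(payload[0].as_bytes())
    return originals
```

Each returned item is the attached message with its own headers and none of the forwarding wrapper's, which is the form a learner needs. Piping it to `sa-learn --spam` or `sa-learn --ham` on standard input is one plausible next step, though that wiring is left out here.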
RE: Bayes FP/FN Training Procedures
If you're using Exchange/Outlook, just use a public folder. Give the users write-only access and let them drag and drop messages in. Works great.

On Thu, 2005-01-06 at 08:44 -0500, Jason Gauthier wrote:
> Neat! I was just thinking about how to do that myself. But I use Exchange, so I'm not sure how to do it yet.