About Training ( sa-learn )
I have set up my SpamAssassin to train individual databases (MySQL) with this filter in Postfix:

    spamassassin unix - n n - - pipe
      flags=Rq user=spamassassin argv=/usr/bin/spamc -u ${recipient} -f -e /usr/sbin/sendmail -oi -f ${sender} -- ${recipient}

With this filter in place, auto-learning goes into the database of each individual user, and it does get learned. But if I send the same email that was auto-learned, it does not get a higher score. Should it get the same score, or a higher one? And how do I know whether the training is working?

Thanks!
[]'s f.rique
Re: About Training ( sa-learn )
Henrique Fernandes wrote on Thu, 4 Mar 2010 11:45:38 -0300:
> But if I send the same email that was auto-learned, it does not get a higher score. Should it get the same score, or a higher one?

If I understand you correctly, you want to learn a message twice. sa-learn won't do this, and the docs say so.

Kai
--
Get your web at Conactive Internet Services: http://www.conactive.com
Re: About Training ( sa-learn )
No, what I want is that after I have trained on an email, the same email should get a higher score, because SpamAssassin was trained that it is spam. So when it comes in again, SpamAssassin should look in the database and add some extra points to the score, right?

[]'s f.rique
Re: About Training ( sa-learn )
Henrique Fernandes wrote:
> No, what I want is that after I have trained on an email, the same email should get a higher score, because SpamAssassin was trained that it is spam. So when it comes in again, SpamAssassin should look in the database and add some extra points to the score, right?

That is a fairly common misconception.

When you learn an email as spam, the Bayes system breaks it into tokens (words/character strings) and then makes a note that each of those tokens was seen in a spam. When an email comes in, it breaks up the new email into tokens and then checks to see how frequently each of those tokens was previously seen in spam or ham. Based on what it finds, it ranks the email from BAYES_00 (very unlikely to be spam) to BAYES_99 (almost certainly spam).

Since learning from a single email only adds one data point to each token, it is unlikely to make a major difference on its own. The value comes in learning from lots of spam and ham. This is why the Bayes rules will not run until you have learned from at least 200 ham and 200 spam.

-- Bowie
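One way to check whether training is taking effect (the question that opened this thread) is to look at the message counters printed by sa-learn --dump magic. The sketch below assumes the usual layout of that output, where the third column holds the value and the last word of the line is nspam or nham. bayes_ready is a made-up helper name, and 200 is the default threshold Bowie mentions above; a site may have configured something else.

```shell
#!/bin/sh
# bayes_ready (hypothetical helper): reads `sa-learn --dump magic` output on
# stdin and reports whether enough mail has been learned for the BAYES_*
# rules to fire. Assumes the count is in column 3 and the label is the last
# word on the line.
bayes_ready() {
    awk -v min=200 '
        $NF == "nspam" { nspam = $3 }   # number of spam messages learned
        $NF == "nham"  { nham  = $3 }   # number of ham messages learned
        END {
            printf "nspam=%d nham=%d ", nspam, nham
            if (nspam >= min && nham >= min) print "bayes-active"
            else print "bayes-inactive"
        }'
}

# Intended use (not run here, since it needs a working SpamAssassin install):
#   sa-learn --dump magic | bayes_ready
```

With per-user databases, the same check would have to be repeated per user, for example by adding -u and the user name to the sa-learn call. If nspam and nham grow after mail has been processed, auto-learning is writing to that user's database.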
Re: About Training ( sa-learn )
(Please send replies to the list)

Henrique Fernandes wrote:
> Hmm, thanks. So each individual user has to have learned lots of emails before it starts to make a difference to the score?

Yes. Each individual user will need to learn at least 200 ham and 200 spam (manually or via auto-learn) before Bayes will start scoring. The more they learn, the better the accuracy.

> So is it better to train just one database for all users instead of one database per user? With just one database I am afraid of getting too many false positives, because sometimes "Viagra" is not spam for someone who researches it, but if it is in the same database it will be marked as spam...

Depends on your users. Unless they are wildly different, a single database should work fairly well.

Individual databases can be more accurate in some instances, but a single well-trained database will probably work better than a bunch of individual databases that are not trained consistently.

-- Bowie
Re: About Training ( sa-learn )
Thanks! I will discuss it here and find out which one is better.

What is the weight of the Bayes score once the databases are well trained? Do you have any idea?

[]'s f.rique
Re: About Training ( sa-learn )
Henrique Fernandes wrote:
> Thanks! I will discuss it here and find out which one is better.
> What is the weight of the Bayes score once the databases are well trained? Do you have any idea?

I'm not sure what you are asking. What do you mean by weight? The default scores (as of 3.2.5) are:

    BAYES_00  -2.599
    BAYES_05  -1.110
    BAYES_20  -0.740
    BAYES_40  -0.185
    BAYES_50   0.001
    BAYES_60   1.0
    BAYES_80   2.0
    BAYES_95   3.0
    BAYES_99   3.5

Take a look at /var/lib/spamassassin/version/updates_spamassassin_org/50_scores.cf to see the scores on your system.

-- Bowie
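To pull the Bayes scores out of a scores file without reading it by eye, a small filter can help. This is only a sketch: bayes_scores is a hypothetical helper, and it assumes score lines of the form "score NAME value [value value value]". When four values are given, the last one is the score set used with both Bayes and network tests enabled, which is the column shown in the list above.

```shell
#!/bin/sh
# bayes_scores (hypothetical helper): reads a SpamAssassin scores file on
# stdin and prints each BAYES_* rule with the last score on its line.
bayes_scores() {
    awk '
        { sub(/#.*/, "") }                          # drop comments first
        $1 == "score" && $2 ~ /^BAYES_/ { print $2, $NF }
    '
}

# Intended use, with the path taken from your own installation:
#   bayes_scores < /path/to/50_scores.cf
```

Local overrides set elsewhere (for example in local.cf) are not reflected in this file, so the output shows the shipped defaults only.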
Re: About Training ( sa-learn )
That was what I asked; sorry, I am not fluent in English. It is the score that Bayes adds to the final score, right?

[]'s f.rique
Re: About Training ( sa-learn )
Right.

Henrique Fernandes wrote:
> That was what I asked; sorry, I am not fluent in English. It is the score that Bayes adds to the final score, right?
Re: About Training ( sa-learn )
On 4-Mar-2010, at 07:45, Henrique Fernandes wrote:
> I have set up my SpamAssassin to train individual databases (MySQL) with this filter in Postfix:
>   spamassassin unix - n n - - pipe
>     flags=Rq user=spamassassin argv=/usr/bin/spamc -u ${recipient} -f -e /usr/sbin/sendmail -oi -f ${sender} -- ${recipient}

Wait, what exactly is this doing?

--
Windle shook his head sadly. Five exclamation marks, the sure sign of an insane mind. --Reaper Man
Re: About Training ( sa-learn )
Every email that comes into Postfix is sent to that filter, and the filter then sends the email on.

The -u ${recipient} option overrides the user that spamc is running as, so the message is processed as the user who is receiving the email. When it auto-learns, the data goes to a different user in the table, so I have a separate database for each user.

After the message has gone through the spamc filter, it is passed back for delivery.

Good enough?

[]'s f.rique
Training SA
Hi,

I'm new to SA. I run an Exim/Dovecot CentOS 5.0 mailserver (VPS), on which I have recently installed SA. I have configured 'Autolearn = yes' but I have no way to know whether this is working.

Please can someone explain to me how this works, since my understanding of it is as follows, and makes no sense! SpamAssassin identifies a mail as spam and stores the details of it so that it is easier to identify future emails which are similar. However, I fail to understand how this will help, since it's already successfully identifying those emails.

Furthermore, how can I train SpamAssassin to recognize as spam the emails that it is currently giving only a very low score to? I'm getting many emails each day about Acai Berries, but they are only getting a score of around 3.3 from SA! How can I train it to recognize these, server wide?

Thanks.
pete
RE: training SA
zigniew szalbot wrote:
> I tried to learn SA and used the following syntax:
>   sa-learn --spam -f /usr/home/zbyszek/june.txt
> I guess I made a mistake with the syntax but how should I change it so that I can train SA?
>
> I already found out:
>   sa-learn --spam --no-sync /usr/home/zbyszek/june.txt

The important bit is that you leave off the '-f', since that specifies that the directories to learn from are IN the file you specify.

The '--no-sync' can be useful, but remember that if you always learn that way, you need to run 'sa-learn --sync' from time to time.

-- Bowie
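The error in the original question ("unable to open Dear Valued Customer,") follows directly from this: with -f, each line of june.txt is treated as a path to learn from, not as message text. The sketch below is not sa-learn itself, only an imitation of that interpretation, and check_folder_list is a made-up name. It makes it easy to see what a given file would look like when read as a list of paths.

```shell
#!/bin/sh
# check_folder_list (hypothetical helper): treats every non-empty line of the
# file given as $1 as a path, the way a folder list is read, and reports
# whether that path exists.
check_folder_list() {
    while IFS= read -r target || [ -n "$target" ]; do
        [ -z "$target" ] && continue
        if [ -e "$target" ]; then
            printf 'ok: %s\n' "$target"
        else
            printf 'unable to open: %s\n' "$target"
        fi
    done < "$1"
}

# Example: check_folder_list /usr/home/zbyszek/june.txt
```

Run against a saved email, nearly every line comes back as "unable to open", which matches the "0 message(s) examined" result. Run against a real list of mail folders, the lines come back as "ok".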
training SA
Hello,

I tried to learn SA and used the following syntax:

    sa-learn --spam -f /usr/home/zbyszek/june.txt
    archive-iterator: unable to open Dear Valued Customer,: No such file or directory

june.txt is a spam email message downloaded from SquirrelMail for the purpose of feeding to SA. I only got "unable to open" messages, and at the end:

    Learned tokens from 0 message(s) (0 message(s) examined)

I guess I made a mistake with the syntax, but how should I change it so that I can train SA?

Thank you in advance!
Zbigniew Szalbot
Re: training SA
Hi,

> I tried to learn SA and used the following syntax:
>   sa-learn --spam -f /usr/home/zbyszek/june.txt
> I guess I made a mistake with the syntax but how should I change it so that I can train SA?

I already found out:

    sa-learn --spam --no-sync /usr/home/zbyszek/june.txt

Sorry to have bothered!

Warm regards,
Zbigniew Szalbot
Re: training SA
On Wed, 27 Jun 2007 07:35:01 +0200 (CEST), zigniew szalbot wrote:
> I tried to learn SA and used the following syntax:
>   sa-learn --spam -f /usr/home/zbyszek/june.txt
>   archive-iterator: unable to open Dear Valued Customer,: No such file or directory
> june.txt is a spam email message downloaded from SquirrelMail for the purpose of feeding to SA. I only got "unable to open" messages, and at the end:
>   Learned tokens from 0 message(s) (0 message(s) examined)
> I guess I made a mistake with the syntax, but how should I change it so that I can train SA?

Hi,

Have you double-checked the path for typos? Also, you may well need the -u switch. I use:

    sa-learn --spam -u sauser /downloads/spam
    mv -f /downloads/spam/*.Mail /downloads/spam/fn

The last bit (mv -f /downloads/spam/*.Mail /downloads/spam/fn) just moves the files to another directory so I can track what's been trained, and is probably surplus to your requirements. I have mine as a script, so I just call ./ham or ./spam as required.

HTH
Nigel
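Nigel's two commands can be wrapped so that files are only moved aside when the learn step succeeded. The following is a sketch under assumptions, not Nigel's actual script: train_and_archive, SA_LEARN and SA_USER are invented names, and the .Mail suffix and the sauser account are simply taken from his example.

```shell
#!/bin/sh
# train_and_archive (hypothetical helper): learn everything in $1 as spam,
# then move the *.Mail files into $2 so they are not learned again.
# SA_LEARN and SA_USER are illustrative overrides, not sa-learn features.
train_and_archive() {
    src=$1
    done_dir=$2
    mkdir -p "$done_dir" || return 1
    # Stop here if learning fails, so untrained mail is never archived.
    "${SA_LEARN:-sa-learn}" --spam -u "${SA_USER:-sauser}" "$src" || return 1
    for f in "$src"/*.Mail; do
        [ -e "$f" ] || continue      # the glob matched nothing
        mv -f "$f" "$done_dir"/
    done
}

# Example: train_and_archive /downloads/spam /downloads/spam/fn
```

A ham variant would differ only in passing --ham instead of --spam. Keeping the archive directory outside the directory being learned would avoid any question of already trained files being scanned again.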
Re: Training SA-Migrating from old IMAP to new IMAP server
On Sunday 11 March 2007 18:09, Don Ireland wrote:
> I'm moving my email over from the services of fusemail.com to the IMAP server that comes with my shared hosting account. When I copy my messages over from the old server, do I just run sa-learn against the messages as they are? Or will the fact that they have fusemail headers in them cause SA to think messages without fusemail headers are spam?

If so, you can make Bayes ignore those headers with bayes_ignore_header in local.cf. See the Mail::SpamAssassin::Conf(3pm) manpage.

> I've always deleted spam after training the filters so I don't have any to feed to the new system. Will that be a problem?

Having too great an imbalance in numbers between ham and spam will bias the Bayes classifier towards "everything is spam" or, in this case, "everything is ham".

--
Magnus Holmgren
(No Cc of list mail needed, thanks)
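Before deciding which headers to hand to bayes_ignore_header, it can help to list the header names that actually occur in a few of the migrated messages. The sketch below assumes one message per file; list_header_names is a hypothetical helper, and which of the listed names come from the old provider is something only the reader can judge.

```shell
#!/bin/sh
# list_header_names (hypothetical helper): prints the distinct header names
# found in the header block of the single-message file given as $1.
list_header_names() {
    awk '
        /^$/ { exit }                               # blank line ends the headers
        /^[^ \t:]+:/ { sub(/:.*/, ""); print }      # skips folded continuation lines
    ' "$1" | LC_ALL=C sort -u
}

# Example: list_header_names saved-message.eml
```

Names that appear on every old message and never on new mail are the candidates worth ignoring.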
Re: Training SA-Migrating from old IMAP to new IMAP server
So it sounds like I may be better off NOT training on existing messages, only on new ones that come in.

Don Ireland
Training SA-Migrating from old IMAP to new IMAP server
I'm moving my email over from the services of fusemail.com to the IMAP server that comes with my shared hosting account.

When I copy my messages over from the old server, do I just run sa-learn against the messages as they are? Or will the fact that they have fusemail headers in them cause SA to think messages without fusemail headers are spam?

I've always deleted spam after training the filters, so I don't have any to feed to the new system. Will that be a problem?

Don Ireland
Re: Training sa-learn from Outlook.
This sort of question gets asked a lot and there are various answers. The most common solution is to set up some public folders that are really IMAP folders, probably on your main mail machine, but that doesn't really matter much. Then, as you suggest, run a cron job to pull the mail from them and do the learning. If you look in the wiki, I believe there is a page or two devoted to this sort of thing with Outlook or OE.

Do you have individual Bayes databases or a site-wide one? If you have individual Bayes databases, then they would most likely each be under a usercode for the individual owner. In that case having global spam and ham folders won't work all that well, since you would have to learn the whole mess many times, once into each Bayes database. It would make more sense to have per-user ham and spam folders, which could still use the IMAP solution.

I assume that you have a global Bayes database. In that case you should run sa-learn under whichever usercode SA is running under when it accesses that database.

Loren
Training sa-learn from Outlook.
I imagine the following questions have been asked a lot, but I haven't seen the exact answers I'm after yet, so here goes.

We are running qmail, vpopmail, SpamAssassin, and SMB shares using Samba, among other things, on FreeBSD. I want to set up public ham and spam folders such that our users can drag emails into them from Outlook. I can then set up a cron job that runs sa-learn on those folders and deletes the mail.

Can I just create two public Samba shares, then use those for the emails and run sa-learn on them? I guess not, because the emails by this stage are wrecked by Outlook. How else can I do this?

Also, I don't understand exactly the implications of which user you run sa-learn under. How do I set this up when running sa-learn? I suppose if I run it as the same user as vpopmail then this will work?

Apologies if these questions have already been covered in this mailing list or elsewhere.

Andrew.
Re: Training SA with Thunderbird Junk folder
Edward Diener writes:
>> Something like this?
>>   sa-learn --mbox --spam --showdots Thunderbird_Junk_folder
>
> That was what I was looking for. Thanks!

Also, please take care of the running user (-u) and the database path (--dbpath). Without the user parameter, sa-learn will look for the bayes_* files in the .spamassassin/ folder of the current login user's home directory, and will create or overwrite the files as that user.

>> Beware of DOS/Unix format after uploading the Junk folder file. On FreeBSD an ASCII upload seems to be no problem, but on FC3 I need to run dos2unix to reformat the folder file.
>
> I have WinSCP running, so I should be able to tell it to transform any Windows line endings to Unix line endings, since the server is Linux and the client Windows.

I just had another question: how can I know the effect on spam blocking after sa-learn has run? For example, can dumping the bayes_* files give any hint about the increase in accuracy?

Thanks
Re: Training SA with Thunderbird Junk folder
martin wrote:
> Something like this?
>   sa-learn --mbox --spam --showdots Thunderbird_Junk_folder

That was what I was looking for. Thanks!

> Beware of DOS/Unix format after uploading the Junk folder file. On FreeBSD an ASCII upload seems to be no problem, but on FC3 I need to run dos2unix to reformat the folder file.

I have WinSCP running, so I should be able to tell it to transform any Windows line endings to Unix line endings, since the server is Linux and the client Windows.
Re: Training SA with Thunderbird Junk folder
Matt Kettler wrote:
> You don't need any of that in modern SA. Spamd allows clients to connect and perform a learn operation if you start it with the --allow-tell option. All you'd need to do is set up spamd that way and have the T-bird plugin speak the same protocol as spamc does. (Possibly not suited to all environments, but if you trust your users...)
>
> Like spamd --allow-tell ? :)

The problem with using that approach is that you can't authenticate users. In small, closed, trusted environments it can be useful, but in most situations, I don't think it will be usable. The nice thing about using an IMAP-based solution is that the user is authenticated (provided you set it up correctly).

Kind Regards,
Sander Holthaus
Re: Training SA with Thunderbird Junk folder
Sander Holthaus wrote:
> The problem with using that approach is that you can't authenticate users. In small, closed, trusted environments it can be useful, but in most situations, I don't think it will be usable. The nice thing about using an IMAP-based solution is that the user is authenticated (provided you set it up correctly).

Actually, there exists a plugin, and a patch for a new plugin hook, that implements a password for spamd protocol transactions. It never really went anywhere but could probably be picked up and fixed up a bit if there was enough interest.

Michael
Re: Training SA with Thunderbird Junk folder
mouss wrote:
> Edward Diener wrote:
>> Does anybody know the instructions for training SA with the contents of the Thunderbird Junk folder? My web host, where SA is running, suggests I do this in order to reduce the amount of spam I get, and I can log in to my web host, transfer files from my local machine to my web host, and run SA commands.
>
> So the messages are accessible on your SA system? If so, then run spamassassin or spamc with the right option.
>
> What I would like to see is a plugin to "J" a message...

If your mail server and users are using IMAP, the Junk E-mail folder is on the server already. I've got a script that runs from cron several times a day; it learns from that folder and then deletes its contents.

It looks like this:

    #!/bin/bash
    # Learn the Junk mbox as spam, then replace it with an empty file.
    sa-learn --spam --mbox "./mail/Junk E-mail"
    rm "./mail/Junk E-mail"
    touch "./mail/Junk E-mail"

You could probably adapt the concept to work system-wide, though I'm not sure how your hosting people would take to it.

-Mike
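A slightly more defensive variant of the same idea is sketched here: skip the run when the folder is empty, and only empty the folder when sa-learn reported success. This is an illustration, not Mike's script. learn_junk_mbox and SA_LEARN are invented names, and the sketch assumes, as the original does, that the folder is a single mbox file that is safe to truncate.

```shell
#!/bin/sh
# learn_junk_mbox (hypothetical helper): learn the mbox file $1 as spam and
# empty it afterwards. SA_LEARN is an illustrative override for testing.
learn_junk_mbox() {
    mbox=$1
    [ -s "$mbox" ] || return 0       # missing or empty: nothing to learn
    "${SA_LEARN:-sa-learn}" --spam --mbox "$mbox" || return 1
    : > "$mbox"                      # truncate in place; owner and mode are kept
}

# Example cron usage: learn_junk_mbox "$HOME/mail/Junk E-mail"
```

Truncating in place avoids the short window in which the file does not exist, but mail delivered to the folder between the learn step and the truncation would still be lost unlearned, and no mailbox locking is attempted.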
Re: Training SA with Thunderbird Junk folder
Mike Pepe wrote:
> If your mail server and users are using IMAP, the Junk E-mail folder is on the server already. I've got a script that runs from cron several times a day; it learns from that folder and then deletes its contents.

My issue is that when spam is missed, I'd like to "J" it so it goes to the Junk folder. This way, the server script will pick it up. Unfortunately, if you don't enable the TB adaptive filter, TB won't move the message to the Junk folder. This is a bug, but I don't know if it will ever be fixed (it dates back a long way). And I don't want the TB adaptive filter.
Re: Training SA with Thunderbird Junk folder
Edward Diener wrote:
> Does anybody know the instructions for training SA with the contents of the Thunderbird Junk folder? My web host, where SA is running, suggests I do this in order to reduce the amount of spam I get, and I can log in to my web host, transfer files from my local machine to my web host, and run SA commands.

So the messages are accessible on your SA system? If so, then run spamassassin or spamc with the right option.

What I would like to see is a plugin to "J" a message...
Re: Training SA with Thunderbird Junk folder
mouss wrote:
> What I would like to see is a plugin to "J" a message...

<AOL>Me Too!</AOL>

If anyone is a Thunderbird plugin wizard and interested in doing a plugin that will report/learn to spamd, speak up. I'm very interested.

Michael
Re: Training SA with Thunderbird Junk folder
Such a mechanism would still depend upon some organization on the server side... As far as I can tell, it's very much up to the local sysadmin (i.e. aliases to send to, forwarding or attaching properly, etc.). Would this even work well, potentially?

It might be interesting if there were a way to collect data on the client side (i.e. Thunderbird/Windows or whichever platform) and have a mechanism to contribute that data to your account (or database entry, if it's a MySQL backend), to your Bayes.

Just some ramblings.
Re: Training SA with Thunderbird Junk folder
Forrest Aldrich wrote:
> Such a mechanism would still depend upon some organization on the server side... As far as I can tell, it's very much up to the local sysadmin (i.e. aliases to send to, forwarding or attaching properly, etc.). Would this even work well, potentially?

Oh, I'm not asking for that much. Currently, TB lets you mark a message as junk (in which case it can move it to a Junk folder, or elsewhere), but it has two problems:
- this enables the TB filter, which I don't want
- I see no key binding (I'd like to just press J).
I don't know how to write TB plugins, but this shouldn't be that hard, should it?

> It might be interesting if there were a way to collect data on the client side (i.e. Thunderbird/Windows or whichever platform) and have a mechanism to contribute that data to your account (or database entry, if it's a MySQL backend), to your Bayes.

That would be another thing. For those using IMAP, just putting it in a Junk folder is enough. For others, this is feasible, but more elaborate.
Re: Training SA with Thunderbird Junk folder
Forrest Aldrich wrote:
> Such a mechanism would still depend upon some organization on the server side... As far as I can tell, it's very much up to the local sysadmin (i.e. aliases to send to, forwarding or attaching properly, etc.). Would this even work well, potentially?

You don't need any of that in modern SA. Spamd allows clients to connect and perform a learn operation if you start it with the --allow-tell option. All you'd need to do is set up spamd that way and have the T-bird plugin speak the same protocol as spamc does. (Possibly not suited to all environments, but if you trust your users...)

> It might be interesting if there were a way to collect data on the client side (i.e. Thunderbird/Windows or whichever platform) and have a mechanism to contribute that data to your account (or database entry, if it's a MySQL backend), to your Bayes.

Like spamd --allow-tell ? :)
Training SA with Thunderbird Junk folder
Does anybody know the instructions for training SA with the contents of the Thunderbird Junk folder? My web host, where SA is running, suggests I do this in order to reduce the amount of spam I get, and I can log in to my web host, transfer files from my local machine to my web host, and run SA commands.
Re: Training SA with Thunderbird Junk folder
Edward Diener wrote:
> Does anybody know the instructions for training SA with the contents of the Thunderbird Junk folder? My web host, where SA is running, suggests I do this in order to reduce the amount of spam I get, and I can log in to my web host, transfer files from my local machine to my web host, and run SA commands.

I have my users use the redirect plugin to send spams to an account on the server just for this purpose. The redirect plugin will add a few headers (and so will your mail server) that need to be cleaned out first. If you just train on the Junk folder file, you'll have to remove all of the Thunderbird-related stuff first; this was more work than redirecting to a mailbox.

I have a script (VBS) that runs on the mail server every night that takes the redirected mails, cleans the headers, and moves them over to the folder for the SA server to pick up from. A little later a bash script grabs the mail off the Windows mail server, runs through the files and learns them as spam. (Ham is done manually in bulk learns, but I could automate this as well.) This same mailbox is my spam trap, so any other mails that end up there are also trained as spam. My user base is good enough to police itself, and I have just about all of our customers and vendors whitelisted.

--
Thanks,
James
Re: Training SA with Thunderbird Junk folder
JamesDR wrote:
> Edward Diener wrote:
>> Does anybody know the instructions for training SA with the contents of the Thunderbird Junk folder?

Upload them as single messages to your ISP account. If you have a special folder in TB (Thunderbird) for the messages you want to train on, you can find that folder file (in your TB user folder) and upload that. TB stores messages in mbox format, which SA can parse.

> I have my users use the redirect plugin to send spams to an account on the server just for this purpose. The redirect plugin will add a few headers (and so will your mail server) that need to be cleaned out first. If you just train on the Junk folder file, you'll have to remove all of the Thunderbird-related stuff first; this was more work than

These will help for the TB headers:

    bayes_ignore_header X-Account-Key
    bayes_ignore_header X-UIDL
    bayes_ignore_header X-Mozilla-Status
    bayes_ignore_header X-Mozilla-Status2

Craig
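If the list of headers to ignore grows, the configuration lines can be generated rather than typed. This is a trivial sketch with an invented helper name, ignore_header_lines; the header names in the example are the four Thunderbird ones given above.

```shell
#!/bin/sh
# ignore_header_lines (hypothetical helper): prints one bayes_ignore_header
# line per header name given as an argument.
ignore_header_lines() {
    for name in "$@"; do
        printf 'bayes_ignore_header %s\n' "$name"
    done
}

# Example:
#   ignore_header_lines X-Account-Key X-UIDL X-Mozilla-Status X-Mozilla-Status2
```

The output is meant to be reviewed and then added to local.cf by hand; running spamassassin --lint afterwards is a reasonable way to confirm that the configuration still parses.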
Re: Training SA with Thunderbird Junk folder
Craig Morrison wrote:
> JamesDR wrote:
>> Edward Diener wrote:
>>> Does anybody know the instructions for training SA with the contents of the Thunderbird Junk folder?
>
> Upload them as single messages to your ISP account. If you have a special folder in TB (Thunderbird) for the messages you want to train on you can find that folder file (in your TB user folder) and upload that. TB stores messages in mbox format which SA can parse.
>
>> I have my users use the redirect plugin to send spams to an account on the server just for this purpose. The redirect plugin will add a few headers (and so will your mail server) that need to be cleaned out first. If you just train on the junk folder file, you'll have to remove all of the thunderbird related stuff first -- this was more work than
>
> These will help for the TB headers:
>
> bayes_ignore_header X-Account-Key
> bayes_ignore_header X-UIDL
> bayes_ignore_header X-Mozilla-Status
> bayes_ignore_header X-Mozilla-Status2
>
> Craig

Optionally, also ignore the following headers if you're using the webmail extension and a few other extensions:

X-WebMail
X-JunkFolder
X-Message-Status
X-SID-PRA
X-SID-Result
X-Message-Info

If you look back through the mailing list, you should be able to find a discussion on using IMAP folders to train SA. That might be helpful as well.

Kind regards,
Sander Holthaus
Re: Training SA with Thunderbird Junk folder
Craig Morrison <craigsa at 2cah.com> writes:
> JamesDR wrote:
>> Edward Diener wrote:
>>> Does anybody know the instructions for training SA with the contents of the Thunderbird Junk folder?
>
> Upload them as single messages to your ISP account. If you have a special folder in TB (Thunderbird) for the messages you want to train on you can find that folder file (in your TB user folder) and upload that. TB stores messages in mbox format which SA can parse.
>
>> I have my users use the redirect plugin to send spams to an account on the server just for this purpose. The redirect plugin will add a few headers (and so will your mail server) that need to be cleaned out first. If you just train on the junk folder file, you'll have to remove all of the thunderbird related stuff first -- this was more work than
>
> These will help for the TB headers:
>
> bayes_ignore_header X-Account-Key
> bayes_ignore_header X-UIDL
> bayes_ignore_header X-Mozilla-Status
> bayes_ignore_header X-Mozilla-Status2
>
> Craig

Something like this?

sa-learn --mbox --spam --showdots Thunderbird_Junk_folder

Beware of DOS vs. Unix line endings after uploading the Junk folder file. On FreeBSD an ASCII-mode upload seemed to cause no problem, but on FC3 I needed to run dos2unix to reformat the folder file.

Hope this helps.
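For anyone without dos2unix at hand, the line-ending fix can be approximated in a few lines of Python. This is only a sketch: the function names and file paths are made up for illustration, and it simply rewrites CRLF as LF and then counts the messages Python's standard `mailbox` module finds, as a sanity check before the file goes to sa-learn.

```python
import mailbox
from pathlib import Path


def to_unix_line_endings(data: bytes) -> bytes:
    """Replace DOS (CRLF) line endings with Unix (LF) ones."""
    return data.replace(b"\r\n", b"\n")


def normalize_mbox(src: str, dst: str) -> int:
    """Write an LF-only copy of the mbox at src to dst; return its message count."""
    Path(dst).write_bytes(to_unix_line_endings(Path(src).read_bytes()))
    box = mailbox.mbox(dst, create=False)
    try:
        return len(box)
    finally:
        box.close()
```

If the count looks right, the converted copy can be passed to the command shown above, for example `sa-learn --mbox --spam --showdots Junk.unix` (the file name here is hypothetical).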
Training SA with postfix
Hey all,

I've just spent a good amount of time installing postfix, amavisd-new, ClamAV and SA (with DCC, razor, pyzor), all the latest versions.

I'm trying to figure out if there is any way I can incorporate sa-learn to learn ham based on what my people send through the box. This is a relay-only server, which, from my reading, kind of complicates things. My end goal, if possible, is to have sa-learn train itself on ham whenever I send mail outbound.

Is this possible? If so, can someone help me with how it's done or point me to documentation?
Re: Training SA with postfix
At 09:10 AM 12/31/2004 -0500, Jason Gauthier wrote:
> I'm trying to figure out if there is any way I can incorporate sa-learn to learn ham based on what my people send through the box. This is a relay-only server, which, from my reading, kind of complicates things. My end goal, if possible, is to have sa-learn train itself on ham whenever I send mail outbound. Is this possible? If so, can someone help me with how it's done or point me to documentation?

One possible way of approximating this is to take advantage of the autolearner.

Write yourself a negative-scoring rule that looks at the Received: headers for signs of relay from the inside. For added security against forgery, you could use a meta rule and also check other header fields (Message-ID, From, etc.). With a decently hefty negative-scoring rule firing, the autolearner should try to learn most of the messages as ham.
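To see why a negative-scoring rule nudges the autolearner toward ham, here is a deliberately simplified Python model of the threshold check. The two constants are SpamAssassin's documented defaults for `bayes_auto_learn_threshold_nonspam` and `bayes_auto_learn_threshold_spam`. Everything else is an assumption for illustration: the real autolearner computes its own score (leaving out Bayes, AWL and some other rules) and applies further conditions before learning a message as spam.

```python
# Documented SpamAssassin defaults.
HAM_THRESHOLD = 0.1    # bayes_auto_learn_threshold_nonspam
SPAM_THRESHOLD = 12.0  # bayes_auto_learn_threshold_spam


def autolearn_decision(score: float,
                       ham_threshold: float = HAM_THRESHOLD,
                       spam_threshold: float = SPAM_THRESHOLD) -> str:
    """Simplified model: 'ham' below the low threshold, 'spam' at or above
    the high one, 'none' (not learned) anywhere in between."""
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "none"


# Hypothetical outbound message whose other rules add up to 0.5 points.
base_score = 0.5
print(autolearn_decision(base_score))        # not learned on its own
print(autolearn_decision(base_score - 1.0))  # with a -1.0 outbound rule
```

In this model a message that would otherwise sit just above the ham threshold drops below it once the outbound rule fires, which is the effect the suggestion above relies on.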
RE: Training SA with postfix
Thanks for the tip. Due to my newbie-ness with these products, I'm a little uncertain where to start. Amavis seems to build many rules and to interface with SA to the point where it actually has SA options in it. Would I build this rule within amavis or SA? And of course, could you (or someone) point me to some documentation or an example? I'm not sure where to even begin.

Thanks,
Jason

-----Original Message-----
From: Matt Kettler [mailto:[EMAIL PROTECTED]]
Sent: Friday, December 31, 2004 9:31 AM
To: Jason Gauthier; users@spamassassin.apache.org
Subject: Re: Training SA with postfix

At 09:10 AM 12/31/2004 -0500, Jason Gauthier wrote:
> I'm trying to figure out if there is any way I can incorporate sa-learn to learn ham based on what my people send through the box. This is a relay-only server, which, from my reading, kind of complicates things. My end goal, if possible, is to have sa-learn train itself on ham whenever I send mail outbound. Is this possible? If so, can someone help me with how it's done or point me to documentation?

One possible way of approximating this is to take advantage of the autolearner.

Write yourself a negative-scoring rule that looks at the Received: headers for signs of relay from the inside. For added security against forgery, you could use a meta rule and also check other header fields (Message-ID, From, etc.). With a decently hefty negative-scoring rule firing, the autolearner should try to learn most of the messages as ham.
RE: Training SA with postfix
At 02:45 PM 12/31/2004, Jason Gauthier wrote:
> Thanks for the tip. Due to my newbie-ness with these products I'm a little uncertain where to start. Amavis seems to build many rules, and interface with SA where it actually has options in it. Would I build this rule within amavis or SA?

I'd do the rule as an SA rule, since it's SA's autolearner you want to affect.

> And of course, could you (or someone) point me to some documentation or example?

http://wiki.apache.org/spamassassin/WritingRules

So for this header:

Received: from mattk-801-567.evi-inc.com (mattk-801-567.evitechnology.com [10.0.6.249]) by xanadu.evi-inc.com (8.12.8/8.12.8) with ESMTP id iBV0gIZP031926

Assuming my internal machines are 10.0.6.0/24, and all RDNS to evitechnology.com names, I might write:

header L_OUTBOUND_MAIL Received =~ /from .{1,60}\.evitechnology\.com \[10\.0\.6\.\d{1,3}\]\).{0,10}by xanadu\.evi\-inc\.com .{1,50} with ESMTP id/s
score L_OUTBOUND_MAIL -1.0

Other, less specific variants:

header L_OUTBOUND_MAIL0 Received =~ /from .{1,60}\.evitechnology\.com \[10\.0\.6\.\d{1,3}\]\).{0,10}by xanadu\.evi\-inc\.com/s
score L_OUTBOUND_MAIL0 -1.0

Caution: these last two are easily forged:

header L_OUTBOUND_MAIL2 Received =~ /from .{1,60}\.evitechnology\.com \[10\.0\.6\.\d{1,3}\]\)/
score L_OUTBOUND_MAIL2 -1.0

header L_OUTBOUND_MAIL3 Received =~ /from .{1,60}\.evitechnology\.com/
score L_OUTBOUND_MAIL3 -1.0
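Before putting patterns like these into local.cf, it can help to try them against sample headers outside SpamAssassin. The Python sketch below ports the strictest and the loosest pattern to Python's `re` module (with the dot in `evitechnology.com` escaped, and `re.DOTALL` standing in for Perl's `/s`). The genuine header value is the one from the post; the forged one, the constant names and the `fires` helper are invented for illustration. Python's regex engine is close to Perl's for these constructs, but it is not identical, so treat this as a rough check only.

```python
import re

# Port of L_OUTBOUND_MAIL: sender rDNS, internal IP and receiving host must all match.
STRICT = re.compile(
    r"from .{1,60}\.evitechnology\.com \[10\.0\.6\.\d{1,3}\]\)"
    r".{0,10}by xanadu\.evi-inc\.com .{1,50} with ESMTP id",
    re.DOTALL,
)
# Port of L_OUTBOUND_MAIL3: only the domain name has to appear.
LOOSE = re.compile(r"from .{1,60}\.evitechnology\.com")

# Header value from the post (header rules see the text after "Received: ").
GENUINE = ("from mattk-801-567.evi-inc.com "
           "(mattk-801-567.evitechnology.com [10.0.6.249]) "
           "by xanadu.evi-inc.com (8.12.8/8.12.8) with ESMTP id iBV0gIZP031926")
# Hypothetical forgery: an outside host merely claiming an evitechnology.com name.
FORGED = ("from mail.example.net (fake.evitechnology.com [203.0.113.7]) "
          "by mx.example.org with SMTP id abc123")


def fires(pattern: re.Pattern, received: str) -> bool:
    """True if the pattern matches anywhere in the header value."""
    return pattern.search(received) is not None


for name, value in (("genuine", GENUINE), ("forged", FORGED)):
    print(name, "strict:", fires(STRICT, value), "loose:", fires(LOOSE, value))
```

The forged value trips only the loose pattern, which is the reason for the caution attached to the last two rules.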
RE: Training SA with postfix
Great! Using your example and the website, I'm able to understand this much better. My idea is to start small and make sure it works, so I simply added this:

header L_FROM Received =~ /server24/
score L_FROM -1.0

If the Received line contains server24, then score it as -1.0. I know this is easy to fake, but like I said, it's just for testing :)

I went ahead and looked at the headers and saw the following:

Microsoft Mail Internet Headers Version 2.0
Received: from server24.ctg.com (unknown [192.168.50.11]) by spamfilter.lastar.com (Postfix) with ESMTP id 9EACAEFCC1 for [EMAIL PROTECTED]; Fri, 31 Dec 2004 16:09:23 -0500 (EST)

The originating server is server24, and then the message hits spamfilter. As you can see, server24 is contained in that string. But looking below, I see spam_scan reports hits=0.28, and L_FROM is not listed among the tests:

Dec 31 16:09:24 spamfilter amavis[8276]: (08276-02) spam_scan: hits=0.28 tests=ALL_TRUSTED,AWL,HTML_90_100,HTML_MESSAGE,HTML_SHORT_COMMENT

I looked at the headers and I don't see the X-Spam-* headers at all (I set it to -999), so I'm not sure why amavisd-new didn't add the headers.

-----Original Message-----
From: Matt Kettler [mailto:[EMAIL PROTECTED]]
Sent: Friday, December 31, 2004 3:07 PM
To: users@spamassassin.apache.org
Subject: RE: Training SA with postfix

At 02:45 PM 12/31/2004, Jason Gauthier wrote:
> Thanks for the tip. Due to my newbie-ness with these products I'm a little uncertain where to start. Amavis seems to build many rules, and interface with SA where it actually has options in it. Would I build this rule within amavis or SA?

I'd do the rule as an SA rule, since it's SA's autolearner you want to affect.

> And of course, could you (or someone) point me to some documentation or example?

http://wiki.apache.org/spamassassin/WritingRules

So for this header:

Received: from mattk-801-567.evi-inc.com (mattk-801-567.evitechnology.com [10.0.6.249]) by xanadu.evi-inc.com (8.12.8/8.12.8) with ESMTP id iBV0gIZP031926

Assuming my internal machines are 10.0.6.0/24, and all RDNS to evitechnology.com names, I might write:

header L_OUTBOUND_MAIL Received =~ /from .{1,60}\.evitechnology\.com \[10\.0\.6\.\d{1,3}\]\).{0,10}by xanadu\.evi\-inc\.com .{1,50} with ESMTP id/s
score L_OUTBOUND_MAIL -1.0

Other, less specific variants:

header L_OUTBOUND_MAIL0 Received =~ /from .{1,60}\.evitechnology\.com \[10\.0\.6\.\d{1,3}\]\).{0,10}by xanadu\.evi\-inc\.com/s
score L_OUTBOUND_MAIL0 -1.0

Caution: these last two are easily forged:

header L_OUTBOUND_MAIL2 Received =~ /from .{1,60}\.evitechnology\.com \[10\.0\.6\.\d{1,3}\]\)/
score L_OUTBOUND_MAIL2 -1.0

header L_OUTBOUND_MAIL3 Received =~ /from .{1,60}\.evitechnology\.com/
score L_OUTBOUND_MAIL3 -1.0
Re: Training SA with postfix
Jason Gauthier wrote:
> Thanks for the tip. Due to my newbie-ness with these products I'm a little uncertain where to start. Amavis seems to build many rules, and interface with SA where it actually has options in it.

Read the docs at the amavisd-new site: http://www.ijs.si/software/amavisd/

Amavis runs SA, but does not allow SA to rewrite the message. Amavis does the rewriting, quarantining, and ultimate scoring. SA still looks to its own config file (typically named local.cf) to run and score all of its tests; it just doesn't get to rewrite the original message. More info here: http://www.ijs.si/software/amavisd/

> Would I build this rule within amavis or SA?

All SA rules go in the SA config (OK, this may be too absolute; I just can't think of any exceptions at the moment ;-).

There are many ways to train this anti-spam software stack (amavis/SA/razor/pyzor/bayes/etc.). Amavisd can soft-blacklist, blacklist, and soft-whitelist based on *envelope senders*, while SA's blacklists and whitelists work on message headers. SA also has the trainable Bayes engine. It all depends on what kind of features, performance, flexibility, accuracy, etc. you need.

- Sam Nilsson
Re: Training SA with postfix
Sam Nilsson wrote:
> SA still looks to its own config file (typically named local.cf) to run and score all of its tests, it just doesn't get to rewrite the original message. More info here: http://www.ijs.si/software/amavisd/

Sorry! More info here: http://www.ijs.si/software/amavisd/#faq-spam

- Sam