Re: Training Q
John D. Hardin wrote:
> On Wed, 16 Jan 2008 [EMAIL PROTECTED] wrote:
> > So, all 3 categories include emails that SA has already seen and presumably included in its Bayesian filters,
>
> Only if you have autolearn enabled. Can we assume from this question that you do? You didn't explicitly say.
>
> > and emails that it has never seen. My question is, should I write a program to take out emails that SA has already seen before I send them through Bayesian processing, or is it smart enough not to process those again?
>
> sa-learn won't re-learn messages it has already seen unless you change their classification (e.g. was ham, re-learn as spam). Don't worry about it.
>
> In addition, keeping a full corpus around helps re-learning from scratch should you ever need to do so.

Some people advise against relearning old spam. What would you suggest: learn only the last 6 months, for example?

--
Gruesse/Greetings
MH
Don't send mail to: [EMAIL PROTECTED]
--
Re: Training Q
Matthias Haegele wrote:
> Some people advise against relearning old spam. What would you suggest: learn only the last 6 months, for example?

I meant: if you must relearn from scratch, how far back would you go?

--
Gruesse/Greetings
MH
Don't send mail to: [EMAIL PROTECTED]
--
Re: Training Q
> Some people advise against relearning old spam. What would you suggest: learn only the last 6 months, for example?

I'd suggest only the last 3 months or less of spam, if you have enough. Old ham should be fine, though.

Loren
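As a rough illustration of that cutoff, a small Python sketch could pick out only the spam files modified within the last 90 days before they are handed to sa-learn. The one-message-per-file layout, the use of file modification time as the message age, the 90-day stand-in for "3 months", and the function name recent_files are all assumptions made for this example; none of them come from SpamAssassin itself.

```python
import os
import time


def recent_files(directory, max_age_days=90, now=None):
    """Return sorted paths of regular files in `directory` modified within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    selected = []
    for entry in os.scandir(directory):
        # File modification time is used as a proxy for message age (an assumption).
        if entry.is_file() and entry.stat().st_mtime >= cutoff:
            selected.append(entry.path)
    return sorted(selected)
```

The returned paths could then be passed to sa-learn --spam, while an old ham corpus is learned in full.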
Problem with sa-learn and virtual user
Hi,

My mail system uses virtual users. I run spamd like this:

    spamd --virtual-config-dir=/srv/spamassassin/%d/%l -x -u dovecot -c -i 127.0.0.1 -d -r /var/run/spamd.pid

and I run spamc with:

    /usr/pkg/bin/spamc -u ${recipient} -f -e ...

This works fine; each user can use their own SA config. But I would like to be able to run sa-learn for specific users. I tried:

    sa-learn --username [EMAIL PROTECTED] --spam files

But as I can see with debug (-D), it uses the Bayes files of the Unix user running the command.

So I'm wondering what I should do. In particular, I'm wondering how sa-learn can know that it should use /srv/spamassassin/%d/%l for the Bayes files, but I did not find the answer.
Re: A rule to match patterns on recipient name.
Loren Wilton wrote:
> > Valid email addresses have a well-known structure (i.e. [A-z.]*_NAME) so, for example [EMAIL PROTECTED] is clearly a bogus address.
>
> Off the top of my head you might be able to do something like (untested):
>
>     header __GOOD_NAME To =~ /[A-Za-z]{1,30}_[A-Za-z\d\.]{2,40}\@(?i:domain\.com)/
>     meta   BAD_NAME    !__GOOD_NAME
>     score  BAD_NAME    2
>
> Above is based on the assumption that NAME includes only letters, numbers, and dots. If it can also have underscores then you could just do \w{2,40} or the like for the second part.

Hmmm - not a bad start, I guess. If I were to put something like this in individual users' .spamassassin/user_prefs, then I could be even more restrictive about NAME. I am concerned, however, that this might not cope well with mailing lists (where To is the mailing list name) or in circumstances where the user is CC'd rather than addressed directly.
Re: Problem with sa-learn and virtual user
Jean-Edouard Babin wrote:
> On Jan 17, 2008 1:38 PM, Jonathan Armitage [EMAIL PROTECTED] wrote:
> > Jean-Edouard Babin wrote:
> > > But I would like to be able to run sa-learn for specific users. I tried:
> > >
> > >     sa-learn --username [EMAIL PROTECTED] --spam files
> > >
> > > But as I can see with debug (-D), it uses the Bayes files of the Unix user running the command.
> >
> > Try
> >
> >     su - username -c "sa-learn --spam spamdir"
> >
> > Jon
>
> Users are virtual.

I think there is another way, but can't remember offhand. Look back through the mailing list. It was discussed a few months ago.

Jon
Re: Problem with sa-learn and virtual user
Jean-Edouard Babin wrote:
> Hi,
>
> My mail system uses virtual users. I run spamd like this:
>
>     spamd --virtual-config-dir=/srv/spamassassin/%d/%l -x -u dovecot -c -i 127.0.0.1 -d -r /var/run/spamd.pid
>
> and I run spamc with:
>
>     /usr/pkg/bin/spamc -u ${recipient} -f -e ...
>
> This works fine; each user can use their own SA config. But I would like to be able to run sa-learn for specific users. I tried:
>
>     sa-learn --username [EMAIL PROTECTED] --spam files
>
> But as I can see with debug (-D), it uses the Bayes files of the Unix user running the command.

Yes, the --username option in sa-learn *ONLY* works with SQL backends, as per the docs.

> So I'm wondering what I should do. In particular, I'm wondering how sa-learn can know that it should use /srv/spamassassin/%d/%l for the Bayes files, but I did not find the answer.

You'd have to use the --dbpath option to override SA's default idea of where the Bayes database lives.
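To make the --dbpath suggestion a little more concrete, here is a minimal Python sketch that expands the --virtual-config-dir template for one address and builds the matching sa-learn argument list. It rests on several assumptions that should be checked against your own setup: that %d and %l stand for the domain and local part and %u for the full address, that the per-user Bayes files use the default "bayes" prefix inside that directory, and that --dbpath wants the path in bayes_path form. The function names and the example address are hypothetical.

```python
import shlex


def expand_virtual_dir(template, address):
    """Expand %u, %l and %d in a spamd-style template (assumed meanings below)."""
    local, _, domain = address.partition("@")
    # Assumed: %u = full address, %l = local part, %d = domain.
    return (template.replace("%u", address)
                    .replace("%l", local)
                    .replace("%d", domain))


def sa_learn_command(template, address, kind, source):
    """Build an sa-learn argument list that points --dbpath at the user's Bayes files."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    # Assumed: Bayes files use the default 'bayes' prefix inside the user's directory.
    dbpath = expand_virtual_dir(template, address).rstrip("/") + "/bayes"
    return ["sa-learn", "--dbpath", dbpath, "--" + kind, source]


if __name__ == "__main__":
    cmd = sa_learn_command("/srv/spamassassin/%d/%l", "user@example.org", "spam", "spamdir")
    # Print only; run the command yourself once the path has been verified.
    print(shlex.join(cmd))
```

The sketch only prints the command, so the computed path can be compared with what spamd actually created before anything is learned into it. The command would also need to run as a user with write access to those files.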
sa-learn error message
Hi again SA experts,

Note the error message in the second-to-last line of the following transcript:

    animalhead:~/sj $ sa-learn --no-rebuild --spam --mbox savejunk
    The --no-rebuild option has been deprecated. Please use --no-sync instead.
    Learned tokens from 3025 message(s) (3047 message(s) examined)
    animalhead:~/sj $ sa-learn --no-sync --spam thruJunk
    bayes: bayes db version 0 is not able to be used, aborting! at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/BayesStore/DBM.pm line 196.
    Learned tokens from 170 message(s) (170 message(s) examined)

There are 171 messages in directory thruJunk. The largest is 495K, the next largest is 137K.

Running "sa-learn -V" yields "spamassassin v 3.2.1".

What should I do about this? I still have another directory with ham to go. It includes lots of large files. Should I delete those over a certain size?

Thanks,
Craig MacKenna
Re: A rule to match patterns on recipient name.
Bowie Bailey wrote:
> Catch-all setups always have this problem. You could use SA to figure out which addresses are likely to be valid, but this means that you have to accept the message and then call SA for EVERY one of these emails.

I'm aware of that... but the benefits outweigh the problems.

> The best way is to use your MTA. Set up a method for your users to create these email addresses as real email aliases in your MTA. Then you can set your MTA to only accept valid email addresses and the problem goes away.

That would be a problem, since there is no definitive list of 'valid' email addresses. However, I do know the form of all valid email addresses. If I could replace list-based lookup with a function to parse and validate email addresses with my MTA, I'd be laughing.

It's no big problem to process every spam, but it would be desirable to at least mark all these made-up email address destinations with a higher spam score. Is there an existing rule I can customise, or would I have to start from scratch?
RE: A rule to match patterns on recipient name.
Steve wrote:
> Loren Wilton wrote:
> > > Valid email addresses have a well-known structure (i.e. [A-z.]*_NAME) so, for example [EMAIL PROTECTED] is clearly a bogus address.
> >
> > Off the top of my head you might be able to do something like (untested):
> >
> >     header __GOOD_NAME To =~ /[A-Za-z]{1,30}_[A-Za-z\d\.]{2,40}\@(?i:domain\.com)/
> >     meta   BAD_NAME    !__GOOD_NAME
> >     score  BAD_NAME    2
> >
> > Above is based on the assumption that NAME includes only letters, numbers, and dots. If it can also have underscores then you could just do \w{2,40} or the like for the second part.
>
> Hmmm - not a bad start, I guess. If I were to put something like this in individual users' .spamassassin/user_prefs, then I could be even more restrictive about NAME. I am concerned, however, that this might not cope well with mailing lists (where To is the mailing list name) or in circumstances where the user is CC'd rather than addressed directly.

That can be fixed by having the MTA (or MDA) add a Delivered-To header indicating the user the message is being delivered to. Then you can use this header rather than having to rely on something sensible being in the To or Cc headers.

--
Bowie
RE: A rule to match patterns on recipient name.
Steve Haeck wrote:
> Bowie Bailey wrote:
> > The best way is to use your MTA. Set up a method for your users to create these email addresses as real email aliases in your MTA. Then you can set your MTA to only accept valid email addresses and the problem goes away.
>
> That would be a problem - since there is no definitive list of 'valid' email addresses - however I do know the form of all valid email addresses. If I could replace list-based lookup with a function to parse and validate email addresses with my MTA, I'd be laughing.

That should be possible. The amount of work necessary will depend on your MTA. With Courier, you can write a filter module in Perl or PHP. I don't know about the others.

> It's no big problem to process every spam - but it would be desirable to at least mark all these made-up email address destinations with a higher spam score. Is there an existing rule I can customise, or would I have to start from scratch?

The problems with processing every spam tend to rise with the amount of junk mail you receive. I had to rework my mail setup here a few years ago when the amount of spam coming in was more than my system could deal with. Once my front-line mail server could reject invalid email addresses, my mail volume dropped way down and allowed my servers to keep up without a spam blast causing massive delays.

There is no existing rule, but writing one shouldn't be difficult. One suggestion has already been given. The main hassle will be with mailing lists and such which send via BCC.

--
Bowie
Re: A rule to match patterns on recipient name.
Steve wrote:
> Bowie Bailey wrote:
> > That can be fixed by having the MTA (or MDA) add a Delivered-To header indicating the user the message is being delivered to. Then you can use this header rather than having to rely on something sensible being in the To or Cc headers.
>
> I always wondered where Delivered-To was added - and why some messages I've seen have it and others don't. Time to break out the postfix manual... :-)

If delivering via a pipe, set the 'D' flag.

Note that you can configure Postfix to reject invalid addresses instead of doing this in SA.

An alternative to your scheme is to use address extensions. For example, using '-' as the delimiter ('+' is refused by many sites), each user can receive mail for [EMAIL PROTECTED]

You can also give each user two addresses, say steve.haeck and steve. The first is used privately; the second is always used with an extension. So you can reject mail to [EMAIL PROTECTED] and accept [EMAIL PROTECTED]
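The two-address extension scheme can be sketched as a small accept/reject check in Python. This is only an illustration of the logic: in practice the decision would be made by the MTA, the function names are hypothetical, and the redacted addresses in the message are not reconstructed here. The sketch simply assumes that a bare short name is rejected and that the short name plus a non-empty extension is accepted.

```python
def split_extension(local_part, delimiter="-"):
    """Split 'user-ext' into ('user', 'ext'); ext is None when no delimiter is present."""
    user, sep, ext = local_part.partition(delimiter)
    return user, (ext if sep else None)


def accept(local_part, private_names, extension_only_names, delimiter="-"):
    """Decide whether to accept mail for local_part under the two-address scheme."""
    if local_part in private_names:
        return True  # the private address is accepted as-is
    user, ext = split_extension(local_part, delimiter)
    # The short name is only valid together with a non-empty extension.
    return user in extension_only_names and bool(ext)
```

With private_names = {"steve.haeck"} and extension_only_names = {"steve"}, the local part "steve-shop" would be accepted and a bare "steve" rejected. Note that splitting on the first delimiter means user names that themselves contain '-' would need different handling.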
Re: A rule to match patterns on recipient name.
Bowie Bailey wrote:
> That can be fixed by having the MTA (or MDA) add a Delivered-To header indicating the user the message is being delivered to. Then you can use this header rather than having to rely on something sensible being in the To or Cc headers.

I always wondered where Delivered-To was added - and why some messages I've seen have it and others don't. Time to break out the postfix manual... :-)

Thanks,
Steve
Re: Problem with sa-learn and virtual user
On Jan 17, 2008 2:31 PM, Matt Kettler [EMAIL PROTECTED] wrote:
> Jean-Edouard Babin wrote:
> > Hi,
> >
> > My mail system uses virtual users. I run spamd like this:
> >
> >     spamd --virtual-config-dir=/srv/spamassassin/%d/%l -x -u dovecot -c -i 127.0.0.1 -d -r /var/run/spamd.pid
> >
> > and I run spamc with:
> >
> >     /usr/pkg/bin/spamc -u ${recipient} -f -e ...
> >
> > This works fine; each user can use their own SA config. But I would like to be able to run sa-learn for specific users. I tried:
> >
> >     sa-learn --username [EMAIL PROTECTED] --spam files
> >
> > But as I can see with debug (-D), it uses the Bayes files of the Unix user running the command.
>
> Yes, the --username option in sa-learn *ONLY* works with SQL backends, as per the docs.

Yes, thanks, that is what I saw in the docs after upgrading.

> > So I'm wondering what I should do. In particular, I'm wondering how sa-learn can know that it should use /srv/spamassassin/%d/%l for the Bayes files, but I did not find the answer.
>
> You'd have to use the --dbpath option to override SA's default idea of where the Bayes database lives.

OK, -p seems to work as well, but it is of course a less automatic solution.

Thanks,
Re: A rule to match patterns on recipient name.
> >     header __GOOD_NAME To =~ /[A-Za-z]{1,30}_[A-Za-z\d\.]{2,40}\@(?i:domain\.com)/
> >     meta   BAD_NAME    !__GOOD_NAME
> >     score  BAD_NAME    2
> >
> > Above is based on the assumption that NAME includes only letters, numbers, and dots. If it can also have underscores then you could just do \w{2,40} or the like for the second part.
>
> Hmmm - not a bad start, I guess. If I were to put something like this in individual users' .spamassassin/user_prefs, then I could be even more restrictive about NAME. I am concerned, however, that this might not cope well with mailing lists (where To is the mailing list name) or in circumstances where the user is CC'd rather than addressed directly.

It will surely fail on mailing lists and Bcc items, which is why I gave it a relatively low score. You had seemingly specifically said To previously. You can use ToCc in place of To in the rule and catch both To and Cc.

Loren
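To get a feel for what the pattern accepts before deploying the rule, it can be tried against a few header values in Python. This is only a quick sketch: it assumes Python's re module behaves like Perl for this particular pattern, the sample addresses are hypothetical, and domain.com is the placeholder from the rule rather than a real domain.

```python
import re

# The pattern from the rule, transcribed for Python's re module (unanchored, like the SA rule).
GOOD_NAME = re.compile(r"[A-Za-z]{1,30}_[A-Za-z\d\.]{2,40}\@(?i:domain\.com)")


def looks_valid(header_value):
    """Return True if the header value contains an address of the expected form."""
    return GOOD_NAME.search(header_value) is not None


if __name__ == "__main__":
    samples = [
        "steve_shop.example@domain.com",   # hypothetical well-formed address
        "Steve <steve_list2@DOMAIN.COM>",  # domain part is matched case-insensitively
        "qxzv1234@domain.com",             # no PREFIX_ part, would hit BAD_NAME
        "some-list@lists.example.net",     # mailing list in To:, would also hit BAD_NAME
    ]
    for value in samples:
        print(value, "->", "ok" if looks_valid(value) else "BAD_NAME")
```

The last sample shows the mailing-list concern raised above: a list address in To never matches, so BAD_NAME fires even when the message is legitimate.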
Re: How to skip checking emails over a certain size?
Theo Van Dinter <felicity at apache.org> writes:
> > spamd[2492]: razor2: razor2 check failed: razor2: razor2 had unknown error during check at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/Plugin/Razor2.pm line 211, <GEN25> line 1. at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/Plugin/Razor2.pm line 326.
>
> Run spamd w/ -D (or -D razor2 for more output) and find out if there's any actual error messages for one of the problematic messages.

Hi,

Apparently, it was unrelated to the size of the email. It was some type of registration error with razor2. If I ran razor-admin -register, it would abort with an Error 202. I couldn't figure out why this was happening, but installing the latest version (2.84) over my older razor2 install (2.67) seems to have made razor-admin -register work again.

I am still getting these "razor2 had unknown error" entries, but instead of a dozen or so each hour, it's more like one every few hours now. The emails are properly getting tagged with RAZOR2_CHECK, though, so I guess it's working OK.

Thanks.
SA: failed to run header tests, skipping some.
Hello everyone,

I have been having some issues with SpamAssassin and have been ironing things out (like child processes not becoming re-usable), but there is one that floors me (probably because I'm not a Perl expert... but hey). Anyway, here are my configuration and the errors I am seeing. I hope someone with a kind heart can help me out.

Mail server information:

    Operating System is Debian 3.1 (Sarge)
    MTA is Qmail (as per Shupp's toaster)
    SpamAssassin is version 3.1.4
    ClamAV is version 0.92
    Perl is version 5.8.4

SpamAssassin is invoked from init.d:

    spamd -q -x -m 5 -H -d --pidfile=/var/run/spamd.pid

Now for the errors that I am getting for each email received by our mail server, from /var/log/mail.warn:

    Jan 18 10:14:24 tuatara spamd[18169]: Number found where operator expected at (eval 878) line 10, near }
    Jan 18 10:14:24 tuatara spamd[18169]:
    Jan 18 10:14:24 tuatara spamd[18169]: 1
    Jan 18 10:14:24 tuatara spamd[18169]: (Missing operator before
    Jan 18 10:14:24 tuatara spamd[18169]:
    Jan 18 10:14:24 tuatara spamd[18169]: 1?)
    Jan 18 10:14:24 tuatara spamd[18169]: rules: failed to run header tests, skipping some: syntax error at (eval 878) line 11, near ;
    Jan 18 10:14:24 tuatara spamd[18169]: }
    Jan 18 10:14:24 tuatara spamd[18169]: Use of uninitialized value in concatenation (.) or string at /usr/share/perl5/Mail/SpamAssassin/PerMsgStatus.pm line 2656, <GEN10> line 28.
    Jan 18 10:14:24 tuatara spamd[18169]: Use of uninitialized value in concatenation (.) or string at /usr/share/perl5/Mail/SpamAssassin/PerMsgStatus.pm line 2657, <GEN10> line 28.
    Jan 18 10:14:24 tuatara last message repeated 2 times
    Jan 18 10:14:24 tuatara spamd[18169]: Number found where operator expected at (eval 879) line 10, near }
    Jan 18 10:14:24 tuatara spamd[18169]:
    Jan 18 10:14:24 tuatara spamd[18169]: 1
    Jan 18 10:14:24 tuatara spamd[18169]: (Missing operator before
    Jan 18 10:14:24 tuatara spamd[18169]:
    Jan 18 10:14:24 tuatara spamd[18169]: 1?)
    Jan 18 10:14:24 tuatara spamd[18169]: rules: failed to run header tests, skipping some: syntax error at (eval 879) line 11, near ;
    Jan 18 10:14:24 tuatara spamd[18169]: }
    Jan 18 10:14:24 tuatara spamd[18169]: Use of uninitialized value in concatenation (.) or string at /usr/share/perl5/Mail/SpamAssassin/PerMsgStatus.pm line 2656, <GEN10> line 28.
    Jan 18 10:14:24 tuatara spamd[18169]: Use of uninitialized value in concatenation (.) or string at /usr/share/perl5/Mail/SpamAssassin/PerMsgStatus.pm line 2657, <GEN10> line 28.
    Jan 18 10:14:24 tuatara last message repeated 2 times

I am more than happy to attach my PerMsgStatus.pm if anyone would like to peruse it. I have attempted to find the problem in this file, but either I don't understand it well enough or the problem is not actually in there.

Any help anyone can give me would be truly appreciated!

Cheers,
Michael Hutchinson
Linux Systems Administrator
Manux Solutions Ltd
[EMAIL PROTECTED]
http://www.manux.co.nz
Re: sa-learn error message
Theo Van Dinter wrote:
> On Thu, Jan 17, 2008 at 03:28:06PM -0600, Steven Stern wrote:
> > bayes db version 0 indicates your bayes file is corrupt. It should be version 3. Do you have a backup? SQL or .db?
>
> It doesn't necessarily mean there's corruption; in fact, since the learning continued and seemed to finish ok, it's unlikely to be corruption.
>
> See http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3563 for a possible libdb issue which causes it.

Thanks. I ran into this when I hosed the sa_bayes MySQL database as we were cloning one of our MX servers.
Re: sa-learn error message
Thank you to both responders.

Did I read something that said that the digit after "bayes db version" indicates the version of Berkeley DB that's installed on the system? Like 0 means 1.x...

Google shows various messages like "bayes db version 2 is not able to be used, aborting!", which would seem to indicate that 0 is not indicative of the problem I saw.

Perhaps the reason that the bug report lists 0 is that Berkeley DB version 1.x does not include an integrated locking mechanism, but higher versions are reputed to have such a mechanism.

Before I got your responses, I got my courage up and went on to run

    $ sa-learn --no-sync --ham ham
    $ sa-learn --sync

(ham is the name of a directory in the current working directory), and everything went well. This sort of indicates that the DB isn't hopelessly corrupt.

Please, where is this DB that I should back up?

I wrote a general-purpose Berkeley DB rebuilding program that reads all of the key/value pairs in a DB and writes all of those for which the key and value are defined and of non-zero length to a new DB. I could try running that and see if the new DB is significantly smaller than the old, which for my DBs indicates that it's time to use the new DB.

Theo, do you know if SA uses any entries with null keys or values that are needed for proper operation? It would be easy to keep entries with null values; I wrote the program to discard them because my DBs don't use such entries.

Thanks,
Craig MacKenna

On Jan 17, 2008, at 1:50 PM, Theo Van Dinter wrote:
> On Thu, Jan 17, 2008 at 03:28:06PM -0600, Steven Stern wrote:
> > bayes db version 0 indicates your bayes file is corrupt. It should be version 3. Do you have a backup? SQL or .db?
>
> It doesn't necessarily mean there's corruption; in fact, since the learning continued and seemed to finish ok, it's unlikely to be corruption.
>
> See http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3563 for a possible libdb issue which causes it.
>
> --
> Randomly Selected Tagline:
> And the No. 1 response that you'll need to memorize if you plan to bet your business on Windows 2000: 'You want fries with that?' - Nicholas Petreley
Re: sa-learn error message
On Thu, Jan 17, 2008 at 07:42:30PM -0800, [EMAIL PROTECTED] wrote:
> Did I read something that said that the digit after "bayes db version" indicates the version of Berkeley DB that's installed on the system? Like 0 means 1.x...
>
> Google shows various messages like "bayes db version 2 is not able to be used, aborting!", which would seem to indicate that 0 is not indicative of the problem I saw.

The bayes version has nothing to do with the version of Berkeley DB. It's the version of the Bayes data. It's been 3 for a while now.

From man sa-learn:

    The database ’version number’ is 0 for databases from 2.5x, 1 for databases from certain 2.6x development releases, and 2 for all more recent databases.

Hrm. Interestingly, it doesn't mention version 3, which was introduced in 3.0 and has been used in all later versions. I'll update the man page in a minute. :)

> Perhaps the reason that the bug report lists 0 is that Berkeley DB version 1.x does not include an integrated locking mechanism, but higher versions are reputed to have such a mechanism.

The DB_File module, used to access the database files, uses the 1.x API, so there is no locking from libdb. SA does locking on its own. If you're not using NFS, I'd recommend using "lock_method flock", btw.

> Please, where is this DB that I should back up?

It depends on what your configuration is. Typically it's ~/.spamassassin/bayes_toks. Otherwise, look at the bayes_path setting.

> I wrote a general-purpose Berkeley DB rebuilding program that reads all of the key/value pairs in a DB and writes all of those for which the key and value are defined and of non-zero length to a new DB. I could try running that and see if the new DB is significantly smaller than the old, which for my DBs indicates that it's time to use the new DB.

This generally isn't needed by SA, since it's what SA does when an expire happens. You should also look at 'sa-learn --backup' and '--restore'. You could also just use db_dump | db_load.

> Theo, do you know if SA uses any entries with null keys or values that are needed for proper operation? It would be easy to keep entries with null values; I wrote the program to discard them because my DBs don't use such entries.

There will be values with null (ASCII 0) in them, as the token keys are binary values and the values are binary packed values. This is why sa-learn --backup is a good choice: it will convert the binary into text.

--
Randomly Selected Tagline:
... then you'll excuse me, but I'm in the middle of fifteen things, all of them annoying. - Ivanova, Babylon 5 (Midnight on the Firing Line)
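The point about binary tokens can be illustrated with a short Python sketch of the "keep only non-empty entries" filter described above. It works on in-memory (key, value) byte pairs rather than a real database file, the sample tokens in the usage example are made up and do not reflect SA's actual on-disk format, and the function name is hypothetical. What it shows is that only the length is tested, so keys and values containing NUL bytes pass through untouched.

```python
def copy_nonempty(entries):
    """Return a dict of the (key, value) byte pairs whose key and value are both non-empty."""
    kept = {}
    for key, value in entries:
        # Compare lengths only; never strip or decode, since NUL bytes are real data.
        if len(key) > 0 and len(value) > 0:
            kept[bytes(key)] = bytes(value)
    return kept


if __name__ == "__main__":
    sample = [
        (b"\x00\x01\x02\x03\x04", b"\x00\x00"),  # binary token with NULs: kept
        (b"", b"x"),                             # empty key: dropped
        (b"k", b""),                             # empty value: dropped
    ]
    print(copy_nonempty(sample))
```

A filter that stripped or decoded the data as text could silently damage such entries, which is one reason a text export is the safer route for real backups.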
Multiple per user?
Hi,

I am using SA from amavisd-new with Postfix, in a before-queue configuration, with per-user scores (provided by amavisd-new). I also want to implement per-user Bayes. But since SA takes only a single user name, amavisd-new is not able to implement per-user Bayes.

Would it be a good idea for SA to take multiple user names for a single mail and run all checks for multiple users? For example, if a mail has 3 RCPT TO recipients, SA would be called with all 3 and would report back with 3 sets of results.

I hope this would be more efficient than calling SA 3 times inside procmail or another MDA. It would make life very easy for everyone running SA at the MTA level.

I am sure other people have thought about this before me. Since it is not yet implemented, is this:

1. brain dead
2. requiring too much work ripping apart SA's guts
3. just in need of someone to hack on it, but relatively easy to do

I can do a bit of coding if it's case 3.

With regards,
raj