Re: Nasty bug? in 3.1.1 headers inserting?
Daryl C. W. O'Shea writes:
On 5/9/2006 2:16 PM, Theo Van Dinter wrote:
There's some difference of opinion around this question, but my general opinion is that there should be an update to spamass-milter which properly handles the newlines either way. I'm not sure whether or not that's happened yet.

As discussed in this SA bug: http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4844 this spamass-milter bug has a (confirmed to work) patch that fixes the problem with spamass-milter: http://savannah.nongnu.org/bugs/?func=detailitemitem_id=16164

I do not know if there is an updated spamass-milter release. I'm assuming there isn't, since their bug is still open.

By the way, this is a FAQ, too: http://wiki.apache.org/spamassassin/SaMilter030CorruptMsgs

--j.
RE: Nasty bug? in 3.1.1 headers inserting?
Thanks for all of your replies. I think I should have kept a closer eye on the milter. I use Dag Wieers' packages for RHEL3, and he doesn't have 0.3.1 available yet. I never bothered to check whether there was an update to the milter and therefore missed the issue. Apologies for any inconvenience on the mailing list. I will compile the milter tonight, as I first have to dig up the source for the sendmail version I'm using.

Furthermore, I did some digging in RFC 822, and this is what I found:

3. LEXICAL ANALYSIS OF MESSAGES
3.1. GENERAL DESCRIPTION
A message consists of header fields and, optionally, a body. The body is simply a sequence of lines containing ASCII characters. It is separated from the headers by a null line (i.e., a line with nothing preceding the CRLF).

So a \r followed by \n inside the headers is against the RFC unless it is followed by whitespace (a space or tab): two line feeds in a row amount to a CRLF on a null line, which ends the headers. I don't know exactly whether it is SpamAssassin or the milter inserting this sequence. If it's SpamAssassin, I think it should be corrected there; if it's the milter, it's already been fixed. So in the end the Exchange server is actually adhering to the RFC. Who would've guessed that? :-)

-Sietse

From: Justin Mason [mailto:[EMAIL PROTECTED]
Sent: Wed 10-May-06 12:03
To: Daryl C. W. O'Shea
Cc: users@spamassassin.apache.org
Subject: Re: Nasty bug? in 3.1.1 headers inserting?
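To make the folding rule concrete, here is a minimal Python sketch (not spamass-milter's or SpamAssassin's code) that scans the header section of a raw message and flags any line that is neither the start of a new `Name: value` field nor a folded continuation beginning with a space or tab. It assumes CRLF line endings and simplifies the RFC 822 field-name grammar to "the line contains a colon"; the function name and sample messages are hypothetical.

```python
def find_header_problems(raw: bytes) -> list[str]:
    """Flag header lines that break RFC 822 folding (simplified sketch).

    Assumes CRLF line endings. A field starts with "Name:"; a continuation
    line of a folded field must start with a space or a tab.
    """
    problems = []
    # The first null line (CRLF CRLF) ends the header section.
    head, sep, _body = raw.partition(b"\r\n\r\n")
    if not sep:
        problems.append("no null line between headers and body")
    for n, line in enumerate(head.split(b"\r\n"), start=1):
        if line[:1] in (b" ", b"\t"):
            if n == 1:
                problems.append("line 1: continuation with no field to continue")
            continue  # folded continuation of the previous field
        if b":" not in line:
            problems.append(f"line {n}: neither a field nor a continuation: {line!r}")
    return problems


# Properly folded: the second X-Spam-Status line starts with a tab.
good = b"Subject: hi\r\nX-Spam-Status: No, score=0.1\r\n\ttests=NONE\r\n\r\nbody\r\n"
# Broken: the line break inside X-Spam-Status is not followed by whitespace.
bad = b"Subject: hi\r\nX-Spam-Status: No, score=0.1\r\ntests=NONE\r\n\r\nbody\r\n"

print(find_header_problems(good))  # []
print(find_header_problems(bad))   # one problem, reported for line 3
```

In `bad`, the `tests=NONE` line is no longer part of `X-Spam-Status`, which is the kind of breakage Sietse describes a strict receiver reacting to.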
Spamassassin + Kaspersky SMTP-Scanner
Hi List! I'm running a Debian mail server with qmail 1.03, vpopmail and Kaspersky Anti-Virus SMTP-Scanner 5.5.3. Now I want to add the latest SpamAssassin to filter the spam, which has grown to as many as 500 mails per day. I searched the whole Internet for a possible configuration, with no result. Does anyone have a solution for that? Thank you!

Regards,
Thomas Gross
Re: Spamassassin + Kaspersky SMTP-Scanner
On Wednesday, 10 May 2006 13:19, Thomas Gross wrote:
Does anyone have a solution for that?

I have this for Postfix, but not for qmail.

Regards, zmi
--
// Michael Monnerie, Ing.BSc- http://it-management.at
// Tel: 0660/4156531 .network.your.ideas.
// PGP Key: lynx -source http://zmi.at/zmi3.asc | gpg --import
// Fingerprint: 44A3 C1EC B71E C71A B4C2 9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE
Bayes not working
On a new SA installation that's as identical to the other 3 we have running as possible, bayes is not running. spamassassin -D --lint indicates that all is normal. The test message generates a Bayes score. sa-learn is able to talk to the mysql database: we're able to update the database using sa-learn. However, in production, spamassassin does not report any BAYES_ scores. When the spam value exceeds the threshold that would normally cause autolearning, autolearn=no changes to autolearn=unavailable. Similarly, AWL entries are not being created. Can anyone see what's wrong?

[3320] dbg: config: read file /usr/share/spamassassin/23_bayes.cf
[3320] dbg: bayes: using username: root
[3320] dbg: bayes: database connection established
[3320] dbg: bayes: found bayes db version 3
[3320] dbg: bayes: Using userid: 1
[3320] dbg: bayes: corpus size: nspam = 178, nham = 168
[3320] dbg: bayes: tok_get_all: token count: 20
[3320] dbg: bayes: score = 0.913557143318889
[3320] dbg: rules: ran eval rule BAYES_80 == got hit
[3320] dbg: auto-whitelist: sql-based connected to DBI:mysql:sa_bayes:ccim-mx2
[3320] dbg: auto-whitelist: sql-based finish: disconnected from DBI:mysql:sa_bayes:ccim-mx2
[3320] dbg: check: tests=BAYES_80,MISSING_SUBJECT,NO_REAL_NAME,NO_RECEIVED,NO_RELAYS,TO_CC_NONE

# grep -i bayes local.cf
# Enable the Bayes system
use_bayes 1
# Enable Bayes auto-learning
bayes_auto_learn 1
bayes_min_ham_num 100
bayes_min_spam_num 100
# bayes_path /var/spool/spamassassin/bayes
bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn DBI:mysql:sa_bayes:ccim-mx2
bayes_sql_username spamass
bayes_sql_password xxx
bayes_sql_override_username root
bayes_auto_expire 0
user_awl_dsn DBI:mysql:sa_bayes:ccim-mx2

# grep -i awl local.cf
user_awl_dsn DBI:mysql:sa_bayes:ccim-mx2
user_awl_sql_table awl
user_awl_sql_username spamass
user_awl_sql_password xxx
user_awl_sql_override_username root

]# ps -ef |grep spam
root 2170 1 0 07:01 ? 00:00:04 /usr/bin/spamd -d -c -m5 -H -r /var/run/spamd.pid
root 2247 2170 1 07:01 ? 00:00:20 spamd child
root 2248 2170 0 07:01 ? 00:00:00 spamd child
sa-milt 3264 1 0 07:15 pts/0 00:00:00 /bin/bash /usr/sbin/spamass-milter-wrapper -p /var/run/spamass-milter/spamass-milter.sock -P /var/run/spamass-milter.pid -i 127.0.0.1 -r 10 -- -d localhost -p 783
sa-milt 3265 3264 0 07:15 pts/0 00:00:00 /usr/sbin/spamass-milter -p /var/run/spamass-milter/spamass-milter.sock -P /var/run/spamass-milter.pid -i 127.0.0.1 -r 10 -- -d localhost -p 783

SpamAssassin version 3.1.1 running on Perl version 5.8.6
spamass-milter - Version 0.3.1

--
Steve
RE: limit child process
| Spamd calls it,
|
| But I have seen my monitor, on more than one occasion, with this error,
|
| swap_pager_getswapspace: failed
|
| and the worst part is I don't realize it until I hit the KVM switch, and
| actually get on the console -
|
| so can I customize spamd to a lower limit?
|
| I noticed after I stop/restart spamd my swap goes back to normal

spamd -m

And what would be an ideal number to set it to? I came in this morning, got a bunch of those swap messages, and my VM is at 86% right now.
RE: Bayes not working
You will probably get more ideas from other posters, but here is my thought. Are you running spamassassin -D --lint as the user that SA runs under when it is running live? For instance, I call SA as the user filter, not the user root. So, to properly test SA I first have to type:

su filter

This makes me the user filter. Now I get a real test of SA when I run spamassassin -D --lint.

It looks like you may be testing as root? SA really should not run live as the root user.

Well, that's my idea. Good luck.

-----Original Message-----
From: Steven Stern [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 10, 2006 8:24 AM
To: Spamass
Subject: Bayes not working
Re: Spamassassin + Kaspersky SMTP-Scanner
Thomas Gross wrote:
Hi List! I'm running a Debian mail server with qmail 1.03, vpopmail and Kaspersky Anti-Virus SMTP-Scanner 5.5.3. Now I want to add the latest SpamAssassin to filter the spam, which has grown to as many as 500 mails per day. I searched the whole Internet for a possible configuration, with no result. Does anyone have a solution for that?

simscan. You can find it at http://www.inter7.com/?page=simscan

Regards,
Rick
whitelist_from_rcvd not working
Can someone point out what I am doing wrong here? I have this in my local.cf file:

whitelist_from_rcvd [EMAIL PROTECTED] mail*.magnetmail.net

But messages that I believe should match this are getting blocked:

May 5 14:54:19 esmtp postfix/smtpd[994]: 9315B7FA20: client=mail10.magnetmail.net[209.18.70.10]
May 5 14:54:20 esmtp postfix/cleanup[3083]: 9315B7FA20: message-id=[EMAIL PROTECTED]
May 5 14:54:36 esmtp postfix/qmgr[39594]: 9315B7FA20: from=<>, size=55412, nrcpt=1 (queue active)
May 5 14:54:47 esmtp amavis[3767]: (03767-02-2) Blocked SPAM, [209.18.70.10] - [EMAIL PROTECTED], quarantine: spam-u95sUSnhhshW.gz, Message-ID: [EMAIL PROTECTED], mail_id: u95sUSnhhshW, Hits: 7.069, 11177 ms
May 5 14:54:47 esmtp postfix/smtp[2820]: 9315B7FA20: to=[EMAIL PROTECTED], relay=127.0.0.1[127.0.0.1], delay=28, status=sent (250 2.5.0 Ok, id=03767-02-2, BOUNCE)
May 5 14:54:47 esmtp postfix/qmgr[39594]: 9315B7FA20: removed

--
Robert
Re: Bayes not working
[3320] dbg: bayes: corpus size: nspam = 178, nham = 168

Probably because your corpus is still too small.

man Mail::SpamAssassin::Conf
...
bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
    To be accurate, the Bayes system does not activate until a certain number of ham (non-spam) and spam have been learned. The default is 200 of each ham and spam, but you can tune these up or down with these two settings.
...

Bye, Andy.
--
Finagle's Sixth Law: Don't believe in miracles -- rely on them.
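The gating rule from the man page excerpt can be expressed as a one-line check. This is a Python sketch of the documented behaviour, not SpamAssassin's own code; the function name is hypothetical, the 200/200 defaults come from the excerpt, and the 100/100 values are the ones Steve's local.cf sets.

```python
def bayes_active(nspam: int, nham: int,
                 min_spam: int = 200, min_ham: int = 200) -> bool:
    """Bayes contributes to the score only once both minimums are reached."""
    return nspam >= min_spam and nham >= min_ham

# Corpus size from Steve's debug output: nspam = 178, nham = 168.
print(bayes_active(178, 168))                             # False with the defaults
print(bayes_active(178, 168, min_spam=100, min_ham=100))  # True with 100/100
```

With the 100/100 settings the corpus in the debug output clears both minimums, which is consistent with BAYES_80 firing in the lint run.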
Re: Bayes not working
Andy Spiegl wrote:
[3320] dbg: bayes: corpus size: nspam = 178, nham = 168
Probably because your corpus is still too small.
man Mail::SpamAssassin::Conf
...
bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
To be accurate, the Bayes system does not activate until a certain number of ham (non-spam) and spam have been learned. The default is 200 of each ham and spam, but you can tune these up or down with these two settings.

I imported a corpus of about 2 messages total and it wasn't working. I blew it all away and started from scratch thinking that was the problem. For now, local.cf has a minimum of 100 messages of each type. The current database exceeds that.
Re: whitelist_from_rcvd not working
Robert Fitzpatrick wrote:
Can someone point out what I am doing wrong here? I have this in my local.cf file:
whitelist_from_rcvd [EMAIL PROTECTED] mail*.magnetmail.net
But messages are getting blocked that I believe should match this?

What about the below suggests this mail is from [EMAIL PROTECTED]? The below suggests that the message is from <> (a bounce), but is being delivered to [EMAIL PROTECTED]

May 5 14:54:19 esmtp postfix/smtpd[994]: 9315B7FA20: client=mail10.magnetmail.net[209.18.70.10]
May 5 14:54:20 esmtp postfix/cleanup[3083]: 9315B7FA20: message-id=[EMAIL PROTECTED]
May 5 14:54:36 esmtp postfix/qmgr[39594]: 9315B7FA20: from=<>, size=55412, nrcpt=1 (queue active)
May 5 14:54:47 esmtp amavis[3767]: (03767-02-2) Blocked SPAM, [209.18.70.10] - [EMAIL PROTECTED], quarantine: spam-u95sUSnhhshW.gz, Message-ID: [EMAIL PROTECTED], mail_id: u95sUSnhhshW, Hits: 7.069, 11177 ms
May 5 14:54:47 esmtp postfix/smtp[2820]: 9315B7FA20: to=[EMAIL PROTECTED], relay=127.0.0.1[127.0.0.1], delay=28, status=sent (250 2.5.0 Ok, id=03767-02-2, BOUNCE)
May 5 14:54:47 esmtp postfix/qmgr[39594]: 9315B7FA20: removed
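Matt's point about the null sender can be illustrated with a small Python sketch. SpamAssassin's whitelist address patterns are file-glob style (`*` and `?`), so `fnmatch` is used here to approximate the address half of the check; it is not SpamAssassin's matching code, it ignores the relay half of `whitelist_from_rcvd`, and `newsletter@example.com` is a hypothetical stand-in for the obfuscated address in the logs.

```python
from fnmatch import fnmatchcase

def address_matches(address: str, pattern: str) -> bool:
    """Case-insensitive glob match of an address against a whitelist pattern."""
    return fnmatchcase(address.lower(), pattern.lower())

pattern = "newsletter@example.com"  # hypothetical whitelisted sender
envelope_sender = ""                 # a bounce carries the null sender <>

print(address_matches(envelope_sender, pattern))           # False
print(address_matches("Newsletter@Example.com", pattern))  # True
```

A bounce has no sender address to compare, so an address-based whitelist entry has nothing to match against in the envelope.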
RE: limit child process
-----Original Message-----
From: Jean-Paul Natola [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, May 10, 2006 8:28 AM
To: ; Matt Kettler
Cc: users@spamassassin.apache.org
Subject: RE: limit child process

| Spamd calls it,
|
| But I have seen my monitor, on more than one occasion, with this error,
|
| swap_pager_getswapspace: failed
|
| and the worst part is I don't realize it until I hit the KVM switch, and
| actually get on the console -
|
| so can I customize spamd to a lower limit?
|
| I noticed after I stop/restart spamd my swap goes back to normal

spamd -m

And what would be an ideal number to set it to? I came in this morning, got a bunch of those swap messages, and my VM is at 86% right now.

There is no real answer to that. It depends on traffic and RAM. Start with 4 and see how it goes. Monitor and adjust.

--Chris
RE: Strange Bayes results
Michael Monnerie wrote:
On Tuesday, 9 May 2006 23:14, Bowie Bailey wrote:
When I look at the overall stats, bayes does pretty well:

RANK  RULE NAME  COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
   6  BAYES_99   26754      4.19    44.49    67.00    3.06

3% HAM hits for BAYES_99 is horrible, not good. It's the FPs that should make you alert.

True enough. But no complaints so far. I'm not sure how many of my clients are even taking advantage of the spam markup.

But when I do it for only our domain (which is where all the manual training happens), it hits less ham, but less spam as well:

RANK  RULE NAME  COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
   8  BAYES_99    4649      3.29    33.41    54.64    0.20

At least a much better FP rate, by a factor of 15!

Just my personal email address (which is trained aggressively) gets very few ham hits (partly because I lowered my threshold to 4.0), but less spam than overall:

RANK  RULE NAME  COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
   5  BAYES_99    1643      3.08    27.05    65.72    0.08

Again the FPs are reduced...

Of course, it's being constantly trained and the spam threshold is lower. I am curious why I don't get more spam hits with a well-trained database.

And then when I modify sa-stats to exclude our domain, I find that our customers (who are trained exclusively with autolearn) seem to do better than us:

RANK  RULE NAME  COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
   6  BAYES_99   22105      4.44    47.83    70.35    4.11

No, 4% FPs is nothing you should be happy with.

Based on these results, it almost seems like the more training Bayes gets, the worse it does!

But remember that sa-stats can never tell whether that HAM/SPAM really is such; it just tells you what it *believed* was HAM/SPAM.

Right. That's what I was referring to below.

Are these anomalies just an artifact of sa-stats relying on SA to judge ham and spam properly? Can these numbers be trusted at all if my users don't reliably report false negatives and positives?

As I said on the other thread: Be very careful what you feed to Bayes. Try to find those 4% of FPs, and check whether they really are FPs. Maybe your SA made the mistakes because you don't have enough rules to detect all SPAMs.

The group with 4% false positives is trained exclusively through autolearn. There is no facility for manual training with those accounts. If I follow the false positives, it lines up with expectations: the more manual training in the group, the lower the false positives. Why don't I see a similar trend with the spam hits?

--
Bowie
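Michael's "factor of 15" follows directly from the %OFHAM column. A short Python sketch that holds the BAYES_99 rows quoted in this thread (the dictionary and function names are made up) and reproduces the arithmetic:

```python
# BAYES_99 rows as quoted in this thread: (COUNT, %OFSPAM, %OFHAM).
BAYES_99 = {
    "overall": (26754, 67.00, 3.06),
    "our domain": (4649, 54.64, 0.20),
    "personal address": (1643, 65.72, 0.08),
    "customers only": (22105, 70.35, 4.11),
}

def ham_hit_ratio(worse: str, better: str) -> float:
    """How many times larger the %OFHAM (apparent FP rate) is in one group."""
    return BAYES_99[worse][2] / BAYES_99[better][2]

print(f"{ham_hit_ratio('overall', 'our domain'):.1f}")  # 15.3
# The customers-only and our-domain counts add up to the overall count.
print(BAYES_99["customers only"][0] + BAYES_99["our domain"][0]
      == BAYES_99["overall"][0])  # True
```

As Michael cautions, these percentages reflect what SpamAssassin believed was ham or spam, so the ratio is only as reliable as those verdicts.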
RE: Strange Bayes results
Michael Monnerie wrote:
On Tuesday, 9 May 2006 23:32, Bowie Bailey wrote:
And as an additional data point, I found this for one of our internal users who has never done any manual training:

RANK  RULE NAME  COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
   1  BAYES_99     373      6.76    78.20    95.64    0.00
   1  BAYES_00      73     20.51    15.30     0.00   83.91

It at least looks as if he didn't feed it wrong messages. Is Bayes autolearn set?

Yes, this user is set with all the default options for Bayes learning and a spam threshold of 5.0. The entire Bayes database was created via autolearn for this user.

It seems to me that Bayes is highly sensitive to the types of ham and spam that each user gets. This user has a near-perfect Bayes database created with autolearn: no false positives or negatives, and 95% of spam hit by BAYES_99. My account, on the other hand, has a few false positives and only a 66% spam hit rate despite aggressive manual training.

--
Bowie
RE: Latest sa-stats from last week
Michael Monnerie wrote:
On Tuesday, 9 May 2006 23:01, Bowie Bailey wrote:
Hmm... If you are training Bayes, and all of your ham is in English, then what does Bayes do with the Chinese ham your customers get?

Nothing. But you won't get a SPAM report from Bayes if the e-mail is Chinese and you never feed it Chinese-language e-mail. So no FPs.

I guess that would work if you simply don't feed Bayes any foreign-language material at all. True, spam is spam. It's the vast differences in ham that I am more worried about. Our customers are salesmen for the most part, so they are constantly sending and receiving marketing-type emails. For us, marketing stuff is almost always considered spam. I think this would cause a problem with false positives for our customers if I train Bayes based on our idea of ham and spam.

The important thing is that you should *never* feed Bayes something that *could* be a legit e-mail. Most people seem to make that error. I do NOT feed SPAM or HAM that could be a legit mail.

So you are saying that I should not feed Bayes the unsolicited marketing garbage that I get because it looks like something that could have been requested?

Just those Nigerians who want to give you some million $ because you are so nice, or those lotteries where you won a lot but first have to pay, the very good jobs a lot of people seem to offer where you can earn $5000 for only 3 hours of work, and so on. No chance this could be HAM for anybody (with at least some brain, but anyway you have to protect such people from themselves *g*). The same goes for feeding HAM: give it only food that *is legit e-mail*, not some that could be. Remember: 10 good SPAM and HAM are better than 200 where 5% are wrong.

Wrong for whom? If it looks like marketing, 99% of the time, I don't want it. And for most of the accounts that I deal with, this goes up to 100%. Not true for my customers, though.
My philosophy with Bayes has always been to skip the ham/spam definitions and go with a wanted/unwanted model. This way Bayes learns to filter out the emails you don't want, even if some of them may technically be ham. (Obviously, I would not be able to do this on a site-wide installation.)

Another good thing: since I help with mass-checks, I found that of my 6000 SPAMs, I had about 4 or 5 which I had to delete (after unlearning them first), as they were mistakes. That's the advantage you get back when running mass-checks.

--
Bowie
Re: limit child process
On Wednesday, 10 May 2006 14:27, Jean-Paul Natola wrote:
and what would be an ideal number to set it to?

How many do you have right now?

I came in this morning, got a bunch of those swap messages, and my VM is at 86% right now

And which processes consume your memory, and how much?

Regards, zmi
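One way to answer Michael's second question is to total resident memory per command from `ps aux`. A minimal Python sketch, assuming the usual BSD/Linux `ps aux` layout where RSS (in kilobytes) is the sixth column and COMMAND starts at the eleventh; the function name and sample output are made up for illustration, not taken from Jean-Paul's system.

```python
from collections import defaultdict

def rss_by_command(ps_output: str) -> dict[str, int]:
    """Sum the RSS column (KB) of `ps aux` output per command basename."""
    totals = defaultdict(int)
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 10)        # COMMAND may contain spaces
        if len(fields) < 11:
            continue
        rss_kb = int(fields[5])
        # "/usr/bin/spamd -d ..." and "spamd child" both count as "spamd".
        command = fields[10].split()[0].rsplit("/", 1)[-1]
        totals[command] += rss_kb
    return dict(totals)

sample = """USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 2170 0.0 3.0 40000 31000 ?? Ss 7:01AM 0:04.00 /usr/bin/spamd -d -c -m5
root 2247 1.0 3.2 42000 33000 ?? S 7:01AM 0:20.00 spamd child
root 2248 0.0 3.1 41000 32000 ?? S 7:01AM 0:00.00 spamd child
"""

for cmd, kb in sorted(rss_by_command(sample).items(), key=lambda kv: -kv[1]):
    print(f"{cmd:20} {kb / 1024:.1f} MB")
```

Run against real `ps aux` output, the largest single spamd RSS is also the figure Matt's rule of thumb in this thread starts from.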
RE: Latest sa-stats from last week
jdow wrote:
From: Bowie Bailey [EMAIL PROTECTED]
Michael Monnerie wrote:
On Tuesday, 9 May 2006 16:18, Bowie Bailey wrote:
I've got per-user Bayes and most of my users don't bother to train it.

Another reason for site-wide Bayes, I'd say.

I've considered that, but it won't work in our setup. This box scans our internal email as well as all of our customers' email. Since we are in an entirely different line of business from our customers, what we consider to be ham and spam will be quite different from theirs. If I could train it on both sets, it might work, but I don't have access to any of their emails for training. Also, I really prefer per-user Bayes for our internal email, since there are various accounts that get a specific type of ham and work very well with Bayes.

Importune them to feed you as large a collection of ham and spam as they can, once. Then turn on autolearn, cross your fingers, and put on your flak jacket.

What flak jacket? I have Bayes turned on now and I never did any manual training on most of the accounts. I just turned it on and let autolearn (with the default settings) do its thing. So far, I have received very few complaints. But then again, I think less than half of my users are even taking advantage of the spam markup. Since I don't do any blocking or sorting on the server, it is up to them to use MUA rules to sort or delete the spam once my server has marked it.

--
Bowie
Re: Strange Bayes results
On Wednesday, 10 May 2006 17:08, Bowie Bailey wrote:
Yes, this user is set with all the default options for Bayes learning and a spam threshold of 5.0. The entire Bayes database was created via autolearn for this user.

Is that possible at all? I thought that for Bayes to work you need 200 ham + 200 spam first.

It seems to me that Bayes is highly sensitive to the types of ham and spam that each user gets. This user has a near-perfect Bayes database created with autolearn: no false positives or negatives, and 95% of spam hit by BAYES_99. My account, on the other hand, has a few false positives and only a 66% spam hit rate despite aggressive manual training.

I had an off-list discussion with somebody; we tried to compare our setups and results. I'll post this as a separate thread tonight or tomorrow. I've gotta go now.

Regards, zmi
Re: Bayes not working
On Wednesday, 10 May 2006 16:01, Steven Stern wrote:
I imported a corpus of about 2 messages total and it wasn't working. I blew it all away and started from scratch thinking that was the problem. For now, local.cf has a minimum of 100 messages of each type. The current database exceeds that.

I've had such an issue. In ancient times I had done sudo -H -u spamscanner sa-learn, but that doesn't work now. I really have to do su -l spamscanner and then sa-learn. Maybe that's your problem. Try sa-learn --dump magic | grep token to see how many ham/spam there really are - as that user.

Regards, zmi
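Michael's check can be scripted. A minimal Python sketch that pulls the nspam/nham counts out of `sa-learn --dump magic` output; it assumes the count is the third whitespace-separated column of the `non-token data: nspam` and `non-token data: nham` lines, the helper names are hypothetical, and the sample text is illustrative (it reuses the corpus size from Steve's debug output). As Michael says, the script has to run as the same user spamd uses.

```python
import re
import subprocess

def corpus_counts(dump: str) -> dict[str, int]:
    """Extract nspam/nham from `sa-learn --dump magic` text (layout assumed)."""
    counts = {}
    for line in dump.splitlines():
        match = re.search(r"non-token data: (nspam|nham)\s*$", line)
        if match:
            counts[match.group(1)] = int(line.split()[2])
    return counts

def read_magic() -> str:
    """Run sa-learn as the current user (run this script as spamd's user)."""
    result = subprocess.run(["sa-learn", "--dump", "magic"],
                            capture_output=True, text=True, check=True)
    return result.stdout

sample = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        178          0  non-token data: nspam
0.000          0        168          0  non-token data: nham
"""
print(corpus_counts(sample))  # {'nspam': 178, 'nham': 168}
```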
Re: My only problem with URIBL_BLACK
jdow wrote:
From: Matt Kettler [EMAIL PROTECTED]
Let's look at their IPs they are hosting their domain from:
$ host uhmcargo*MUNGED*.com
snip

Fascinating - even the whois registration seems to have MPD, er, Multiple Personality Disorder. This is what I got in part:
===8---
Registrant: Amber Furlong [EMAIL PROTECTED] +1.6785283829 Private person 20222 shadowood parkway Atlanta,GA,UNITED STATES 30339
Domain Name:uhmcargo.net-M

Yeah, I screwed up and used .com instead of .net. When I query the .net I get the same results as you.
Re: limit child process
Jean-Paul Natola wrote:
spamd -m
and what would be an ideal number to set it to? I came in this morning, got a bunch of those swap messages, and my VM is at 86% right now

As Chris S already said, there's no hard-and-fast rule here. However, here's a rule of thumb to start with:

1) Use ps aux or top to find the RSS of your largest spamd instance. This will likely be somewhere around 30M, unless you're using some really large add-on sets. If your answer here is over 60M, see my footnote on reducing memory use.

2) Add an extra 4M to this, to cover extra storage for data. If you're passing -s to spamc, use 16 times the parameter (the default is 250k, *16 = 4M). I'm going to pretend my total is 34M. (Yes, I know 16* is generous, but this is a rule of thumb.)

3) Find out how much free memory you have without spamd running. If you use Linux, I'd suggest running free and looking at the free column next to "-/+ buffers/cache:". I'll pretend we have 512M here.

4) Divide the free memory by your answer from 2. That should give you a good rough-estimate number to work with.

Footnote on memory usage: If your spamd instances are huge, review the add-on rulesets you're using. Be wary of any add-on rule file that is over 128k in size. In particular, do NOT use sa-blacklist unless you have tons of RAM to spare. This ruleset is nearly 2M in .cf file format and will massively expand your SA's memory usage.
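Matt's four steps reduce to one division. A Python sketch of that arithmetic (the function name is made up); it uses 1000 KB per MB so that 250k * 16 comes out as exactly the 4M in his example, and the result is only a starting value for spamd -m, to be monitored and adjusted as Chris suggests.

```python
def estimate_max_children(free_mb: float, largest_rss_mb: float,
                          spamc_max_size_kb: int = 250) -> int:
    """Rule-of-thumb starting value for spamd -m (steps 1-4 above).

    Per-child memory = largest spamd RSS + 16 x spamc's -s size.
    """
    per_child_mb = largest_rss_mb + 16 * spamc_max_size_kb / 1000
    return int(free_mb // per_child_mb)

# Matt's worked example: 30M RSS + 4M extra = 34M per child, 512M free.
print(estimate_max_children(free_mb=512, largest_rss_mb=30))  # 15
```

Doubling spamc's -s to 500k in the same example raises the per-child estimate to 38M and lowers the result to 13.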
RE: Strange Bayes results
Michael Monnerie wrote:
On Wednesday, 10 May 2006 17:08, Bowie Bailey wrote:
Yes, this user is set with all the default options for Bayes learning and a spam threshold of 5.0. The entire Bayes database was created via autolearn for this user.

Is that possible at all? I thought that for Bayes to work you need 200 ham + 200 spam first.

Sure it is. Bayes will autolearn messages right from the start. It just waits until it has seen 200 ham and 200 spam before it starts contributing to the score. There is nothing saying that you have to manually learn the first group of messages. On the other hand, since there is very little direct feedback from that initial set of messages, you have to be careful that false positives and negatives do not corrupt the database before you even get started.

It seems to me that Bayes is highly sensitive to the types of ham and spam that each user gets. This user has a near-perfect Bayes database created with autolearn: no false positives or negatives, and 95% of spam hit by BAYES_99. My account, on the other hand, has a few false positives and only a 66% spam hit rate despite aggressive manual training.

I had an off-list discussion with somebody; we tried to compare our setups and results. I'll post this as a separate thread tonight or tomorrow. I've gotta go now.

Sounds interesting.

--
Bowie
RE: My only problem with URIBL_BLACK
On a side note, to anyone watching this seemingly incredibly long discussion about one FP: this is typically what URIBL members do. We take every FP and delist request seriously. We do deep research on each one, much deeper than anything you have seen here in this thread. It's not the first time someone has told us about an FP that has turned out to be false. Won't be the last.

We've had spammers request delistings, which of course sets our magic elves into a fiery rage of research. This only backfires on the spammer: not only does his spam domain not get delisted, but a lot more of his domains get found in the research and listed.

A lot of people on other spam lists have said how soul-grinding running an RBL is. Well, we can now attest to that fact. Threads like this happen in private very often. Lots of work. One can often do hours of research to add 100+ domains, only to find another member has already done it! Bastards! :)

All of this would not be possible without some very incredible people. I can't thank the members of URIBL enough. The people who support us with mirrors. The anonymous non-members who email us privately with lots of helpful info. Hosts for the bandwidth. Jeff Chan and W. Stearns, for that very first conference call. The SA devs for putting up with us, OK, me. And of course... the magic elves. Thanks to all.

(Might as well add, all of the above also goes for the incredible work of the SARE team!)

--Chris

(Holy crap! Did I just post a serious message to the list? WTF is wrong with me?)
(Double holy crap! I said something nice about Jeff again! He won't believe it!)
Upgrade issues
Hi all,

I upgraded from 2.63 to 3.1 a few weeks ago and it's running fine, but I can't seem to figure out how to get something working again. I did RTFM but I'm still at a loss: I'm looking to get my header reports back in. Below is what I have in my local.cf:

# This is the right place to customize your installation of SpamAssassin.
# See 'perldoc Mail::SpamAssassin::Conf' for details of what can be
# tweaked.
#
###
#defang_mime 0
lock_method flock
always_add_report 0
#report_header 1
#always_add_headers 1
add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_
use_terse_report 0
rewrite_subject 0
report_safe 0
required_hits 7.5
auto_whitelist_path /whitelist/auto-whitelist
auto_whitelist_file_mode 666
auto_whitelist_factor 0.5

I know there are some old things still in there, but this is all I get in the headers:

Processed in 3.636165 secs); 02 May 2006 09:58:55 -
X-Spam-Status: Yes, hits=8.2 required=7.5

There is no report on which tests it hit. What I would like to see is the old terse-report-style headers.

TIA
Jason
Re: My only problem with URIBL_BLACK
|On a side note, to anyone watching this seemingly incredibly long discussion about one FP:
|This is typically what URIBL members do. We take every FP and delist request seriously. We do deep research on
|each one. Much deeper than anything you have seen here in this thread. It's not the first time someone has told us
|about an FP that has turned out to be false. Won't be the last.
|We've had spammers request delistings, which of course sets our magic elves into a fiery rage of research. This
|only backfires on the spammers, and not only doesn't get his spam domain delisted, but gets a lot more of them
|found in research listed.
|A lot of people on other spam lists have said how soul-grinding running an RBL is. Well, we can now attest to that
|fact. Threads like this happen in private very often. Lots of work. One can often do hours of research to add 100+
|domains, only to find another member has already done it! Bastards! :)
|All of this would not be possible without some very incredible people. I can't thank the members of URIBL enough.
|The people who support us with mirrors. The anonymous non-members who email us privately with lots of helpful
|info. Hosts for the bandwidth. Jeff Chan and W. Stearns, for that very first conference call. The SA devs for putting up
|with us, OK, me. And of course... the magic elves. Thanks to all.
|(Might as well add, all of the above also goes for the incredible work of the SARE team!)
|--Chris

Chris, I brought the issue up as I had a few messages that my customers believed were FPs. I only posted 2 examples, but there are many. In my case, I have 1 out of 1000s who will want the mailing. I think what I got out of this whole discussion was that I need to implement per-user whitelisting. I will be working on that this weekend.

I support URIBL 100%. In fact, if you check, you will see that I am a mirror and have made donations for the cause in the past ;-)
RE: Upgrade issues
Jason Staudenmayer wrote:
> I upgraded from 2.63 to 3.1 a few weeks ago and it's running fine, but I can't seem to figure out how to get something working again. I did RTFM but I'm still at a loss: I'm looking to get my header reports back in. Below is what I have in my local.cf:
>
> # This is the right place to customize your installation of SpamAssassin.
> # See 'perldoc Mail::SpamAssassin::Conf' for details of what can be
> # tweaked.
> #
> ###
> #
> #defang_mime 0
> lock_method flock
> always_add_report 0
> #report_header 1
> #always_add_headers 1
> add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_
> use_terse_report 0
> rewrite_subject 0
> report_safe 0
> required_hits 7.5
> auto_whitelist_path /whitelist/auto-whitelist
> auto_whitelist_file_mode 666
> auto_whitelist_factor 0.5
>
> I know there are some old things still here, but this is all I get in the headers:
>
> Processed in 3.636165 secs); 02 May 2006 09:58:55 -
> X-Spam-Status: Yes, hits=8.2 required=7.5
>
> No report on which tests it hit. What I would like to see is the old terse-report-style headers.

First, some generic advice: run 'spamassassin --lint' and fix any errors that it finds.

As for the headers, it looks like it is giving you what you asked for with your 'add_header' setting. It looks like that setting is really on two lines in your local.cf, since what you got was just the first line of the status. Either put the entire 'add_header' definition on one line, or just remove it to get the default headers.

--
Bowie
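Bowie's diagnosis hinges on local.cf being read one line per directive, with no line continuation. The sketch below is a deliberately simplified Python model of that behaviour; `parse_cf` is a hypothetical helper, not SpamAssassin's own parser (Mail::SpamAssassin::Conf). It shows that a wrapped `add_header` yields a truncated template plus a stray pseudo-directive named `tests=_TESTS_`, which is the sort of thing a `spamassassin --lint` run should complain about.

```python
def parse_cf(text):
    """Simplified model of a line-oriented config file (not SpamAssassin's parser).

    Each non-blank, non-comment line is treated as one directive followed by
    its value; there is no line continuation.
    """
    directives = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no directive
        parts = line.split(None, 1)  # directive name, then the rest of the line
        value = parts[1] if len(parts) > 1 else ""
        directives.append((parts[0], value))
    return directives


# The same add_header setting, once wrapped onto two lines and once on one line.
WRAPPED = (
    "add_header all Status _YESNO_, score=_SCORE_ required=_REQD_\n"
    "tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_\n"
)
ONE_LINE = (
    "add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ "
    "tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_\n"
)

if __name__ == "__main__":
    for label, cf in (("wrapped", WRAPPED), ("one line", ONE_LINE)):
        print(label)
        for name, value in parse_cf(cf):
            print(f"  {name!r}: {value!r}")
```

In the wrapped case the `add_header` template stops at `required=_REQD_`, which matches the truncated status Jason is seeing.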
Big Idiot Needs Instructions
Hi, I have spent two days trying to figure out how to get the following to work. I have set up SpamAssassin and ClamAV, and I am running sendmail on the Solaris 10 platform. I would like to be able to scan for all spam and viruses (incoming, outgoing and relayed email). Can someone please point me in the right direction? Do I use procmail or something else? I set this particular combination up years ago on a Linux box, but I have had a lot of GIGO since then.

Thanks for any help
---
Chris Edwards
Re: Big Idiot Needs Instructions
> I have spent two days trying to figure out how to get the following to work. I have set up SpamAssassin and ClamAV, and I am running sendmail on the Solaris 10 platform. I would like to be able to scan for all spam and viruses (incoming, outgoing and relayed email). Can someone please point me in the right direction? Do I use procmail or something else? I set this particular combination up years ago on a Linux box, but I have had a lot of GIGO since then.

Both SA and ClamAV can run from milters. ClamAV comes with its own, but you'd have to Google for the SA one (or check the FAQ; I don't know if it's listed in there). That would scan the messages before they come in the door, and you can reject accordingly at the MTA level.

You can also use procmail; for SA you'd likely want to run spamd, then invoke spamc from procmail (either system-wide or on a per-user basis, your call). I use this script to pass messages to clamscan (or clamdscan) via procmail:

http://www.virtualblueness.net/~blueness/clamscan-procfilter/clamscan-procfilter.pl

You can bounce messages that were detected via procmail, but it's a bad idea.
RE: Big Idiot Needs Instructions
Mike Jackson wrote:
>> I have set up SpamAssassin and ClamAV, and I am running sendmail on the Solaris 10 platform. I would like to be able to scan for all spam and viruses (incoming, outgoing and relayed email).
> Both SA and ClamAV can run from milters. ClamAV comes with its own, but you'd have to Google for the SA one

Alternatively, you could install MIMEDefang, which is a milter that calls ClamAV and SpamAssassin directly.

--
Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902
Hispanic Business Inc./HireDiversity.com Software Engineer
Re: Bayes not working
Michael Monnerie wrote:
> On Wednesday, 10 May 2006 16:01, Steven Stern wrote:
>> I imported a corpus of about 2 messages total and it wasn't working. I blew it all away and started from scratch, thinking that was the problem. For now, local.cf has a minimum of 100 messages of each type. The current database exceeds that.
>
> I've had such an issue. In ancient times I had done sudo -H -u spamscanner sa-learn, but that doesn't work now. I really have to do su -l spamscanner and then sa-learn. Maybe that's your problem. Try sa-learn --dump magic | grep token, as that user, to see how many ham/spam there really are.

Everything's tweaked to use root as the user. We do sitewide processing since this sits on an MX server.
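To make Michael's check concrete, here is a minimal Python sketch that reads the output of `sa-learn --dump magic` and tests whether the nspam/nham counts clear the minimums. It assumes the usual 3.x output layout (count in the third column, label after `non-token data:`); treat that format as an assumption, not a spec. The helper names are hypothetical. The 200/200 defaults mirror `bayes_min_spam_num` and `bayes_min_ham_num`; Steven's local.cf lowers them to 100. The counts only mean something if the dump is taken as the same user, and therefore from the same Bayes database, that the scanner actually uses.

```python
import re

# Assumed layout of a `sa-learn --dump magic` line (SpamAssassin 3.x), e.g.
#   0.000          0       1234          0  non-token data: nspam
# The count is taken from the third column; the label follows "non-token data:".
MAGIC_LINE = re.compile(r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+?)\s*$")


def bayes_counts(dump_output):
    """Map each 'non-token data' label (nspam, nham, ...) to its integer count."""
    counts = {}
    for line in dump_output.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def bayes_ready(counts, min_spam=200, min_ham=200):
    """True once both nspam and nham reach the minimums.

    200/200 mirror the defaults of bayes_min_spam_num / bayes_min_ham_num;
    pass 100/100 to match a local.cf like Steven's.
    """
    return counts.get("nspam", 0) >= min_spam and counts.get("nham", 0) >= min_ham
```

Feed it the captured dump text, e.g. `bayes_ready(bayes_counts(text), 100, 100)`; if the counts look far lower than expected, the dump probably came from a different user's database.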
Re: Bayes not working
Steven Stern wrote:
> Everything's tweaked to use root as the user. We do sitewide processing since this sits on an MX server.

Do you use spamd? If so, it WILL NOT use root as the user. Ever. Period.
RE: Big Idiot Needs Instructions
Thanks for all the quick replies! I was able to get both the milters up and running. Now I have one new question... When I run spamd:

/usr/local/bin/spamd -d -u nobody

I get these errors:

[24001] warn: unix dgram connect: Socket operation on non-socket at /usr/perl5/site_perl/5.8.4/Mail/SpamAssassin/Logger/Syslog.pm line 79
[24001] error: no connection to syslog available at /usr/perl5/site_perl/5.8.4/Mail/SpamAssassin/Logger/Syslog.pm line 79

Any ideas?

Thanks Again
---
Chris Edwards

-----Original Message-----
From: Mike Jackson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 10, 2006 12:41 PM
To: users@spamassassin.apache.org
Subject: Re: Big Idiot Needs Instructions

> I have spent two days trying to figure out how to get the following to work. I have set up SpamAssassin and ClamAV, and I am running sendmail on the Solaris 10 platform. I would like to be able to scan for all spam and viruses (incoming, outgoing and relayed email). Can someone please point me in the right direction? Do I use procmail or something else? I set this particular combination up years ago on a Linux box, but I have had a lot of GIGO since then.

Both SA and ClamAV can run from milters. ClamAV comes with its own, but you'd have to Google for the SA one (or check the FAQ; I don't know if it's listed in there). That would scan the messages before they come in the door, and you can reject accordingly at the MTA level.

You can also use procmail; for SA you'd likely want to run spamd, then invoke spamc from procmail (either system-wide or on a per-user basis, your call). I use this script to pass messages to clamscan (or clamdscan) via procmail:

http://www.virtualblueness.net/~blueness/clamscan-procfilter/clamscan-procfilter.pl

You can bounce messages that were detected via procmail, but it's a bad idea.
RE: Big Idiot Needs Instructions
Chris Edwards wrote:
> /usr/local/bin/spamd -d -u nobody
> I get these errors...
> [24001] warn: unix dgram connect: Socket operation on non-socket at /usr/perl5/site_perl/5.8.4/Mail/SpamAssassin/Logger/Syslog.pm line 79
> [24001] error: no connection to syslog available at /usr/perl5/site_perl/5.8.4/Mail/SpamAssassin/Logger/Syslog.pm line 79

Add --syslog-socket=inet to the spamd startup line, e.g.:

/usr/local/bin/spamd -d -u nobody --syslog-socket=inet

Source: http://lists.roaringpenguin.com/pipermail/mimedefang/2004-April/021539.html
seeing a lot of these?
EMPTY_MESSAGE 1.50, MISSING_HEADERS 0.19, MISSING_SUBJECT 1.34, MSGID_FROM_MTA_HEADER 0.00, MSGID_FROM_MTA_ID 0.93, NO_REAL_NAME 0.55, TO_CC_NONE 0.13, UNCLOSED_BRACKET 2.48

We are seeing many, many hundreds of these very small messages, with NO Subject: or To: headers and NO message body. I assume it's a spam trojan run amok? For now, I've just added a meta rule to deal with them appropriately.

Anyone else seeing these?

Ken A
Pacific.Net
spam getting autolearn=ham problem
More and more I am seeing spam marked as autolearn=ham. I was wondering what the best way to correct this is. I was going to delete the bayes and whitelist files and start over, but I thought I would see what you do when this happens.

My setup:
using FC4
sendmail
spamass-milter - one bayes file for all users on server
spamassassin

-chris
RE: seeing a lot of these?
Ken A wrote:
> Bowie Bailey wrote:
>> Ken A wrote:
>>> EMPTY_MESSAGE 1.50, MISSING_HEADERS 0.19, MISSING_SUBJECT 1.34, MSGID_FROM_MTA_HEADER 0.00, MSGID_FROM_MTA_ID 0.93, NO_REAL_NAME 0.55, TO_CC_NONE 0.13, UNCLOSED_BRACKET 2.48
>>>
>>> We are seeing many, many hundreds of these very small messages, with NO Subject: or To: headers and NO message body. I assume it's a spam trojan run amok? For now, I've just added a meta rule to deal with them appropriately. Anyone else seeing these?
>>
>> I see these from time to time. I haven't tried to catch them with SA, but since they tend to cause problems with Outlook's POP3 routines, I have a cron job that sweeps the maildirs and removes them every 15 minutes.
>
> I've seen issues with OE and these messages too, which is why I set up the meta rule. Do you know if this is a documented issue with OE?

I don't know if this issue is documented or not. I'm not sure MS would consider it a bug anyway, since it is caused by a severely malformed email. I think the problem is due to there not being a blank line in the email to mark the end of the headers.

--
Bowie
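Bowie's cron job isn't shown in the thread; as a hypothetical sketch of the same idea, the Python below flags messages that have no blank line ending the header block, which is the malformation he suspects. Unlike his job, it only reports paths and never deletes anything. The helper names and the standard maildir layout (`new/` and `cur/` subdirectories) are assumptions.

```python
from pathlib import Path


def has_header_body_separator(raw):
    """True if the raw message has an empty line ending its header block.

    RFC 822 separates the headers from the body with a null line; the messages
    discussed here have headers only and no such line. CRLF is normalised to
    LF so both wire-format and maildir files are handled.
    """
    text = raw.replace(b"\r\n", b"\n")
    return text.startswith(b"\n") or b"\n\n" in text


def find_malformed(maildir):
    """List (without deleting) maildir files that lack the separator.

    Assumes the usual maildir layout with new/ and cur/ subdirectories.
    """
    bad = []
    for sub in ("new", "cur"):
        folder = Path(maildir) / sub
        if not folder.is_dir():
            continue  # missing subdirectory: nothing to check
        for msg in sorted(folder.iterdir()):
            if msg.is_file() and not has_header_body_separator(msg.read_bytes()):
                bad.append(msg)
    return bad
```

Reporting first and deleting by hand (or after review) is the safer design, since a bug in the check would otherwise silently destroy mail.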
Problem with spamassassin skipping messages and sa-learn coredumps on sync
Greetings all,

I am having a problem where SpamAssassin is not running on all my messages, and a bunch of spam is slipping into my inbox as a result. I would say this has been going on for about a month, but it took me a few weeks to notice the lack of SpamAssassin headers in the skipped mail. I am not having any luck figuring out why this is happening or how to fix it; I hope someone on this list can help.

I'm running version 3.1.0 (Perl 5.8.7), and the platform it's running on is NetBSD 3.0. I invoke it via procmail, with the recipe suggested on the SpamAssassin wiki.

I turned on procmail logging this morning and I am seeing a number of potentially troubling messages. For example, I've got a couple of these failure messages:

procmail: Match on 256000
procmail: Locking spamassassin.lock
procmail: Executing spamassassin
procmail: [7638] Wed May 10 10:29:40 2006
procmail: Program failure (-11) of spamassassin
procmail: Rescue of unfiltered data succeeded
procmail: [7638] Wed May 10 10:29:40 2006
procmail: Unlocking spamassassin.lock

And I see several of these:

procmail: [2585] Wed May 10 10:59:53 2006
procmail: Locking spamassassin.lock
[8699] warn: bayes: unknown packing format for bayes db, please re-learn: 138 at /usr/pkg/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/BayesStore/DBM.pm line 1874.
procmail: [2585] Wed May 10 11:00:01 2006
procmail: Locking spamassassin.lock
[8699] warn: bayes: unknown packing format for bayes db, please re-learn: 68 at /usr/pkg/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/BayesStore/DBM.pm line 1874.
[8699] warn: bayes: expire_old_tokens: panic: sv_setpvn called with negative strlen at /usr/pkg/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/BayesStore/DBM.pm line 624.

I notice the warnings mention re-learning. Unfortunately, I am also having problems with sa-learn --sync dumping core on most of my attempts to train the classifier.

Does anyone have any idea what causes these errors and how I might fix them?

Thanks,
marijane
Re: spam getting autolearn=ham problem
Bazooka Joe wrote:
> More and more I am seeing spam marked as autolearn=ham. I was wondering what the best way to correct this is.

Depends... Really, you first need to figure out why this happened before you take any action at all. Can you post an X-Spam-Status header for one of the messages? Have you modified the required_score, or any of the learning thresholds, in your config?

In general there are only a few rules that can cause a message to be tagged as spam but do not count toward the computation of the score for learning purposes. *_IN_BLACKLIST, AWL, BAYES_*, and GTUBE are the most noteworthy ones.
RE: spam getting autolearn=ham problem
Bazooka Joe wrote:
> More and more I am seeing spam marked as autolearn=ham.

This means that the spam is being given a very low score by SA. The score used here does not include the Bayes scoring, but if you learn very many like this, then the effectiveness of your Bayes database will drop.

> I was wondering what the best way to correct this is. I was going to delete the bayes and whitelist files and start over, but I thought I would see what you do when this happens.

It might be useful to start over with these files, but you need to fix the underlying problem first. Why are these messages scoring so low? What kind of messages are they? If you don't fix the scoring problem, your new files will just inherit the same problems as the old ones.

> My setup:
> using FC4
> sendmail
> spamass-milter - one bayes file for all users on server
> spamassassin

--
Bowie
Re: spam getting autolearn=ham problem
X-Spam-Status: No, score=1.0 required=3.0 tests=BAYES_60 autolearn=ham version=3.0.4
X-Spam-Level: *
X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on agwebinc.com

I have required set to 3, which you can see, and I have the milter rejecting email with a score of more than 7.

On 5/10/06, Matt Kettler [EMAIL PROTECTED] wrote:
> Bazooka Joe wrote:
>> More and more I am seeing spam marked as autolearn=ham. I was wondering what the best way to correct this is.
> Depends... Really, you first need to figure out why this happened before you take any action at all. Can you post an X-Spam-Status header for one of the messages? Have you modified the required_score, or any of the learning thresholds, in your config?
>
> In general there are only a few rules that can cause a message to be tagged as spam but do not count toward the computation of the score for learning purposes. *_IN_BLACKLIST, AWL, BAYES_*, and GTUBE are the most noteworthy ones.
Re: spam getting autolearn=ham problem
Bazooka Joe wrote:
> X-Spam-Status: No, score=1.0 required=3.0 tests=BAYES_60 autolearn=ham version=3.0.4
> X-Spam-Level: *
> X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on agwebinc.com

As far as the autolearner is concerned, the score of that message is 0. (BAYES_60 is the only rule matched, and the autolearner doesn't consider BAYES rule scores, to prevent self-feedback in the Bayes learning.) 0 is less than the default ham learning threshold of 0.1, and the existing training only rates it BAYES_60 (not strongly known as spam), so it autolearns it as ham.

I would approach this from two angles:

1) Why did the spam message fail to match any rules other than Bayes? Your SA version is a little old; you might consider testing it against 3.1.1. You might also consider some rulesemporium.com add-on rulesets to help detect the particular spam message.

2) Why did it only rank as BAYES_60? Have you done any manual training?
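Matt's arithmetic can be written out as a small Python model. This is a simplified sketch, not SpamAssassin's AutoLearnThreshold code: the excluded rules are exactly the ones named in this thread (BAYES_*, *_IN_BLACKLIST, AWL, GTUBE), the default thresholds mirror `bayes_auto_learn_threshold_nonspam` (0.1) and `bayes_auto_learn_threshold_spam` (12.0), and real SpamAssassin applies extra conditions that are omitted here. The function names are hypothetical.

```python
# Rules Matt lists as not counting toward the learning score.
EXCLUDED_EXACT = {"AWL", "GTUBE"}


def counts_for_learning(rule):
    """False for rules that do not count toward the autolearn score."""
    return not (
        rule.startswith("BAYES_")
        or rule.endswith("_IN_BLACKLIST")
        or rule in EXCLUDED_EXACT
    )


def learning_score(hits):
    """Sum the scores of the hit rules that count for learning."""
    return sum(score for rule, score in hits.items() if counts_for_learning(rule))


def autolearn_verdict(hits, nonspam_threshold=0.1, spam_threshold=12.0):
    """Return 'ham', 'spam' or 'no', like the autolearn= header tag.

    Real SpamAssassin applies extra conditions (for instance, requiring some
    points from both header and body rules before learning spam) that this
    sketch leaves out.
    """
    score = learning_score(hits)
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "no"
```

For the BAYES_60-only message in the header above, `autolearn_verdict({"BAYES_60": 1.0})` gives `"ham"`. With `nonspam_threshold=0`, it gives `"no"` instead, which is the effect of lowering `bayes_auto_learn_threshold_nonspam` as suggested later in the thread.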
Re: spam getting autolearn=ham problem
Bazooka Joe wrote:
> X-Spam-Status: No, score=1.0 required=3.0 tests=BAYES_60 autolearn=ham version=3.0.4
> X-Spam-Level: *
> X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on agwebinc.com
>
> I have required set to 3, which you can see, and I have the milter rejecting email with a score of more than 7.
>
> On 5/10/06, Matt Kettler [EMAIL PROTECTED] wrote:
>> Bazooka Joe wrote:
>>> More and more I am seeing spam marked as autolearn=ham. I was wondering what the best way to correct this is.
>> Depends... Really, you first need to figure out why this happened before you take any action at all. Can you post an X-Spam-Status header for one of the messages? Have you modified the required_score, or any of the learning thresholds, in your config?
>>
>> In general there are only a few rules that can cause a message to be tagged as spam but do not count toward the computation of the score for learning purposes. *_IN_BLACKLIST, AWL, BAYES_*, and GTUBE are the most noteworthy ones.

You can set bayes_auto_learn_threshold_nonspam in local.cf to 0 or a negative number; then autolearn=ham won't kick in unless the message scores below that threshold (not sure if this counts Bayes or not). But yes, the real question is why no rules are triggering... Is DNS working? Are you using the blacklist rules, etc.? What does the spam look like?

Jay
OT: anyone know how to do server-side MS-Exchange filters?
We currently have to rely on users to create Rules Wizard settings in Outlook to filter off their spam (via X-Spam-Status: headers). What would be better is if Exchange could do like procmail/maildrop and allow the sysadmin to create the rule on the server, so it hits everyone. It should create a Spam folder per mailbox and deliver high-scoring spam to that instead of the INBOX.

Has anyone done this, and if so, what sort of tools allow it?

Thanks!

(2006, and still waiting for Microsoft to do what was done 15 years ago on older systems...)

--
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1
Re: spam getting autolearn=ham problem
The spam:

Hi Robar
It is sad but it is true that the large groups of women are unhappy with the size of there BF is thing. Don't be that guy, www.missusoandforever.org/ab1/ . and station, designed been grabbed theorized to artist
Thank you

I run these rules:
TRUSTED_RULESETS=SARE_STOCKS TRIPWIRE SARE_EVILNUMBERS0 SARE_EVILNUMBERS1 BOGUSVIRUS SARE_ADULT SARE_FRAUD SARE_BML SARE_SPOOF SARE_BAYES_POISON_NXM SARE_OEM SARE_RANDOM SARE_HEADER SARE_HTML SARE_SPECIFIC SARE_OBFU SARE_REDIRECT SARE_GENLSUBJ SARE_UNSUB SARE_WHITELIST;

On my account I get about 10 spams a day scoring below 3, out of about 50 spams total (that's a guess). I will try moving the ham threshold down. And no, I haven't done any Bayes training, and DNS is working.

Some stats for my box for one week: about 45,000 emails are blocked using sbl-xbl.spamhaus.org, caught by SpamAssassin, or rejected by ClamAV. Ham email with a score of 3 or less is about 9,000.

On 5/10/06, Jay Lee [EMAIL PROTECTED] wrote:
> Bazooka Joe wrote:
>> X-Spam-Status: No, score=1.0 required=3.0 tests=BAYES_60 autolearn=ham version=3.0.4
>> X-Spam-Level: *
>> X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on agwebinc.com
>>
>> I have required set to 3, which you can see, and I have the milter rejecting email with a score of more than 7.
>>
>> On 5/10/06, Matt Kettler [EMAIL PROTECTED] wrote:
>>> Bazooka Joe wrote:
>>>> More and more I am seeing spam marked as autolearn=ham. I was wondering what the best way to correct this is.
>>> Depends... Really, you first need to figure out why this happened before you take any action at all. Can you post an X-Spam-Status header for one of the messages? Have you modified the required_score, or any of the learning thresholds, in your config?
>>>
>>> In general there are only a few rules that can cause a message to be tagged as spam but do not count toward the computation of the score for learning purposes. *_IN_BLACKLIST, AWL, BAYES_*, and GTUBE are the most noteworthy ones.
>
> You can set bayes_auto_learn_threshold_nonspam in local.cf to 0 or a negative number; then autolearn=ham won't kick in unless the message scores below that threshold (not sure if this counts Bayes or not). But yes, the real question is why no rules are triggering... Is DNS working? Are you using the blacklist rules, etc.? What does the spam look like?
>
> Jay
Re: OT: anyone know how to do server-side MS-Exchange filters?
On Thu, 11 May 2006, Jason Haar wrote:
> Has anyone done this, and if so, what sort of tools allow it?

A Linux mail relay in front of the Exchange server. :)

--
John Hardin KA7OHZ ICQ#15735746 http://www.impsec.org/~jhardin/
[EMAIL PROTECTED] FALaholic #11174 pgpk -a [EMAIL PROTECTED]
key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
The problem is when people look at Yahoo, slashdot, or groklaw and jump from obvious and correct observations like "Oh my God, this place is teeming with utter morons" to incorrect conclusions like "there's nothing of value here".
-- Al Petrofsky, in Y! SCOX
---
The New SpamAssassin sa-update
1. It depends on some Perl packages: libarchive-tar-perl, libio-zlib-perl. The Debian packages did not enforce this dependency; I installed them manually.
2. It fails after several timeouts with:

http: request failed: 500 read timeout: 500 read timeout
error: no mirror data available for channel updates.spamassassin.org
channel: MIRRORED.BY contents were missing, channel failed

Either the rules update site is not ready or this Perl script needs some configuration. Either way, not ready to play.
Re: {SPAM}{!} Re: spam getting autolearn=ham problem
Bazooka Joe wrote:
> the spam
> [snip]
> And no, I haven't done any Bayes training, and DNS is working.

Are you running with SpamAssassin's built-in support for RBLs and URIBLs? That message text got *TORN UP* by the URIBLs on my system:

X-EVI-MailScanner-SpamCheck: spam, SpamAssassin (score=15.259, required 5, HTML_40_50 0.50, HTML_MESSAGE 0.00, INFO_GREYLIST_NOTDELAYED -0.00, LOCAL_FORGED_REFERENCES 0.10, RAZOR2_CF_RANGE_51_100 0.50, RAZOR2_CF_RANGE_E8_51_100 1.50, RAZOR2_CHECK 0.50, SPF_PASS -0.00, SURBL_MULTI1 -0.50, SURBL_MULTI2 -0.20, URIBL_BLACK 1.50, URIBL_BLACK_OVERLAP -1.00, URIBL_JP_SURBL 4.09, URIBL_SBL 1.64, URIBL_SC_SURBL 4.50, URIBL_WS_SURBL 2.14)

Check your init.pre and see if the URIBL plugin is loaded; also check to make sure you have Net::DNS installed.
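The init.pre check can be automated with a short Python sketch: it scans `*.pre` files for an active `loadplugin Mail::SpamAssassin::Plugin::URIDNSBL` line (a line starting with `#` does not count). The helper names are hypothetical, and the config directory is an assumption that varies by install (typically `/etc/mail/spamassassin` for source installs, `/etc/spamassassin` on Debian). Whether Net::DNS is installed is a Perl-side question that this sketch does not cover; `perl -MNet::DNS -e 1` exits non-zero if the module is missing.

```python
import re
from pathlib import Path

URIBL_PLUGIN = "Mail::SpamAssassin::Plugin::URIDNSBL"
# Only lines that begin with "loadplugin" count; "#loadplugin ..." is disabled.
LOADPLUGIN = re.compile(r"^\s*loadplugin\s+(\S+)")


def loaded_plugins(pre_text):
    """Return the set of plugin names loaded by the given .pre file text."""
    names = set()
    for line in pre_text.splitlines():
        match = LOADPLUGIN.match(line)
        if match:
            names.add(match.group(1))
    return names


def uribl_enabled(config_dir):
    """Check every *.pre file in config_dir for an active URIDNSBL loadplugin."""
    directory = Path(config_dir)
    if not directory.is_dir():
        return False  # no config directory, so nothing is loaded from it
    return any(
        URIBL_PLUGIN in loaded_plugins(pre.read_text(errors="replace"))
        for pre in sorted(directory.glob("*.pre"))
    )
```

A `False` here does not prove the URIBL rules are off (plugins can be loaded from other files), but it is a quick first check before digging into DNS.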
Re: The New SpamAssassin sa-update
David Baron wrote:
> 1. It depends on some Perl packages: libarchive-tar-perl, libio-zlib-perl. The Debian packages did not enforce this dependency; I installed them manually.
> 2. It fails after several timeouts with:
>
> http: request failed: 500 read timeout: 500 read timeout
> error: no mirror data available for channel updates.spamassassin.org
> channel: MIRRORED.BY contents were missing, channel failed
>
> Either the rules update site is not ready or this Perl script needs some configuration. Either way, not ready to play.

Is the Debian package SA 3.1.0 or 3.1.1? If 3.1.0, it's a known issue: sa-update was fixed in 3.1.1.
Re: spam getting autolearn=ham problem
The message you sent directly to me hit the following:

*  0.5 HTML_40_50 BODY: Message is 40% to 50% HTML
*  0.1 HTML_MESSAGE BODY: HTML included in message
*  1.5 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
*      above 50%
*      [cf: 100]
*  0.5 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
*  3.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
*      [cf: 100]
*   10 URIBL_SBL Contains an URL listed in the SBL blocklist
*      [URIs: missusoandforever.org]
*  4.5 URIBL_JP_SURBL Contains an URL listed in the JP SURBL blocklist
*      [URIs: missusoandforever.org]

Of course, the scores are heavily inflated by my own personal rules (I don't recommend doing this unless you know what you're doing), but the point is, your SA doesn't seem to be firing on certain things it should. Do you have the DNSBLs working? Are you using Razor or DCC? Are you on the latest 3.1.1?

Jay
Re: Latest sa-stats from last week
On Wednesday, 10 May 2006 17:27, Bowie Bailey wrote:
> So you are saying that I should not feed Bayes with the unsolicited marketing garbage that I get because it looks like something that could have been requested?

If it's a newsletter from a seemingly legit company, I don't feed it to Bayes. I try to unsubscribe from it. If they still send to me, I write a rule to filter them. If a customer then complains, I tell them that the company doesn't behave nicely, and that they should make a filter to get e-mail from that company out of the SPAM folder again.

>> Remember: 10 good SPAM and HAM are better than 200 where 5% are wrong.
>
> Wrong for who? If it looks like marketing, 99% of the time, I don't want it. And for most of the accounts that I deal with, this goes up to 100%.

Not true for my customers, though. Yes, some manual filters can catch those. If it's stupid SPAM, then Bayes.

> My philosophy with Bayes has always been to skip the ham/spam definitions and go with a wanted/unwanted model. This way Bayes learns to filter out the emails you don't want, even if some of them may technically be ham. (Obviously, I would not be able to do this on a site-wide installation.)

But as you said, your Bayes is not quite accurate, so it doesn't seem to really work. Wouldn't it be better to have a highly accurate Bayes, and set up some filters for yourself personally? If BAYES_99 were always SPAM for you, you could give it 4.5 or 5 points, and probably filter more SPAM than now.

> But then again, I think less than half of my users are even taking advantage of the spam markup. Since I don't do any blocking or sorting on the server, it is up to them to use MUA rules to sort or delete the spam once my server has marked it.

I do the same; I just wrote a nice document for Outlook 2003 describing how to filter SPAM.

Regards,
zmi
--
// Michael Monnerie, Ing.BSc - http://it-management.at
// Tel: 0660/4156531 .network.your.ideas.
// PGP Key: lynx -source http://zmi.at/zmi3.asc | gpg --import
// Fingerprint: 44A3 C1EC B71E C71A B4C2 9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE
Re: OT: anyone know how to do server-side MS-Exchange filters?
John D. Hardin wrote:
> On Thu, 11 May 2006, Jason Haar wrote:
>> Has anyone done this, and if so, what sort of tools allow it?
> A Linux mail relay in front of the Exchange server. :)

That wouldn't allow messages to be put in a subfolder instead of the inbox; it would just do the header tagging. Not having used Exchange, I can't answer intelligently on whether or not it supports server-side sorting. However, if it doesn't, you could use something like Maia Mailguard and a Postfix frontend to the Exchange server to quarantine and report the spam. Users would be able to configure it and safely view and "free" tagged spam messages via a web interface. It can also send regular reports to the users on what spam they've gotten (senders, subject, etc.).

Website: http://www.renaissoft.com/maia/
RE: Latest sa-stats from last week
Michael Monnerie wrote:
> On Wednesday, 10 May 2006 17:27, Bowie Bailey wrote:
>> So you are saying that I should not feed Bayes with the unsolicited marketing garbage that I get because it looks like something that could have been requested?
>
> If it's a newsletter from a seemingly legit company, I don't feed it to Bayes. I try to unsubscribe from it. If they still send to me, I write a rule to filter them. If a customer then complains, I tell them that the company doesn't behave nicely, and that they should make a filter to get e-mail from that company out of the SPAM folder again.

If it comes to an account that does not subscribe to newsletters (webmaster, sales, etc.), it is spam by definition and is fed to Bayes.

>>> Remember: 10 good SPAM and HAM are better than 200 where 5% are wrong.
>>
>> Wrong for who? If it looks like marketing, 99% of the time, I don't want it. And for most of the accounts that I deal with, this goes up to 100%.
>
> Not true for my customers, though. Yes, some manual filters can catch those. If it's stupid SPAM, then Bayes.
>
>> My philosophy with Bayes has always been to skip the ham/spam definitions and go with a wanted/unwanted model. This way Bayes learns to filter out the emails you don't want, even if some of them may technically be ham. (Obviously, I would not be able to do this on a site-wide installation.)
>
> But as you said, your Bayes is not quite accurate, so it doesn't seem to really work. Wouldn't it be better to have a highly accurate Bayes, and set up some filters for yourself personally? If BAYES_99 were always SPAM for you, you could give it 4.5 or 5 points, and probably filter more SPAM than now.

If I look at my personal database, the spam percentage shown in the stats is lower than I'd like, but I wouldn't say it's not accurate. I very rarely see a true false positive or negative with Bayes, and I watch my account closely. I do see a few "ham" with BAYES_99 and "spam" with BAYES_00 in the stats, but that's usually simply because those were either spam that only hit BAYES_99 (and so scored below the threshold, as ham) or ham (usually from this list) that tripped a few extra rules (and so scored as spam).

>> But then again, I think less than half of my users are even taking advantage of the spam markup. Since I don't do any blocking or sorting on the server, it is up to them to use MUA rules to sort or delete the spam once my server has marked it.
>
> I do the same; I just wrote a nice document for Outlook 2003 describing how to filter SPAM.

I've done the same for both Outlook Express and Thunderbird. The Thunderbird setup is a single checkbox. :)

--
Bowie
ALL_TRUSTED causing false negatives?
I've been getting a lot of spam lately, ever since I moved my mail server to a new system. Here's one of the false negatives that slipped through, for example:

X-Spam-Status: No, score=-2.1 required=5.0 tests=ALL_TRUSTED,BAYES_50,
    NO_REAL_NAME,RCVD_BY_IP,YOUR_INCOME autolearn=ham version=3.0.3
X-Spam-Summary:
     0.0 NO_REAL_NAME From: does not include a real name
     0.1 RCVD_BY_IP Received by mail server with no name
    -3.3 ALL_TRUSTED Did not pass through any untrusted hosts
     1.1 YOUR_INCOME BODY: Doing something with my income
     0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
         [score: 0.5000]

Why does ALL_TRUSTED have a score of -3.3? Doesn't this mean that any spammer who connects directly to my mail server has a good chance of getting past SpamAssassin? I did not define any trusted/internal networks when I installed SpamAssassin.

SpamAssassin version 3.0.3 running on Perl version 5.8.4
Linux naga.aaanime.net 2.6.8-11-amd64-k8 #1 Sun Oct 2 21:26:54 UTC 2005 x86_64 GNU/Linux
Running Debian Sarge
Re: ALL_TRUSTED causing false negatives?
Philip Mak wrote:
> I've been getting a lot of spam lately, ever since I moved my mail server to a new system. Here's one of the false negatives that slipped through, for example:
>
> X-Spam-Status: No, score=-2.1 required=5.0 tests=ALL_TRUSTED,BAYES_50,
>     NO_REAL_NAME,RCVD_BY_IP,YOUR_INCOME autolearn=ham version=3.0.3
> X-Spam-Summary:
>      0.0 NO_REAL_NAME From: does not include a real name
>      0.1 RCVD_BY_IP Received by mail server with no name
>     -3.3 ALL_TRUSTED Did not pass through any untrusted hosts
>      1.1 YOUR_INCOME BODY: Doing something with my income
>      0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
>          [score: 0.5000]
>
> Why does ALL_TRUSTED have a score of -3.3? Doesn't this mean that any spammer who connects directly to my mail server has a good chance of getting past SpamAssassin?

That should not happen on a properly working SA setup. Odds are very good you've got a NATed mail server, causing the trust path guesser to fail. You'll have to declare trusted_networks manually to fix it.

http://wiki.apache.org/spamassassin/TrustPath
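The header in the question describes ALL_TRUSTED as "Did not pass through any untrusted hosts". As a rough model only (SpamAssassin's real trust-path logic, described on the TrustPath wiki page, is more involved), the Python below treats that as "every relay in the Received chain falls inside a trusted network". It shows why an over-broad trusted set, whether inferred wrongly behind NAT or declared by hand, lets an outside relay count as trusted. The function name and all addresses are hypothetical (documentation and private ranges), and the sketch accepts only CIDR notation rather than every form `trusted_networks` allows.

```python
import ipaddress


def all_relays_trusted(relay_ips, trusted_networks):
    """Model of the ALL_TRUSTED condition: every relay lies in a trusted network.

    relay_ips: IP addresses of the relays in the Received chain.
    trusted_networks: CIDR strings such as "192.168.1.0/24".
    """
    nets = [ipaddress.ip_network(n, strict=False) for n in trusted_networks]
    return all(
        any(ipaddress.ip_address(ip) in net for net in nets) for ip in relay_ips
    )


# Hypothetical chain: an outside sender, then an internal hop.
RELAYS = ["198.51.100.7", "192.168.1.10"]

if __name__ == "__main__":
    # Trust limited to the local network: the outside relay is untrusted.
    print(all_relays_trusted(RELAYS, ["192.168.1.0/24"]))
    # Over-broad trust that happens to cover the sender: every hop looks trusted.
    print(all_relays_trusted(RELAYS, ["192.168.1.0/24", "198.51.100.0/24"]))
```

The takeaway matches Matt's advice: declare trusted_networks explicitly so it covers only hosts you actually control.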
Re: ALL_TRUSTED causing false negatives?
From: Philip Mak [EMAIL PROTECTED]
> I've been getting a lot of spam lately, ever since I moved my mail server to a new system. Here's one of the false negatives that slipped through, for example:
>
> X-Spam-Status: No, score=-2.1 required=5.0 tests=ALL_TRUSTED,BAYES_50,
>     NO_REAL_NAME,RCVD_BY_IP,YOUR_INCOME autolearn=ham version=3.0.3
> X-Spam-Summary:
>      0.0 NO_REAL_NAME From: does not include a real name
>      0.1 RCVD_BY_IP Received by mail server with no name
>     -3.3 ALL_TRUSTED Did not pass through any untrusted hosts
>      1.1 YOUR_INCOME BODY: Doing something with my income
>      0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
>          [score: 0.5000]
>
> Why does ALL_TRUSTED have a score of -3.3? Doesn't this mean that any spammer who connects directly to my mail server has a good chance of getting past SpamAssassin? I did not define any trusted/internal networks when I installed SpamAssassin.
>
> SpamAssassin version 3.0.3 running on Perl version 5.8.4
> Linux naga.aaanime.net 2.6.8-11-amd64-k8 #1 Sun Oct 2 21:26:54 UTC 2005 x86_64 GNU/Linux
> Running Debian Sarge

There is a strong indication that you have your trusted networks misdefined. I suggest visiting the wiki and looking up ALL_TRUSTED.
{^_^}
Re: Big Idiot Needs Instructions
From: Chris Edwards [EMAIL PROTECTED]
> Hi, I have spent two days trying to figure out how to get the following to work. I have set up SpamAssassin and ClamAV, and I am running sendmail on the Solaris 10 platform. I would like to be able to scan for all spam and viruses (incoming, outgoing and relayed email). Can someone please point me in the right direction? Do I use procmail or something else? I set this particular combination up years ago on a Linux box, but I have had a lot of GIGO since then.
> Thanks for any help

jdow
I use procmail with great success. I also use the SpamAssassin ClamAV plugin. (See plugins on the wiki.)
{^_^}
Re: Latest sa-stats from last week
From: Bowie Bailey [EMAIL PROTECTED] jdow wrote: From: Bowie Bailey [EMAIL PROTECTED] Michael Monnerie wrote: On Tuesday, 9 May 2006 16:18 Bowie Bailey wrote: I've got per-user Bayes and most of my users don't bother to train it. Another reason for site-wide Bayes, I'd say. I've considered that, but it won't work in our setup. This box scans our internal email as well as all of our customers' email. Since we are in an entirely different line of business from our customers, what we consider to be ham and spam will be quite different from theirs. If I could train it on both sets, it might work, but I don't have access to any of their email for training. Also, I really prefer per-user Bayes for our internal email, since there are various accounts that get a specific type of ham and work very well with Bayes. Importune them to feed you as large a collection of ham and spam as they can, once. Then turn on autolearn, cross your fingers, and put on your flak jacket. What flak jacket? I have Bayes turned on now and I never did any manual training on most of the accounts. I just turned it on and let autolearn (with the default settings) do its thing. So far, I have received very few complaints. But then again, I think fewer than half of my users are even taking advantage of the spam markup. Since I don't do any blocking or sorting on the server, it is up to them to use MUA rules to sort or delete the spam once my server has marked it. Fairly frequently I see evidence that autolearn can massively misfire on SpamAssassin startup. It does not always happen, or there'd be a lot more messages about it. But there is apparently a vulnerable period that can go bad with just the wrong selection of messages. Once the database is large, inertia will save the day. {^_^}
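To illustrate the "inertia" point, here is a toy frequency-ratio calculation for a single token. It is not SpamAssassin's actual Bayes formula (which adds Robinson-style smoothing and combines many tokens), and all counts are invented: a token that appears in half of all learned ham, before and after five ham messages containing it are mistakenly autolearned as spam, once in a small database and once in a large one.

```python
def token_spamminess(spam_hits, ham_hits, nspam, nham):
    """Toy per-token spam probability from learned-message frequencies.

    spam_hits/ham_hits: learned spam/ham messages containing the token.
    nspam/nham: total messages learned as spam/ham.
    """
    spam_freq = spam_hits / nspam
    ham_freq = ham_hits / nham
    return spam_freq / (spam_freq + ham_freq)

MISFIRES = 5  # hypothetical hams wrongly autolearned as spam

for label, nspam, nham in (("small DB", 20, 20), ("large DB", 20000, 20000)):
    ham_hits = nham // 2  # token appears in half of the learned ham
    before = token_spamminess(0, ham_hits, nspam, nham)
    after = token_spamminess(MISFIRES, ham_hits, nspam + MISFIRES, nham)
    print(f"{label}: {before:.4f} -> {after:.4f}")
```

This prints `small DB: 0.0000 -> 0.2857` and `large DB: 0.0000 -> 0.0005`: the same five mistakes move the small database a long way and the large one hardly at all.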
Re: OT: anyone know how to do server-side MS-Exchange filters?
When you create filters in Outlook connected to Exchange, you can choose whether the filter is server-side or client-side. Disclaimer: I don't know what versions of Outlook or Exchange are required to make it work. At my previous job, where we had Exchange, I had Outlook 2000, and I don't know what version of Exchange. The main problem was that sometimes it would mess up and try to run a client rule on the server and fail. Jay Lee wrote: John D. Hardin wrote: On Thu, 11 May 2006, Jason Haar wrote: Has anyone done this, and if so, what sort of tools allow it? A Linux mail relay in front of the Exchange server. :) That wouldn't allow messages to be put in a subfolder instead of the inbox; it would just do the header tagging. Not having used Exchange, I can't answer intelligently on whether or not it supports server-side sorting. However, if it doesn't, you could use something like Maia Mailguard and a Postfix front end to the Exchange server to quarantine and report the spam; users would be able to configure it and safely view and release tagged spam messages via a web interface. It can also send regular reports to users on what spam they've received, with senders, subjects, etc. Website is: http://www.renaissoft.com/maia/
RE: Latest sa-stats from last week
jdow wrote: From: Bowie Bailey [EMAIL PROTECTED] jdow wrote: Importune them to feed you as large a collection of ham and spam as they can, once. Then turn on autolearn, cross your fingers, and put on your flak jacket. What flak jacket? I have Bayes turned on now and I never did any manual training on most of the accounts. I just turned it on and let autolearn (with the default settings) do its thing. So far, I have received very few complaints. But then again, I think fewer than half of my users are even taking advantage of the spam markup. Since I don't do any blocking or sorting on the server, it is up to them to use MUA rules to sort or delete the spam once my server has marked it. Fairly frequently I see evidence that autolearn can massively misfire on SpamAssassin startup. It does not always happen, or there'd be a lot more messages about it. But there is apparently a vulnerable period that can go bad with just the wrong selection of messages. Once the database is large, inertia will save the day. Right. I understand the danger of doing things this way. I was just pointing out that my users don't generally complain about spam. I assume SpamAssassin is doing well for them, but since they never tell me anything, I really have no idea. -- Bowie
Bayes advanced questions
Dear SA users, I've had an off-list comparison of Bayes DBs, and we found some interesting differences. We're trying to find out why Bayes on server #1 scores better.
Server #1 local.cf (SA 3.1.1):
bayes_expiry_max_db_size 200
bayes_auto_expire 0
bayes_file_mode 0777
bayes_auto_learn_threshold_spam 8.00
bayes_auto_learn_threshold_nonspam 1.0
Server #1 Bayes files:
-rw-rw-rw-+ 1 vscan vscan 19738624 May 10 10:04 bayes_db_seen
-rw-rw-rw-+ 1 vscan vscan 41697280 May 10 10:04 bayes_db_toks
Server #1 Bayes dump:
0.000 0 93053 0 non-token data: nspam
0.000 0 53428 0 non-token data: nham
0.000 0 1261864 0 non-token data: ntokens
Server #2 local.cf:
bayes_auto_learn 1
bayes_learn_to_journal 1
bayes_auto_expire 1
ok_languages de en es
ok_locales en
Server #2 Bayes files:
21M 2006-05-10 10:20 bayes_seen
5.3M 2006-05-10 10:20 bayes_toks
Server #2 Bayes dump:
0.000 0 155791 0 non-token data: nspam
0.000 0 80523 0 non-token data: nham
0.000 0 129852 0 non-token data: ntokens
From the numbers, I would say that server #2 has learned more spam and ham, but it has only about one-tenth as many tokens. That server is also far less accurate with Bayes than server #1. Could ntokens be the reason? Could the spam of the last few weeks, which tries to poison Bayes, be effective against a database held to the default of 150,000 tokens? Another tip for all: with server #1's setting of bayes_auto_learn_threshold_spam 8.00, you could expect this message to be autolearned: X-Spam-Status: Yes, hits=8.7 required=5.0 tests=BAYES_99=3.5, HTML_MESSAGE=0.001,HTML_MIME_NO_HTML_TAG=0,HTML_TAG_EXIST_TBODY=0.282, MIME_HTML_ONLY=0.389,RELAY_DE=0.01,REPLY_TO_EMPTY=0.512, SARE_FORGED_EBAY=4 autolearn=no bayes=1. But it is autolearn=no. This shows that manually re-feeding spam can help your Bayes database, because this obvious spam would not have been learned automatically. Since it's already BAYES_99, you could say "don't bother, I'll be fine" *g*, but Bayes needs to be trained continuously, because tokens expire...
And why was SARE_FORGED_EBAY lowered to 4? It was so nice at 100+... Also, we set bayes_expiry_max_db_size to 5 and ran sa-learn --force-expire --sync, but the numbers are still:
0.000 0 242424 0 non-token data: nspam
0.000 0 313252 0 non-token data: nham
0.000 0 134001 0 non-token data: ntokens
Why are there still 134k tokens? Regards, zmi -- // Michael Monnerie, Ing.BSc- http://it-management.at // Tel: 0660/4156531 .network.your.ideas. // PGP Key: lynx -source http://zmi.at/zmi3.asc | gpg --import // Fingerprint: 44A3 C1EC B71E C71A B4C2 9AA6 C818 847C 55CB A4EE // Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE
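The "about one-tenth" estimate can be checked from the `sa-learn --dump magic` lines quoted above. The parser below is a sketch that assumes only the layout shown in those lines (the value sits in the third column, the field name follows `non-token data:`); it is not a complete parser for every line a dump can contain.

```python
import re

# Lines copied from the two dumps quoted above.
SERVER1 = """\
0.000 0 93053 0 non-token data: nspam
0.000 0 53428 0 non-token data: nham
0.000 0 1261864 0 non-token data: ntokens
"""
SERVER2 = """\
0.000 0 155791 0 non-token data: nspam
0.000 0 80523 0 non-token data: nham
0.000 0 129852 0 non-token data: ntokens
"""

# prob, count, VALUE, count, "non-token data: NAME" (assumed layout)
MAGIC_LINE = re.compile(r"^\s*\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (.+?)\s*$")

def parse_magic(text):
    """Map each non-token field name to the integer in the third column."""
    values = {}
    for line in text.splitlines():
        m = MAGIC_LINE.match(line)
        if m:
            values[m.group(2)] = int(m.group(1))
    return values

s1, s2 = parse_magic(SERVER1), parse_magic(SERVER2)
for name, s in (("server #1", s1), ("server #2", s2)):
    learned = s["nspam"] + s["nham"]
    # messages learned, tokens kept, tokens per learned message
    print(name, learned, s["ntokens"], round(s["ntokens"] / learned, 2))

# How many times more tokens server #1 keeps than server #2.
print(round(s1["ntokens"] / s2["ntokens"], 1))
```

Server #1 has learned fewer messages (146,481 against 236,314) yet keeps roughly 9.7 times as many tokens, which fits the suspicion that the token limit is the difference.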
Re: Bayes advanced questions
On Wednesday, 10 May 2006 23:41 Matt Kettler wrote: Particularly on servers with a site-wide DB used against a broadly diverse spread of mail, increasing the token limit will improve accuracy. However, this comes at the expense of increased storage needs and slower performance. (In particular, expiry takes a LOT longer with larger DBs.) The DB files are about 60MB together, so not really big (I just got a price list with the new 750GB SATA drive from Seagate *g*). And tonight's expiry for server #1: bayes: synced databases from journal in 11 seconds: 1968 unique entries (3059 total entries). So it doesn't take too long either. It could be longer on a server that gets several million mails per day, of course. The score used is the score the message would have got if: Bayes was disabled; the AWL was disabled; no userconf (i.e. black/whitelist) rules were enabled. That's good info, which should be in the man page. Since that message scored 8.7 and derives 3.5 of its points from BAYES_99, it does not surprise me at all that the message was not learned. Also, EVEN if the learning score is over the threshold, SA will not learn a message as spam unless: there are at least 3.0 points of header rules; there are at least 3.0 points of body rules; and existing learning would not place the message in a low Bayes category (i.e. don't learn as spam if the message would otherwise have hit BAYES_00). This is in the man page, except that the last point about BAYES_00 wasn't clear to me from there. Does this apply only to BAYES_00 and BAYES_99, or also to BAYES_05 and BAYES_95? Since it's already BAYES_99, you could say "don't bother, I'll be fine" *g*, but Bayes needs to be trained continuously, because tokens expire... Also realize that just because the message got BAYES_99 doesn't mean there are no tokens in it that can be learned from. Spam mutates. New phrases and words creep in. These need to be learned from, even if the current message is already BAYES_99.
Yes, I believe this is very valuable info for others too. Thanks for your help on this. Regards, zmi
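Matt's explanation can be checked against the X-Spam-Status header from the first message in this thread. The sketch below only models the "Bayes disabled" part: it sums the listed rule scores with the BAYES_* rules left out. It is a simplification that assumes the other rules keep the scores shown (SpamAssassin may apply a different score set when Bayes is excluded), and it ignores the AWL, userconf rules, and the 3.0-point header/body minimums.

```python
# Rule hits and scores copied from the X-Spam-Status header quoted earlier.
hits = {
    "BAYES_99": 3.5,
    "HTML_MESSAGE": 0.001,
    "HTML_MIME_NO_HTML_TAG": 0.0,
    "HTML_TAG_EXIST_TBODY": 0.282,
    "MIME_HTML_ONLY": 0.389,
    "RELAY_DE": 0.01,
    "REPLY_TO_EMPTY": 0.512,
    "SARE_FORGED_EBAY": 4.0,
}
threshold_spam = 8.0  # bayes_auto_learn_threshold_spam on server #1

def learning_score(rule_hits):
    """Sum of rule scores with BAYES_* rules excluded (simplified model)."""
    return sum(s for name, s in rule_hits.items() if not name.startswith("BAYES_"))

full = round(sum(hits.values()), 1)
learn = round(learning_score(hits), 2)
print(full, learn, learn >= threshold_spam)  # 8.7 5.19 False
```

Without the 3.5 points from BAYES_99, the message lands near 5.2, well short of the 8.0 autolearn threshold, which matches the autolearn=no in the header.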