R: AWL and whitelists
Hi all! There is something about how AWL works that I don't understand, and I would like somebody to clear it up for me. I know that AWL is a score-averaging system and that it is a bad idea to use it as a whitelist, but there is the --add-to-whitelist (-W) option, which adds an e-mail address to AWL with a score of -100. This option behaves very strangely. I use the SQL backend, so I can see the latest changes. My actions:

1. Add the address to AWL with score -100:

    spamassassin -W test-email

A new row appears in awl: email|none|1|-100 (e-mail|ip|count|score)

2. I do the first test check:

    cat test-email | spamc -R

    Content analysis details: (-49.4 points, 5.0 required)

     pts rule name              description
    ---- ---------------------- --------------------------------------------
     0.6 HTML_SHORT_LENGTH      BODY: HTML is extremely short
     0.0 HTML_MESSAGE           BODY: HTML included in message
     0.5 DNS_FROM_RFC_ABUSE     RBL: Envelope sender in abuse.rfc-ignorant.org
     -51 AWL                    AWL: From: address is in the auto white-list

Row in awl: email|62.234|1|1.109 (e-mail|ip|count|score)

Something strange: the count is still 1 and the score has become 1.109 :(

What happened to the email|none row? Is it still there?

3. I do the second check:

    cat test-email | spamc -R

    Content analysis details: (1.1 points, 5.0 required)

     pts rule name              description
    ---- ---------------------- --------------------------------------------
     0.6 HTML_SHORT_LENGTH      BODY: HTML is extremely short
     0.0 HTML_MESSAGE           BODY: HTML included in message
     0.5 DNS_FROM_RFC_ABUSE     RBL: Envelope sender in abuse.rfc-ignorant.org

Row in awl: email|62.234|2|2.218 (e-mail|ip|count|score)

Where is the AWL check in the second report?

Again, see if an email|null row is still there. Anyway, to whitelist a message source, use whitelist_from or, even better, whitelist_from_rcvd in your .cf file. If you use AWL to do this, your whitelist score may get consumed after a while.

Giampaolo

I expected that the address would have a negative score after add-to-whitelist, but it works only for one attempt. On the second (and every further) attempt it doesn't work. Why doesn't AWL work as it should? It should change the score smoothly, e.g. -49.4, -25, -15 and so on, but it doesn't. Where am I wrong?

P.S. My system: FreeBSD 5.4 / SpamAssassin 3.1.5 / MySQL 4.1.18

--
View this message in context: http://www.nabble.com/AWL-and-whitelists-tf2518983.html#a7025648
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
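The figures in the two reports are consistent with simple averaging arithmetic. A minimal sketch in Python, assuming the AWL correction is (stored mean - current score) * auto_whitelist_factor with a factor of 0.5, and that the stored total afterwards accumulates the score without the AWL correction. The function names are hypothetical and this is not SpamAssassin's actual code:

```python
# Hypothetical sketch of AWL averaging; not SpamAssassin's real implementation.
# Assumption: correction = (stored mean - current score) * factor, factor 0.5.

AWL_FACTOR = 0.5  # assumed value of auto_whitelist_factor


def awl_adjustment(total, count, score, factor=AWL_FACTOR):
    """Return the AWL correction for a message scoring `score` before AWL."""
    if count == 0:
        return 0.0  # no history for this sender, nothing to average against
    mean = total / count
    return (mean - score) * factor


def awl_update(total, count, score):
    """Add the pre-AWL score of the current message to the stored history."""
    return total + score, count + 1


score = 1.109  # rule hits without AWL (shown rounded as 0.6 + 0.0 + 0.5)

# First check: the row written by -W holds count=1, total=-100.
first = awl_adjustment(-100.0, 1, score)
print(round(first, 1), round(score + first, 1))  # -50.6 -49.4

# Second check: the observed row was email|62.234|1|1.109, so the stored mean
# equals the message's own score and the correction vanishes.
second = awl_adjustment(1.109, 1, score)
print(round(second, 1), round(score + second, 1))

# If the -100 had been carried into the new row (count=2, total=-98.891),
# the correction would shrink gradually instead of disappearing.
carried_total, carried_count = awl_update(-100.0, 1, score)
third = awl_adjustment(carried_total, carried_count, score)
print(round(score + third, 1))
```

Under these assumptions a carried-over -100 would give roughly the smooth decay expected in the message (-49.4, then about -24, and so on), whereas the observed rows look as if the -100 was used for the first check only and then replaced by the message's own score.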
Welcome to test russian ruleset
Hi everybody! You are welcome to test a Russian ruleset for SpamAssassin. The ruleset file can be downloaded from: http://sa-russian.narod.ru/99_russian_re.cf

The ruleset reflects a list of tokens often found in Russian spam. The list of tokens (in KOI8-R encoding) is available at: http://sa-russian.narod.ru/tokens

The ruleset was tested on two Linux boxes with Perl 5.6 and 5.8. Comments and reports are gratefully appreciated.

Best regards.
Alan M. Makoev
Re: R: AWL and whitelists
Giampaolo Tomassoni wrote:

2. I do the first test check:

    cat test-email | spamc -R

    Content analysis details: (-49.4 points, 5.0 required)

     pts rule name              description
    ---- ---------------------- --------------------------------------------
     0.6 HTML_SHORT_LENGTH      BODY: HTML is extremely short
     0.0 HTML_MESSAGE           BODY: HTML included in message
     0.5 DNS_FROM_RFC_ABUSE     RBL: Envelope sender in abuse.rfc-ignorant.org
     -51 AWL                    AWL: From: address is in the auto white-list

Row in awl: email|62.234|1|1.109 (e-mail|ip|count|score)

Something strange: the count is still 1 and the score has become 1.109 :(

What happened to the email|none row? Is it still there?

Yes, it is still there: email|62.234|1|1.109 (e-mail|ip|count|score)

Giampaolo Tomassoni wrote:

3. I do the second check:

    cat test-email | spamc -R

    Content analysis details: (1.1 points, 5.0 required)

     pts rule name              description
    ---- ---------------------- --------------------------------------------
     0.6 HTML_SHORT_LENGTH      BODY: HTML is extremely short
     0.0 HTML_MESSAGE           BODY: HTML included in message
     0.5 DNS_FROM_RFC_ABUSE     RBL: Envelope sender in abuse.rfc-ignorant.org

Row in awl: email|62.234|2|2.218 (e-mail|ip|count|score)

Where is the AWL check in the second report?

Again, see if an email|null row is still there. Anyway, to whitelist a message source, use whitelist_from or, even better, whitelist_from_rcvd in your .cf file. If you use AWL to do this, your whitelist score may get consumed after a while.

Giampaolo

I can't use a .cf file because this is a dynamic mail system, and users can report spam/not spam with special buttons in webmail (like Gmail). I don't understand how AWL works with the '--add-to-whitelist' function.

--
View this message in context: http://www.nabble.com/AWL-and-whitelists-tf2518983.html#a7026194
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
RE: Welcome to test russian ruleset
Hello! On SpamAssassin 3.1.4 I get the following errors when running spamassassin --lint:

    [3403] warn: config: invalid regexp for rule BODY_KOI8_82: .
    [3403] warn: config: invalid regexp for rule BODY_WIN1251_82: .
    [3403] warn: config: warning: score set for non-existent rule BODY_WIN1251_82
    [3403] warn: config: warning: score set for non-existent rule BODY_KOI8_82

WBR, Vitaly.
Re: FW: spamd scan problem
On Friday 27 October 2006 00:48, Frank van den Diepstraten wrote:

(sorry for duplicate mails) Hi all, I've got a question about SpamAssassin. I've got 2 mail servers with an identical installation.

    HTML_60_70,HTML_IMAGE_ONLY_24,HTML_MESSAGE,NO_REAL_NAME scantime=0.2,
    ALL_TRUSTED,IP_LINK_PLUS,NORMAL_HTTP_TO_IP,RAZOR2_CF_RANGE_51_100,RAZOR2_CHECK,TW_JT scantime=5.2

Do you run razor on the first system? Or any other network tests? If not, that alone may be the answer. Now it could just be that your examples came out with no razor in the first one, but I gotta ask.

--
John Andersen
RE: FW: spamd scan problem
Thanks for your response. I think that's the problem, because when I grep for RAZOR in the bad system's mail.log I get full pages. When I do the same on the good system there's no output. But now the question is where I can disable this razor thing...

Regards, Frank.

-----Original Message-----
From: John Andersen [mailto:[EMAIL PROTECTED]
Sent: Friday, 27 October 2006 11:08
To: users@spamassassin.apache.org
Subject: Re: FW: spamd scan problem

On Friday 27 October 2006 00:48, Frank van den Diepstraten wrote:

(sorry for duplicate mails) Hi all, I've got a question about SpamAssassin. I've got 2 mail servers with an identical installation.

    HTML_60_70,HTML_IMAGE_ONLY_24,HTML_MESSAGE,NO_REAL_NAME scantime=0.2,
    ALL_TRUSTED,IP_LINK_PLUS,NORMAL_HTTP_TO_IP,RAZOR2_CF_RANGE_51_100,RAZOR2_CHECK,TW_JT scantime=5.2

Do you run razor on the first system? Or any other network tests? If not, that alone may be the answer. Now it could just be that your examples came out with no razor in the first one, but I gotta ask.

--
John Andersen
RE: FW: spamd scan problem
OK, I understand that, but I want to know whether this causes the problem. So I want to try it out without that razor thing... But I can't find the config where it's enabled.

Regards, Frank.

-----Original Message-----
From: John Andersen [mailto:[EMAIL PROTECTED]
Sent: Friday, 27 October 2006 11:36
To: users@spamassassin.apache.org
Subject: Re: FW: spamd scan problem

On Friday 27 October 2006 01:32, Frank van den Diepstraten wrote:

But now the question is where I can disable this razor thing...

No no, you want to ENABLE it on the good system. Razor is wonderful. It just takes a little bit of time, but not a great deal of CPU load. Razor catches a lot of spam with an almost non-existent false positive rate.

--
John Andersen
Re: Rules to reject bounce messages for mail not sent by me
existing set: http://wiki.apache.org/spamassassin/VBounceRuleset ;) --j. Nick Gilbert writes: Hi, I've been trying to write some SA rules to reject bounce messages which I did not send. I've made a good start, but some bounce messages still get through but I don't understand why. The theory is that viruses and spammers don't seem to use my full e-mail address [EMAIL PROTECTED] but change the username part of it and send from an address [EMAIL PROTECTED] I would like to reject all bounce messages which have arisen from mail sent from [EMAIL PROTECTED] but NOT [EMAIL PROTECTED] This works for about 50% of mail, but I think one serious problem is that the line: header __NICK_BOUNCE_REAL To =~ /[EMAIL PROTECTED]/i ...matches on the header: X-MDaemon-Deliver-To: [EMAIL PROTECTED] Which I'm pretty sure it shouldn't! Why does it think that header is the same as a normal To header? Surely it's not scanning for headers simply ending in To? My rules are below for comment/improvement but please let me know if there's a better way to do this or an existing set of working rules somewhere. Nick... 
# -- BOUNCE DETECTION (stolen from bogus_virus_warnings.cf) --

# General rule to indicate bounce or otherwise - used for some other rules
header __BOUNCE_HEADER X-Is-A-Bounce =~ /.+/

# This won't match for scanning done at SMTP time, at least with Exim
header __BOUNCE_RP1 Return-Path =~ /^$/

# NL says this is added by amavisd-new before passing to SA
header __BOUNCE_RP2 X-Return-Path =~ /^$/

# Mark Martinec says the above is incorrect, and it's X-Envelope-From
header __BOUNCE_RP3 X-Envelope-From =~ /^$/

meta __NULL_SENDER __BOUNCE_HEADER || __BOUNCE_RP1 || __BOUNCE_RP2 || __BOUNCE_RP3

# Thanks to AF
header __CT_DEL_STATUS Content-Type =~ /report-type=delivery-status/

meta __NICK_IS_A_BOUNCE __NULL_SENDER || __CT_DEL_STATUS

header __NICK_BOUNCE_REAL To =~ /[EMAIL PROTECTED]/i
header __NICK_TO_NOT_ME To =~ /[EMAIL PROTECTED]/i

meta NICK_SPOOF_BOUNCE ((__NICK_IS_A_BOUNCE && __NICK_TO_NOT_ME) && (!__NICK_BOUNCE_REAL))
score NICK_SPOOF_BOUNCE 10
describe NICK_SPOOF_BOUNCE Attached bounce contains my address but I never sent this!
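The intended logic of these rules can be sketched outside SpamAssassin. The following Python is only an illustration: it assumes the terms of NICK_SPOOF_BOUNCE are joined with `&&`, uses nick@example.com and example.com as hypothetical stand-ins for the obscured [EMAIL PROTECTED] patterns, and treats a missing header as not matching, which may differ from how SpamAssassin handles unset headers.

```python
import re

# Hypothetical stand-ins for the obscured [EMAIL PROTECTED] patterns.
REAL_ADDRESS = re.compile(r"nick@example\.com", re.I)  # __NICK_BOUNCE_REAL
MY_DOMAIN = re.compile(r"@example\.com", re.I)         # __NICK_TO_NOT_ME


def header_matches(headers, name, pattern):
    """True if header `name` is present and its value matches `pattern`.

    Assumption: an absent header never matches.
    """
    value = headers.get(name)
    return value is not None and re.search(pattern, value) is not None


def is_spoofed_bounce(headers):
    """Mirror of the NICK_SPOOF_BOUNCE meta rule, assuming && between terms."""
    null_sender = (
        header_matches(headers, "X-Is-A-Bounce", r".+")
        or header_matches(headers, "Return-Path", r"^$")
        or header_matches(headers, "X-Return-Path", r"^$")
        or header_matches(headers, "X-Envelope-From", r"^$")
    )
    delivery_status = header_matches(
        headers, "Content-Type", r"report-type=delivery-status"
    )
    is_a_bounce = null_sender or delivery_status
    to_my_domain = header_matches(headers, "To", MY_DOMAIN)
    to_real_address = header_matches(headers, "To", REAL_ADDRESS)
    return is_a_bounce and to_my_domain and not to_real_address


forged = {"Return-Path": "", "To": "sales@example.com"}
genuine = {"Return-Path": "", "To": "nick@example.com"}
normal = {"Return-Path": "someone@example.org", "To": "sales@example.com"}

print(is_spoofed_bounce(forged))   # True
print(is_spoofed_bounce(genuine))  # False
print(is_spoofed_bounce(normal))   # False
```

In this sketch the lookup is by exact header name, so a header such as X-MDaemon-Deliver-To is never consulted for the To tests; whether SpamAssassin's own header matching behaves the same way is the open question in the message.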
mcafee-spamassassin-rules
We are using McAfee's anti-virus product on our mail servers, and we mirror their files from ftp.nai.com on an hourly basis. Today I saw something that I did not realise they provide:

    mcafee-spamassassin-perl-1.0.2620-1.5002.i386.rpm
    mcafee-spamassassin-rules-1.0.2620-2620.5002.i386.rpm

I thought that if they provide updated rules on a daily basis, I could just as well try to use those rules. However, they were written for version 2.6, and 3.0.3-2sarge1 is complaining about those rules. Is there a way to utilize their updates with the later versions of SpamAssassin? Or do I have to use their version of SpamAssassin to do so? Would that be advisable?

Regards
Johann

--
Johann Spies Telefoon: 021-808 4036
Informasietegnologie, Universiteit van Stellenbosch

If a man abide not in me, he is cast forth as a branch, and is withered; and men gather them, and cast them into the fire, and they are burned. John 15:6
Re: Rules to reject bounce messages for mail not sent by me
Justin Mason wrote:

existing set: http://wiki.apache.org/spamassassin/VBounceRuleset ;)

Thanks! One thing I'm not sure about: that module produces two rules. How should I score the rules so that real bounces aren't rejected but the fake ones are? I presume I do it this way round:

    score BOUNCE_MESSAGE 10
    score ANY_BOUNCE_MESSAGE 0.1

I presume BOUNCE_MESSAGE is only true if the bounce is for a mail not sent by me? If so, I'm surprised the rule name isn't SPOOF_BOUNCE_MESSAGE or similar. My mail server rejects messages with spam scores of 10 or above.

Nick...

Nick Gilbert writes:

Hi, I've been trying to write some SA rules to reject bounce messages which I did not send. I've made a good start, but some bounce messages still get through and I don't understand why. The theory is that viruses and spammers don't seem to use my full e-mail address [EMAIL PROTECTED] but change the username part of it and send from an address [EMAIL PROTECTED] I would like to reject all bounce messages which have arisen from mail sent from [EMAIL PROTECTED] but NOT [EMAIL PROTECTED] This works for about 50% of mail, but I think one serious problem is that the line:

    header __NICK_BOUNCE_REAL To =~ /[EMAIL PROTECTED]/i

...matches on the header:

    X-MDaemon-Deliver-To: [EMAIL PROTECTED]

Which I'm pretty sure it shouldn't! Why does it think that header is the same as a normal To header? Surely it's not scanning for headers simply ending in To? My rules are below for comment/improvement, but please let me know if there's a better way to do this or an existing set of working rules somewhere.

Nick...
--
Nick Gilbert, Software Developer
X-RM Limited, Winchester, UK
W: http://www.x-rm.com/ E: [EMAIL PROTECTED]
T: 01962 877237 F: 01962 842346
Re: Rules to reject bounce messages for mail not sent by me
PS. Will setting up SPF on my domain name have any effect for things like this? Will it discourage spammers from using my domain or reduce the number of bounce messages I/we get? Nick... Nick Gilbert wrote: Justin Mason wrote: existing set: http://wiki.apache.org/spamassassin/VBounceRuleset ;) Thanks! One thing I'm not sure about - that module produces two rules. How should I score the rules so that real bounces aren't rejected but the fake ones are? I presume I do it this way round: score BOUNCE_MESSAGE 10 score ANY_BOUNCE_MESSAGE 0.1 I presume BOUNGE_MESSAGE is only true if the bounce is for a mail not sent by me? If so, I'm surprised the rule name isn't SPOOF_BOUNCE_MESSAGE or similar. My mail server rejects messages with spam scores of 10 or above. Nick... Nick Gilbert writes: Hi, I've been trying to write some SA rules to reject bounce messages which I did not send. I've made a good start, but some bounce messages still get through but I don't understand why. The theory is that viruses and spammers don't seem to use my full e-mail address [EMAIL PROTECTED] but change the username part of it and send from an address [EMAIL PROTECTED] I would like to reject all bounce messages which have arisen from mail sent from [EMAIL PROTECTED] but NOT [EMAIL PROTECTED] This works for about 50% of mail, but I think one serious problem is that the line: header __NICK_BOUNCE_REAL To =~ /[EMAIL PROTECTED]/i ...matches on the header: X-MDaemon-Deliver-To: [EMAIL PROTECTED] Which I'm pretty sure it shouldn't! Why does it think that header is the same as a normal To header? Surely it's not scanning for headers simply ending in To? My rules are below for comment/improvement but please let me know if there's a better way to do this or an existing set of working rules somewhere. Nick... 
--
Nick Gilbert, Software Developer
X-RM Limited, Winchester, UK
W: http://www.x-rm.com/ E: [EMAIL PROTECTED]
T: 01962 877237 F: 01962 842346
using --add-to-blacklist feature of spamassassin
Hey friends, I am using SA 3.1.3 on FC3 with Postfix. I tried the --add-to-blacklist feature of spamassassin:

    spamassassin --add-to-blacklist /home/testing/Maildir/.spam/cur/
    SpamAssassin auto-whitelist: adding address to blacklist: [EMAIL PROTECTED]

Is this the right way to use this command, and can somebody tell me the path of the blacklist file where these addresses are being added? I ran the above command as root, and the .spamassassin directory contains these files:

    auto-whitelist bayes_seen bayes_toks user_prefs

but I am not able to find the blacklist file. When I ran

    spamassassin --lint

I got the error below:

    [17284] warn: auto-whitelist: open of auto-whitelist file failed: auto-whitelist: cannot open auto_whitelist_path /root/.spamassassin/auto-whitelist: Inappropriate ioctl for device

Why does this error occur, and what should I do to get rid of it? Please let me know if you need any further input.

Thanks, Regards
Ankush Grover
Re: High CPU running SA in a VMware VM
On Thu, 26 Oct 2006 21:48:22 -0700 Gary W. Smith [EMAIL PROTECTED] wrote: Did you pre-allocate the disk space? If not you might consider do that first and defragging the disk. Good point! I forgot about the disk space.
How to test new plugins
How can you test new plugins? [EMAIL PROTECTED] CocoNet Corporation SW Florida's First ISP 825 SE 47th Terrace Cape Coral, FL 33904 (239) 540-2626 Voice
Re: Per Domain Whitelisting
jasonegli wrote:

For example, let's say that domain xyz.com wants to allow all messages from yahoo.com, but domain 123.com does not. Is there a way to allow FROM [EMAIL PROTECTED] TO [EMAIL PROTECTED]?

Obtuse SMTPD (http://sd.inodes.org/) can handle this at the SMTP level. I think it may be possible to add this to MailScanner (http://www.mailscanner.info/) through its custom rules; its default whitelists/blacklists, however, are global.
Re: spamassassin --lint fails with rules in local.cf
Dylan Bouterse wrote:

    [EMAIL PROTECTED] spamassassin]# pwd
    /usr/share/spamassassin
    [EMAIL PROTECTED] spamassassin]# grep SARE_GIF_ATTACH *
    70_sare_stocks.cf:full SARE_GIF_ATTACH /name=\?[0-9a-z._\-]{3,18}\.gif\?/i
    70_sare_stocks.cf:describe SARE_GIF_ATTACH Email has a inline gif
    70_sare_stocks.cf:score SARE_GIF_ATTACH 0.75
    [EMAIL PROTECTED] spamassassin]# grep SARE_GIF_STOX *
    70_sare_stocks.cf:describe SARE_GIF_STOX Inline Gif with little HTML
    70_sare_stocks.cf:score SARE_GIF_STOX 1.66
    [EMAIL PROTECTED] spamassassin]# grep SARE_SPEC_XXGEOCITIES2 *
    70_sare_specific.cf:meta SARE_SPEC_XXGEOCITIES2 !__SARE_SPEC_XXGEOCITIE && __SARE_SPEC_XX2GEOCIT
    70_sare_specific.cf:describe SARE_SPEC_XXGEOCITIES2 spamsign pointing to free webhost spam site
    70_sare_specific.cf:score SARE_SPEC_XXGEOCITIES2 1.666
    [EMAIL PROTECTED] spamassassin]# grep SARE_SPEC_XXGEOCITIES3 *
    70_sare_specific.cf:meta SARE_SPEC_XXGEOCITIES3 __SARE_SPEC_XXGEOCITIE && __SARE_SPEC_XX2GEOCIT
    70_sare_specific.cf:describe SARE_SPEC_XXGEOCITIES3 spamsign pointing to free webhost spam site
    70_sare_specific.cf:score SARE_SPEC_XXGEOCITIES3 1.666

My guess is that the lint check is reading the local.cf file before the additional SARE rule sets. My --lint output reads:

    [16109] dbg: config: using /etc/mail/spamassassin for site rules pre files
    [16109] dbg: config: read file /etc/mail/spamassassin/init.pre
    [16109] dbg: config: read file /etc/mail/spamassassin/v310.pre
    [16109] dbg: config: read file /etc/mail/spamassassin/v312.pre
    [16109] dbg: config: using /var/lib/spamassassin/3.001003 for sys rules pre files
    [16109] dbg: config: using /var/lib/spamassassin/3.001003 for default rules dir
    [16109] dbg: config: read file /var/lib/spamassassin/3.001003/updates_spamassassin_org.cf
    [16109] dbg: config: using /etc/mail/spamassassin for site rules dir
    [16109] dbg: config: read file /etc/mail/spamassassin/local.cf

And the SARE ruleset configs come after that. My SARE rulesets are in /usr/share/spamassassin. Should I put my local.cf file there as well, or am I going down the wrong path?

You're using the wrong path. Move your SARE rules to /etc/mail/spamassassin/ where they belong. The SARE rulesets must be parsed BEFORE your local.cf.

Also, are you sure the ones in /usr/share/spamassassin are even being parsed? According to the above, your system is using /var/lib/spamassassin/3.001003 instead of /usr/share/spamassassin.

That said, in general, don't monkey with anything but the site rules dir. Any other rule directories, such as the default rules dir, are for SA's own rules, and the SA installer feels perfectly free to rm -f * on those directories.
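Why moving the SARE files into the site rules directory addresses the ordering can be illustrated with a small sketch. It assumes that the .cf files within one rules directory are read in alphanumeric order; the file names are taken from the grep output in the message, and Python's sort only approximates that ordering for plain ASCII names:

```python
# Illustrative only: approximate the read order of *.cf files in one directory.
# Assumption: files are read in plain alphanumeric (ASCII) order.
site_rules = ["local.cf", "70_sare_stocks.cf", "70_sare_specific.cf"]

read_order = sorted(name for name in site_rules if name.endswith(".cf"))
for position, name in enumerate(read_order, start=1):
    print(position, name)

# Digits sort before lowercase letters, so both SARE files are listed ahead of
# local.cf, and scores set in local.cf would refer to rules already defined.
```

Under that assumption, no renaming is needed once the files share a directory: the numeric 70_ prefix alone places the SARE rules ahead of local.cf.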
Re: Where is the latest Imageinfo?
Not sure if it's the latest, but a reference is: http://www.rulesemporium.com/plugins.htm#imageinfo Jeff C. -- Jeff Chan mailto:[EMAIL PROTECTED] http://www.surbl.org/
Problems with header rewrite
Hi, I hope someone can help me with the header rewrite. I'm using FC6, SA 3.1.4 and Evolution as MUA. My local.cf looks like this:

    # SpamAssassin config file for version 3.x
    # NOTE: NOT COMPATIBLE WITH VERSIONS 2.5 or 2.6
    # See http://www.yrex.com/spam/spamconfig25.php for earlier versions
    # Generated by http://www.yrex.com/spam/spamconfig.php (version 1.50)

    # How many hits before a message is considered spam.
    required_score 5.0

    # Change the subject of suspected spam
    rewrite_header subject *SPAM*

    # Encapsulate spam in an attachment (0=no, 1=yes, 2=safe)
    report_safe 1

    # Enable the Bayes system
    use_bayes 1

    # Enable Bayes auto-learning
    bayes_auto_learn 1

    # Enable or disable network checks
    skip_rbl_checks 1
    use_razor2 1
    use_dcc 1
    use_pyzor 1

    # Mail using languages used in these country codes will not be marked
    # as being possibly spam in a foreign language.
    ok_languages all

    # Mail using locales used in these country codes will not be marked
    # as being possibly spam in a foreign language.
    ok_locales all

The file mode is 644. But when I send myself a GTUBE mail, the headers are not rewritten and the subject is not changed either. Does anyone have an idea why the headers are not changed?

P.S. Sorry for my possibly bad English.

--
Greetings from Munich
Hans
Re: Problems with header rewrite
Hans München wrote: Hi, hope someone can help me with the header rewrite. I'm user FC6, SA 3.1.4 and Evolution as MUA. My local.cf looks like that: snip chmod is 644. But when I send me an GTUBE mail, the header don't will be rewritten and also he subject don't will be changed. Does someone has any idea, why the header will be not changed? Where have you integrated SA into your mail processing? It sounds like SA isn't even being called. Did you configure Evolution to feed the mail to SA? SA isn't actually called when you just install it, you have to explicitly configure something to call it. There's dozens of different ways to do this, so this can't just happen automatically when you install. The installer wouldn't know where you wanted SA inserted :) Here's one web article showing how to get evolution to pipe mail into SA: http://www.atlantawebhost.com/articles/evolution_spamassassin.php (note: I don't use Evolution, so I can't attest to the accuracy. However, this looks correct)
RE: spamassassin --lint fails with rules in local.cf (now perl plugin error for TextCat)
-----Original Message-----
From: Matt Kettler [mailto:[EMAIL PROTECTED]
Sent: Friday, October 27, 2006 9:13 AM
To: Dylan Bouterse
Cc: users@spamassassin.apache.org
Subject: Re: spamassassin --lint fails with rules in local.cf

Dylan Bouterse wrote:

    [EMAIL PROTECTED] spamassassin]# pwd
    /usr/share/spamassassin
    [EMAIL PROTECTED] spamassassin]# grep SARE_GIF_ATTACH *
    70_sare_stocks.cf:full SARE_GIF_ATTACH /name=\?[0-9a-z._\-]{3,18}\.gif\?/i
    70_sare_stocks.cf:describe SARE_GIF_ATTACH Email has a inline gif
    70_sare_stocks.cf:score SARE_GIF_ATTACH 0.75
    [EMAIL PROTECTED] spamassassin]# grep SARE_GIF_STOX *
    70_sare_stocks.cf:describe SARE_GIF_STOX Inline Gif with little HTML
    70_sare_stocks.cf:score SARE_GIF_STOX 1.66
    [EMAIL PROTECTED] spamassassin]# grep SARE_SPEC_XXGEOCITIES2 *
    70_sare_specific.cf:meta SARE_SPEC_XXGEOCITIES2 !__SARE_SPEC_XXGEOCITIE && __SARE_SPEC_XX2GEOCIT
    70_sare_specific.cf:describe SARE_SPEC_XXGEOCITIES2 spamsign pointing to free webhost spam site
    70_sare_specific.cf:score SARE_SPEC_XXGEOCITIES2 1.666
    [EMAIL PROTECTED] spamassassin]# grep SARE_SPEC_XXGEOCITIES3 *
    70_sare_specific.cf:meta SARE_SPEC_XXGEOCITIES3 __SARE_SPEC_XXGEOCITIE && __SARE_SPEC_XX2GEOCIT
    70_sare_specific.cf:describe SARE_SPEC_XXGEOCITIES3 spamsign pointing to free webhost spam site
    70_sare_specific.cf:score SARE_SPEC_XXGEOCITIES3 1.666

My guess is that the lint check is reading the local.cf file before the additional SARE rule sets. My --lint output reads:

    [16109] dbg: config: using /etc/mail/spamassassin for site rules pre files
    [16109] dbg: config: read file /etc/mail/spamassassin/init.pre
    [16109] dbg: config: read file /etc/mail/spamassassin/v310.pre
    [16109] dbg: config: read file /etc/mail/spamassassin/v312.pre
    [16109] dbg: config: using /var/lib/spamassassin/3.001003 for sys rules pre files
    [16109] dbg: config: using /var/lib/spamassassin/3.001003 for default rules dir
    [16109] dbg: config: read file /var/lib/spamassassin/3.001003/updates_spamassassin_org.cf
    [16109] dbg: config: using /etc/mail/spamassassin for site rules dir
    [16109] dbg: config: read file /etc/mail/spamassassin/local.cf

And the SARE ruleset configs come after that. My SARE rulesets are in /usr/share/spamassassin. Should I put my local.cf file there as well, or am I going down the wrong path?

You're using the wrong path. Move your SARE rules to /etc/mail/spamassassin/ where they belong. The SARE rulesets must be parsed BEFORE your local.cf.

Also, are you sure the ones in /usr/share/spamassassin are even being parsed? According to the above, your system is using /var/lib/spamassassin/3.001003 instead of /usr/share/spamassassin.

That said, in general, don't monkey with anything but the site rules dir. Any other rule directories, such as the default rules dir, are for SA's own rules, and the SA installer feels perfectly free to rm -f * on those directories.

Amavisd reads the /usr/share/spamassassin dir, which is probably why --lint didn't work but reloading amavisd would work. Either way, I moved the contents of my /usr/share/spamassassin dir to /etc/mail/spamassassin. I get the following errors when trying to --lint:

    [3246] dbg: plugin: loading Mail::SpamAssassin::Plugin::TextCat from @INC
    [3246] warn: textcat: languages filename not defined
    [3246] dbg: plugin: registered Mail::SpamAssassin::Plugin::TextCat=HASH(0x9760db8)
    [3246] warn: config: invalid regexp for rule SUBJ_SOMEONE_WROTE: Subject =~ /\bwrote:$/i: missing or invalid delimiters
    [3246] warn: config: warning: description exists for non-existent rule SUBJ_SOMEONE_WROTE
    [3246] warn: config: warning: score set for non-existent rule SUBJ_SOMEONE_WROTE
    [3246] warn: Use of uninitialized value in hash element at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/Plugin/TextCat.pm line 380.
    [3246] warn: Use of uninitialized value in join or string at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/Plugin/TextCat.pm line 391.
    [3246] dbg: textcat: language possibly:
    [3246] warn: Use of uninitialized value in join or string at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/Plugin/TextCat.pm line 469.

The SUBJ_SOMEONE_WROTE rule that was posted on the list a week or so ago isn't passing:

    20_phrases.cf:body SUBJ_SOMEONE_WROTE Subject =~ /\bwrote:$/i
    20_phrases.cf:describe SUBJ_SOMEONE_WROTE Search for Subject lines ending in wrote:
    50_scores.cf:score SUBJ_SOMEONE_WROTE 3.000

I still get the TextCat errors even if I comment out the SUBJ_SOMEONE_WROTE rule.

Dylan
[OT] Filter Server Specs
Currently, we are looking to install a server that will do content filtering for our main e-mail server. I thought I would toss this out to everyone to get some feedback on whether the server would be adequate. The server is a Dell PowerEdge 6850 with the following:

- Four 2.6 GHz/800 MHz/4 MB cache dual-core Intel Xeon 7110M processors
- Eight GB DDR2 400 MHz RAM
- Four 300 GB, 3 Gbps, SAS, 10K RPM hard drives running RAID-5 on a PERC5/i controller

Our main e-mail server services over 500 domains with an account total of around 40,000. The current filter server cannot do any content filtering outside of itself (i.e. the MTA) because of CPU load (i.e. SpamAssassin). Any message scanning where the message size is over 1.5K will kill the CPU. The current filter server is rejecting an average of 2.4 million per day with just the common blacklisting and some other things that are in place.

The other thing I would like to know is what kind of operating system one would install on this new server. Again, I appreciate any feedback.
Re: How to test new plugins
On Fri, Oct 27, 2006 at 08:08:36AM -0400, Patrick Sherrill wrote: How can you test new plugins? Load the plugin and include any associated configs, then see what happens. (the question is extremely vague, so this answer is probably not very useful.) -- Randomly Selected Tagline: What the hell is this? For crying out loud, somebody throw a pie! - Peter Griffin on Family Guy
RE: High CPU running SA in a VMware VM
The I/O rate is pretty low. The files going through expiration are only about 5 MB, and it only takes one of these to drive the CPU up. I think there are over 100,000 tokens in the file, each with a timestamp, and I believe there must be some sorting going on, so I suspect that is where the issue is.

Thanks,
Ian

"Gary W. Smith" [EMAIL PROTECTED] wrote:

What does the IO usage look like on the server? We ran a couple of our backup SA instances on VMware, but the database is on a remote SQL server, so the only IO is logging. We have several VM instances for a variety of things. Did you pre-allocate the disk space? If not, you might consider doing that first and defragging the disk.

From: Sammy Anderson [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 26, 2006 3:52 PM
To: users@spamassassin.apache.org
Subject: High CPU running SA in a VMware VM

We recently migrated our SpamAssassin installation from a physical 3.6 GHz system running RHEL 4 and SA 3.0.4 to a VMware VM (ESX 2.5.4) with RHEL 4 as the guest OS and SA 3.1.7. Each user has their own Bayes files (Berkeley DB), and these were copied from the old to the new server. Now whenever an expiry process runs on a user's database, the CPU spikes, sometimes for a minute or longer. We did not notice spikes on the old server, but it is really hammering the VM. Has anyone else experienced this problem? For now I have disabled Bayes altogether because of the unacceptable load.

--SA
Re: High CPU running SA in a VMware VM
The guest has more memory than it is using, so it isn't doing any paging or swapping. As for the ESX 2.5.4 box, it isn't swapping either. There is currently enough physical RAM for the few VMs running.

[EMAIL PROTECTED] wrote:

On Thu, 26 Oct 2006 15:52:17 -0700 (PDT) Sammy Anderson wrote:

We recently migrated our SpamAssassin installation from a physical 3.6 GHz system running RHEL 4 and SA 3.0.4 to a VMware VM (ESX 2.5.4) with RHEL 4 as the guest OS and SA 3.1.7. Each user has their own Bayes files (Berkeley DB), and these were copied from the old to the new server. Now whenever an expiry process runs on a user's database, the CPU spikes, sometimes for a minute or longer. We did not notice spikes on the old server, but it is really hammering the VM. Has anyone else experienced this problem? For now I have disabled Bayes altogether because of the unacceptable load.

Perhaps memory started to spill into the swap on either the VM or guest OS. I don't know what version of VMware you are using. I'm using v5.2.2 running under Windows. In the memory preferences I have mine set so all the virtual machine memory has to fit into the reserved host RAM. I've done small tests with SA before and haven't had any problems. Then again, I haven't found anything I can use to put a load on a test install. My test bed is on a dual-core 3.2 GHz with four gig of RAM. The VM has a full gig of RAM allocated and is running the release version of FreeBSD 6.1.
Re: mcafee-spamassassin-rules
On Fri, Oct 27, 2006 at 12:25:53PM +0200, Johann Spies wrote: just as well try and use those rules. However, they were written for version 2.6 and 3.0.3-2sarge1 is complaining about those rules. My recollection is that they're using a pre-3.0 version of SA, with (I'd imagine) a number of modifications. Is there a way to utilize their updates with the later versions of spamassassin? Or do I have to use their version of spamassassin to do so? Would that be advisable? It's hard to say since they could have modified their SA in any number of ways. You'd want to go through the config line by line and see what can be used directly, what could be used with modification, and what can't be used because it requires proprietary changes. It's also worth keeping in mind that spam detection isn't just about rules, it's also about the engine, so just because rules work well with their code doesn't mean they'll work well on the standard code. It's also worth noting that hypothetically, if I was a company releasing updates based on an open-source product, I may have incentive to avoid making those updates useful on said product, otherwise people would download my updates and not pay me for the software. -- Randomly Selected Tagline: the real ttys became pseudo ttys and vice-versa. - Today's BOFH Excuse
Re: what's the matter here? Text::Wrap
On Fri, Oct 27, 2006 at 05:15:57PM +0800, Xueron Nee wrote: When I use CPAN to upgrade my SA from 3.1.4 to the current version, it prints many warnings like these: t/rcvd_parser...ok 40/53(?:(?=[\s,]))* matches null string many times in regex; marked by <-- HERE in m/\G(?:(?=[\s,]))* <-- HERE \Z/ at /usr/lib/perl5/5.8.5/Text/Wrap.pm line 46. [...] Seems there is something wrong with Text::Wrap. Yep. # perl -MText::Wrap -e 'print $Text::Wrap::VERSION;' 2006.0711 cpan install Text::Wrap Text::Wrap is up to date. Yeah, you need to downgrade since they haven't fixed this bug yet. See http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5056 for more info. -- Randomly Selected Tagline: I like work; it fascinates me; I can sit and look at it funny...
'spamassassin --revoke' and 'razor-revoke' are interchangeable?
Hello all, Could someone tell me whether 'spamassassin --revoke' and 'razor-revoke' are interchangeable? What exactly happens when I revoke a 'false negative' message? Are its details reported to the razor2 DB and the Bayesian DB as ham? Are these messages resent to the original recipients? Can I use the following syntax on my Cyrus system: spamassassin --revoke /ham_folder/* or /usr/lib/razor-revoke /ham_folder/* sa-learn --showdots --ham /ham_folder/* Regards, Leon Kolchinsky
Re: Scoring base64 blob messages
On Thu, Oct 26, 2006 at 12:19:23PM -0400, Peter H. Lemieux wrote: No, because there are going to be a lot of mails that would hit that. Really? Maybe it's because I live in the US, but I can't think of a legitimate message I've ever received consisting only of a base64 blob. You look at a lot of raw messages? ;) Out of curiosity, how frequently does this appear in the SA ham corpus? Well, there isn't a SA corpus, so there's no answer to that question. As for how often it happens in my corpus, I don't know; I'd have to write a rule and run it against the messages. Rather than making anyone else do the work for me, is there something I can read about how to determine the frequency of different message features appearing in the corpus? You can generate some rules and use mass-check to run against your own corpus to gather some statistics. I'm willing to run some rules for you against my corpus if you want. I just don't have time to come up with the rules right now. -- Randomly Selected Tagline: strrev(strcpy(xus yti +7,varg)-7)[0]='G'
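For anyone who wants a rough number before writing a real rule and running mass-check, here is a minimal Python sketch. It is not SpamAssassin code and not how mass-check works; it only counts how many messages are a single non-multipart body with Content-Transfer-Encoding: base64. The function names and the sample messages are made up for illustration, so feed it your own hand-classified mail instead.

```python
import base64
from email import message_from_string
from email.message import Message


def is_single_base64_blob(msg: Message) -> bool:
    """True when the message has no MIME parts and its only body is base64."""
    if msg.is_multipart():
        # A base64 attachment inside a multipart message does not count.
        return False
    encoding = str(msg.get("Content-Transfer-Encoding", "")).strip().lower()
    return encoding == "base64"


def blob_frequency(raw_messages) -> float:
    """Fraction of the given raw messages that are single base64 blobs."""
    messages = [message_from_string(raw) for raw in raw_messages]
    if not messages:
        return 0.0
    hits = sum(1 for m in messages if is_single_base64_blob(m))
    return hits / len(messages)


# Hypothetical sample messages, for illustration only.
BLOB = (
    "From: a@example.com\n"
    "Subject: hi\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + base64.b64encode(b"hello world").decode("ascii")
    + "\n"
)
PLAIN = "From: b@example.com\nSubject: hi\n\nhello world\n"
MULTI = (
    "From: c@example.com\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "body\n"
    "--XX\n"
    "Content-Type: application/octet-stream\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "aGVsbG8=\n"
    "--XX--\n"
)
SAMPLES = [BLOB, PLAIN, MULTI, PLAIN]

if __name__ == "__main__":
    print(blob_frequency(SAMPLES))
```

Run over a ham folder and a spam folder separately, the two fractions give a first idea of whether such a rule would separate them at all. Blackberry or Ticketmaster mail in the ham set would show up here.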
Re: Scoring base64 blob messages
Peter H. Lemieux wrote: Theo Van Dinter wrote: On Thu, Oct 26, 2006 at 09:46:28AM -0400, Peter H. Lemieux wrote: Also is there an SA rule that scores messages that contain only a single base64 part (as opposed to a base64-encoded attachment)? I doubt many legitimate messages arrive with only a single base64 part. No, because there are going to be a lot of mails that would hit that. Really? Maybe it's because I live in the US, but I can't think of a legitimate message I've ever received consisting only of a base64 blob. Out of curiosity, how frequently does this appear in the SA ham corpus? Rather than making anyone else do the work for me, is there something I can read about how to determine the frequency of different message features appearing in the corpus? Most messages sent from a Blackberry would hit this rule, for example.
RE: I'm thinking about suing Microsoft
I think there is a problem where a version of XP downloads the security patches automatically, but does not install them. This does not lead to increased security, because most users are ignorant of security patches and would never install them manually. Michael --On Monday, 23 October 2006 16:46 -0400 Rose, Bobby [EMAIL PROTECTED] wrote: But Windows patches are free. Even if you are using an illegal copy of Windows, you can still manually download and install the patches. It's Microsoft Update where they mostly have the genuine Windows verification code. Even Redhat forces you to pay subscriptions for their autoupdate management stuff. -Original Message- From: Marc Perkel [mailto:[EMAIL PROTECTED] Sent: Monday, October 23, 2006 3:59 PM To: Jo Cc: Duane Hill; users@spamassassin.apache.org Subject: Re: I'm thinking about suing Microsoft Popularity is a factor. But the real vulnerability is that Windows can be more secure if it has the patches. If Linux, for example, restricted its security patches to only licensed users, they would have the same problem. I'm not saying either that MS should be compelled to distribute any upgrades for free. Just security fixes.
Re: Per Domain Whitelisting
Roman Sozinov wrote: Peter H. Lemieux wrote: jasonegli wrote: For example let's say that domain xyz.com wants to allow all messages from yahoo.com, but domain 123.com does not. Is there a way to allow FROM [EMAIL PROTECTED] TO [EMAIL PROTECTED]? Obtuse SMTPD (http://sd.inodes.org/) can handle this at the SMTP level. I think it may be possible to add this to MailScanner (http://www.mailscanner.info/) through its custom rules; its default whitelists/blacklists, however, are global. What about spamassassin? Does it support per-domain whitelisting? Of course it does. It supports per-user preferences, so if you pass nothing but domain names as the user, it thus supports per-domain preferences. Daryl
Re: How to test new plugins
I guess what I'm looking for is a way to test the plug-ins/configuration against a separate instance of sa that would read the new cfs without restarting existing daemons (we're using amavis-new). Pat... - Original Message - From: Theo Van Dinter [EMAIL PROTECTED] To: users@spamassassin.apache.org Sent: Friday, October 27, 2006 11:00 AM Subject: Re: How to test new plugins
Re: Scoring base64 blob messages
Peter H. Lemieux wrote: Theo Van Dinter wrote: On Thu, Oct 26, 2006 at 09:46:28AM -0400, Peter H. Lemieux wrote: Also is there an SA rule that scores messages that contain only a single base64 part (as opposed to a base64-encoded attachment)? I doubt many legitimate messages arrive with only a single base64 part. No, because there are going to be a lot of mails that would hit that. Really? Maybe it's because I live in the US, but I can't think of a legitimate message I've ever received consisting only of a base64 blob. I'm not sure what to say to that. ;) Out of curiosity, how frequently does this appear in the SA ham corpus? Ticketmaster sends out *a lot* of their mail this way. I'm sure it's partly in an attempt to avoid having their mail FP against crappy filters. Daryl
Re: How to test new plugins
On Fri, Oct 27, 2006 at 12:40:57PM -0400, Patrick Sherrill wrote: I guess what I'm looking for is a way to test the plug-ins/configuration against a separate instance of sa that would read the new cfs without restarting existing daemons (we're using amavis-new). You can copy the /etc/mail/spamassassin directory to somewhere else, then change the pre and cf files in that dir. Then you can test spamassassin/spamd/etc with the --siteconfigpath option to override its default value. :) (for spamd, if you already have a running copy at port 783, you'd want to run it and spamc via a different port, of course.) -- Randomly Selected Tagline: linux: because a PC is a terrible thing to waste ([EMAIL PROTECTED] put this on Tshirts in '93)
Re: Scoring base64 blob messages
On Fri, Oct 27, 2006 at 11:44:48AM -0400, Daryl C. W. O'Shea wrote: Ticketmaster sends out *a lot* of their mail this way. I'm sure it's partly in an attempt to avoid having their mail FP against crappy filters. I'd also imagine that sometimes it's just easier to do this than try to pay attention to what is being sent and determine if encoding is necessary. Programmers tend to be lazy after all. :) -- Randomly Selected Tagline: There are two major products to come out of Berkeley: LSD and UNIX. We don't believe this to be a coincidence. - Unknown
Re: spamd scan problem
On 27 Oct 2006, at 11:40, Frank van den Diepstraten wrote: OK, I understand that, but I want to know if this causes the problem. So I want to try it out without that razor thing... But I can't find the config where it's enabled. Hi Frank, To disable razor, add the following to your local.cf: use_razor2 0 Peter -Original Message- From: John Andersen [mailto:[EMAIL PROTECTED] Sent: Friday, October 27, 2006 11:36 To: users@spamassassin.apache.org Subject: Re: FW: spamd scan problem On Friday 27 October 2006 01:32, Frank van den Diepstraten wrote: But now the question is where I can disable this razor thing... No no, you want to ENABLE it on the good system. Razor is wonderful. It just takes a little bit of time, but not a great deal of CPU load. Razor catches a lot of spam with an almost non-existent false positive rate. -- _ John Andersen
RE: mcafee-spamassassin-rules
It's also worth noting that hypothetically, if I was a company releasing updates based on an open-source product, I may have incentive to avoid making those updates useful on said product, otherwise people would download my updates and not pay me for the software. Wouldn't that be against the open source license? I'm sure they don't use open source rules either. *giggle* --Chris
Re: I'm thinking about suing Microsoft
You have to explicitly choose that option. Are you suggesting we shouldn't be able to choose that? I'm not a big fan of trusting MS patches, as they tend to break things periodically... On Oct 27, 2006, at 8:47 AM, Michael Beckmann wrote: I think there is a problem where a version of XP downloads the security patches automatically, but does not install them. This does not lead to increased security, because most users are ignorant of security patches and would never install them manually. Michael -- Jay Chandler Network Administrator, Chapman University 714-628-7249 / [EMAIL PROTECTED] "Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never does quite what I want. I wish Christopher Robin was here." -- Peter Da Silva in a.s.r.
RE: High CPU running SA in a VMware VM
From: Sammy Anderson [mailto:[EMAIL PROTECTED] We recently migrated our SpamAssassin installation from a physical 3.6 GHz system running RHEL 4 and SA 3.0.4 to a VMware VM (ESX 2.5.4) with RHEL 4 as the guest OS and SA 3.1.7. I just did the same thing last week, except we're using RHEL 3 and ESX 2.5.2, and the physical box it used to be on was far less powerful than yours. Each user has their own Bayes files (Berkeley DB) and these were copied from the old to the new server. Now whenever an expiry process runs on a user's database, the CPU spikes, sometimes for a minute or longer. Hmm. We're using ours as a site-wide MTA to be able to reject incoming mails at SMTP time, so no user DBs on the box, but we are running with Bayes checking on (Berkeley DB), autolearning off, and manual Bayes feeding only a few times a day. Because of that, I don't have practice with a heavy Bayes load, but how certain are you that it's Bayes hitting the CPU? Did you run sa-learn (or spamassassin) with network reporting turned off to see if that makes a difference? I ask because pyzor did keep our CPU at a constant 75% until I turned it off; now it varies from 25% to 75% over the day, which is a lot more acceptable :) Another thought, albeit perhaps not directly related: are you running spamd with --round-robin? When I did that, it reduced the CPU load at the cost of a little more memory, which seems to be the better trade-off, especially for a VM on ESX. -- John C. Ring, Jr. [EMAIL PROTECTED] Network Engineer Union Switch Signal Inc. If men were angels, no government would be necessary. If angels were to govern men, neither external nor internal controls on government would be necessary. -- James Madison
Re: ImageInfo vs FuzzyOCR performance?
--On Friday, October 27, 2006 6:29 AM -0700 Jeff Chan [EMAIL PROTECTED] wrote: Does anyone have any recent feedback about the performance of ImageInfo versus FuzzyOCR about detecting stock image spams (or any others)? Does FuzzyOCR catch significantly more spams than ImageInfo? The last I checked, ImageInfo simply reads some header info from the image. It's pretty lightweight, probably more so than any Perl-based regex in SA. FuzzyOCR is much more compute-intensive, since it has to perform image processing (through gocr, as well as conversions necessary to get the input into the format that gocr expects).
Re: [OT] Filter Server Specs
On Fri, Oct 27, 2006 at 02:42:49PM +, Duane Hill wrote: Currently, we are looking to install a server that will be doing content filtering for our main e-mail server. I thought I would toss this out to everyone to get some feedback on whether the server would be adequate. The server is a Dell PowerEdge 6850 with the following:
- Four 2.6 GHz/800 MHz/4 MB cache dual-core Intel Xeon 7110M processors
- Eight GB DDR2 400 MHz RAM
- Four 300 GB, 3 Gbps, SAS, 10K RPM hard drives running RAID-5 on a PERC5/i controller
Our main e-mail server services over 500 domains with an account total of around 40,000. The current filter server we have cannot do any content filtering outside of itself (i.e. the MTA) because of CPU load (i.e. SpamAssassin). Any message scanning where the message size is over 1.5K will kill the CPU. The current filter server we have in place is rejecting an average of 2.4 million messages per day with just the common blacklisting and some other things that are set in place. I *think* this should handle your load. Personally, from my years of ISP experience, I'd strongly favor going the road of multiple identical servers in parallel rather than putting all your eggs in one basket. E.g. use two 4-CPU servers rather than one 8-CPU (4x dual-core) server. The difference is that if it comes up just short, or if load jumps up again, it's easier to add a third server and cut it into the mail path than to upgrade a server which is handling all your filtering. You also don't need fast hard drives on a filtering server; it's almost all going to be pushing the CPU and RAM. The other thing I would like to know is what kind of an operating system would one install on this new server? This'll get you into a religious war for sure... I would favor the latest FreeBSD (6.x), but any version of Linux with a good package system and a recent 2.6 kernel is a good choice - maybe better than FreeBSD at using 8 CPUs. Reasonable possibilities include CentOS, Gentoo, Debian.
I'm not a big Linux head, others may have stronger opinions on that front. -- Clifton -- Clifton Royston -- [EMAIL PROTECTED] / [EMAIL PROTECTED] President - I and I Computing * http://www.iandicomputing.com/ Custom programming, network design, systems and network consulting services
RE: High CPU running SA in a VMware VM
I'm pretty sure it is that, because when I turn off Bayes altogether, the spikes go away. I also ran sa-learn --force-expire and it PEGS the VM. With Bayes debugging enabled, I see lines like this in my syslog: bayes: expired old bayes database entries in 236 seconds: 152268 entries kept, 9457 deleted We have about 140 users, each with a 5 MB bayes_toks file, so there is a need to expire somebody all throughout the day. Each user is virtual; they don't really have an account on the box, but the directories correspond to each user address. And we do auto-learn, with opportunistic expiry. Good thought about --round-robin; I am willing to use a little more memory if it saves on CPU.
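To put the syslog line quoted above into numbers, here is a small Python sketch. It is a standalone helper, not part of SpamAssassin; the function name and the derived fields are my own, and the regex only assumes the exact wording of that one debug line. Dividing entries by seconds also assumes the expiry pass touches every entry, which is a guess.

```python
import re

# Matches the wording of the debug line quoted in this thread.
EXPIRY_RE = re.compile(
    r"expired old bayes database entries in (\d+) seconds: "
    r"(\d+) entries kept, (\d+) deleted"
)


def expiry_stats(line):
    """Return a dict of figures from one expiry log line, or None if no match."""
    match = EXPIRY_RE.search(line)
    if match is None:
        return None
    seconds, kept, deleted = (int(group) for group in match.groups())
    total = kept + deleted
    return {
        "seconds": seconds,
        "kept": kept,
        "deleted": deleted,
        "total": total,
        # Share of the database that the expiry run actually removed.
        "deleted_fraction": deleted / total if total else 0.0,
        # Rough throughput, assuming every entry was visited once.
        "entries_per_second": total / seconds if seconds else float("inf"),
    }


LINE = (
    "bayes: expired old bayes database entries in 236 seconds: "
    "152268 entries kept, 9457 deleted"
)

if __name__ == "__main__":
    for key, value in expiry_stats(LINE).items():
        print(key, value)
```

Comparing these figures for the same user's database on the old physical box and on the VM would show whether the expiry itself got slower or only more visible.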
Re: High CPU running SA in a VMware VM
Sorry about top-posting, but I just caught the topic and found it a bit interesting... I run my SMTP server entirely in a VMware VM, and have *never* seen high CPU usage on that particular machine. I run Postfix, Amavis-new 2.4.3, SA 3.1.7 and quite a few plug-ins. Bayes and quarantine are all in a MySQL database stored on another VM, no big load there either... At peaks, I have 2-4% CPU usage and 20-65% memory usage on each VM, all reported by Virtual Center 1.4. So, naturally I'm curious about why there would be a high CPU load from using SA. My guess is that it's something else causing it. -- Anders Norrbring Norrbring Consulting
URIXBL?
Hello all, I've been diddling with some tests and wondered why there is a spamhaus URIBL_SBL, but not URIBL_XBL (or better yet, combined URIBL_SBL-XBL). I can create this myself easy enough, but wondered if there was a reason XBL is not included. Thanks. -Jeff
Re: mcafee-spamassassin-rules
On Fri, Oct 27, 2006 at 01:38:32PM -0400, Chris Santerre wrote: It's also worth noting that hypothetically, if I was a company releasing updates based on an open-source product, I may have incentive to avoid making those updates useful on said product, otherwise people would download my updates and not pay me for the software. Wouldn't that be against the open source lic? Not that I'm aware of, why would it be? If I produce something on my own (like new rules) and publish it, I'm not bound by someone else's licensing. In this case, if I'm following the code license and make modifications such that new rules that I produce are in a proprietary format, then that's perfectly valid. With SA 3, I could even make the config parsing a plugin and not have to modify any of the base code. -- Randomly Selected Tagline: I came here to kick butt and chew gum, and I'm all out of gum. - They Live (movie)
Re: URIXBL?
Jeff Hardy writes: Hello all, I've been diddling with some tests and wondered why there is a spamhaus URIBL_SBL, but not URIBL_XBL (or better yet, combined URIBL_SBL-XBL). I can create this myself easy enough, but wondered if there was a reason XBL is not included. Thanks. Basically, it didn't work well ;) Try it out -- it doesn't correlate well with spam. --j.
Re: URIXBL?
Jeff Hardy wrote: Hello all, I've been diddling with some tests and wondered why there is a spamhaus URIBL_SBL, but not URIBL_XBL (or better yet, combined URIBL_SBL-XBL). I can create this myself easy enough, but wondered if there was a reason XBL is not included. Thanks. XBL is mostly infected PCs. These systems are used to send spam but not generally to host spam domains.
Re: URIXBL?
On Fri, 2006-10-27 at 20:38 +0100, Justin Mason wrote: Jeff Hardy writes: Hello all, I've been diddling with some tests and wondered why there is a spamhaus URIBL_SBL, but not URIBL_XBL (or better yet, combined URIBL_SBL-XBL). I can create this myself easy enough, but wondered if there was a reason XBL is not included. Thanks. Basically, it didn't work well ;) Try it out -- it doesn't correlate well with spam. --j. Fair enough I'll test away. BTW, for anyone else coming across this post: warn: config: error: rule 'URIBL_SBL-XBL' has invalid characters (not Alphanumeric + Underscore + starting with a non-digit) Have to get rid of that hyphen. Thank you 'spamassassin -D all ...' :) Thanks for the reply. -Jeff
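The wording of that warning (alphanumeric plus underscore, not starting with a digit) can be checked up front with a short Python sketch. The regex below is my reading of the error text only; SpamAssassin's own parser may apply further restrictions that this does not model, and the function name is made up.

```python
import re

# Letters, digits and underscores only; the first character must not be a digit.
RULE_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_rule_name(name: str) -> bool:
    """True if the name fits the pattern described by the lint warning."""
    return RULE_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("URIBL_SBL-XBL", "URIBL_SBL_XBL", "9_LIVES"):
        print(candidate, is_valid_rule_name(candidate))
```

Running 'spamassassin --lint' (or the -D form mentioned above) on the real config is still the authoritative check.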
Re: MailScanner versus Amavisd-new with postfix
Jeff Chan wrote: Not to start any flamewars, but does anyone have strong opinions on MailScanner versus Amavisd-new for use with postfix (and of course SpamAssassin and ClamAV)? In the old days it seemed Amavisd-new may have integrated better with postfix, but is that no longer the case? Some folks say MailScanner is faster and leaner. What gives? Jeff C. Jeff, I can't say I've compared the two, but I run MailScanner and it has gained a couple of neat features recently: its own MD5 cache of recent spam, which speeds things up a lot, and the inbuilt phishing testing (yeah, OK, this has been in a while). It also glues together SA, 12 anti-virus engines, and its own tests (like the executables check, which has saved me a few times before the AV people had updates). Horses for courses, but it's nice to have a choice of amavis-new OR MailScanner. -- Martin Hepworth Senior Systems Administrator Solid State Logic Tel: +44 (0)1865 842300
RE: MailScanner versus Amavisd-new with postfix
-Original Message- From: Jeff Chan [mailto:[EMAIL PROTECTED] Sent: Friday, October 27, 2006 9:54 AM To: SpamAssassin Users Subject: MailScanner versus Amavisd-new with postfix Not to start any flamewars, but does anyone have strong opinions on MailScanner versus Amavisd-new for use with postfix (and of course SpamAssassin and ClamAV)? In the old days it seemed Amavisd-new may have integrated better with postfix, but is that no longer the case? Some folks say MailScanner is faster and leaner. What gives? Jeff C. -- Jeff Chan mailto:[EMAIL PROTECTED] http://www.surbl.org/ Wietse Venema says that MailScanner uses unsupported methods to manipulate the queue that could lead (and have led) to lost email. I don't know the full details, but it has been discussed a lot on the postfix list. My impression is that the condition is rare, but it does happen. Just a heads up. -DH
RE: MailScanner versus Amavisd-new with postfix
note: I don't use mailscanner, so am only relaying what I saw on the postfix list. My understanding (based on foggy memory - search the list archives for a better answer) is that MailScanner dipped into postfix queues using either undocumented postfix APIs or by bypassing postfix entirely and directly manipulating files on disk. This led to instances of documented mail loss. Wietse therefore said that it wasn't safe to use. I've also recently read (I believe also on the postfix list, but am not sure) that MailScanner has remedied this behavior, and that it is now safe to use with postfix, but you'll need to confirm for yourself if that is true. Kurt | -Original Message- | From: Jeff Chan [mailto:[EMAIL PROTECTED] | Sent: Friday, October 27, 2006 06:54 | To: SpamAssassin Users | Subject: MailScanner versus Amavisd-new with postfix | | | Not to start any flamewars, but does anyone have strong opinions | on MailScanner versus Amavisd-new for use with postfix (and of | course SpamAssassin and ClamAV)? | | In the old days it seemed Amavisd-new may have integrated better | with postfix, but is that no longer the case? Some folks say | MailScanner is faster and leaner. | | What gives? | | Jeff C. | -- | Jeff Chan | mailto:[EMAIL PROTECTED] | http://www.surbl.org/ |
RE: High CPU running SA in a VMware VM
-Original Message- From: Anders Norrbring [mailto:[EMAIL PROTECTED] Sent: Friday, October 27, 2006 20:58 To: users@spamassassin.apache.org Subject: Re: High CPU running SA in a VMware VM I run my SMTP server entirely in a VMware VM, and have *never* seen a high CPU usage on that particular machine. I run Postfix, Amavis-new 2.4.3, SA 3.1.7 and quite some plug-ins. Bayes and quarantine are all in a MySQL database stored on another VM, no big load there either... I concur. I've been using VMware as a shadow/test server for the production FreeBSD one for years; never had any such issue. VMware rocks! :) I would run any of the db_dump or db_upgrade utils for BerkeleyDB, or reinstall DB_File (and make darn sure it's compiled against the correct BerkeleyDB libs). At any rate, I myself would probably be more inclined to look into a BerkeleyDB issue than a VMware one. - Mark
Re: ImageInfo vs FuzzyOCR performance?
Jeff Chan wrote: Does anyone have any recent feedback about the performance of ImageInfo versus FuzzyOCR about detecting stock image spams (or any others)? Does FuzzyOCR catch significantly more spams than ImageInfo? Cheers, Jeff C. I may be biased, as I help in FuzzyOcr development, but I do use both. ImageInfo is fine and will get you part of the way there, but FuzzyOcr hits more often. Scanning ~8K msgs/day, FuzzyOcr hits ~1600 times and ImageInfo hits 150 times on average. On my system, here are the top 10 rule hits from yesterday:
SPAM Results: 3936 Message(s) 49.83% 19.399 Average Score
  3343 Time(s)   7.50%   84.93%   Hit Rule: BAYES_99
  3068 Time(s)   6.88%   77.95%   Hit Rule: HTML_MESSAGE
  1655 Time(s)   3.71%   42.05%   Hit Rule: FUZZY_OCR
  1527 Time(s)   3.42%   38.80%   Hit Rule: SARE_GIF_ATTACH
  1411 Time(s)   3.16%   35.85%   Hit Rule: URIBL_BLACK
  1274 Time(s)   2.86%   32.37%   Hit Rule: URIBL_BLACK_OVERLAP
  1271 Time(s)   2.85%   32.29%   Hit Rule: MIME_HTML_ONLY
  1215 Time(s)   2.72%   30.87%   Hit Rule: URIBL_JP_SURBL
  1187 Time(s)   2.66%   30.16%   Hit Rule: RCVD_IN_BL_SPAMCOP_NET
  1184 Time(s)   2.66%   30.08%   Hit Rule: SARE_GIF_STOX
Jorge Valdes
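For readers puzzling over the two percentage columns in that report: the second one appears to be rule hits divided by the 3936 spam messages. The Python sketch below just redoes that arithmetic for a few of the rows; the variable and function names are mine, and the basis of the first percentage column is not stated in the post, so it is left alone.

```python
# Figures copied from the report above.
SPAM_MESSAGES = 3936
RULE_HITS = {
    "BAYES_99": 3343,
    "HTML_MESSAGE": 3068,
    "FUZZY_OCR": 1655,
    "SARE_GIF_ATTACH": 1527,
}


def hit_rate(hits: int, total: int) -> float:
    """Percentage of messages that hit a rule, rounded to two decimals."""
    return round(100.0 * hits / total, 2)


if __name__ == "__main__":
    for rule, hits in RULE_HITS.items():
        print(f"{rule:16s} {hit_rate(hits, SPAM_MESSAGES):6.2f}%")
```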
Re: domainkeys unverified - solved
Chris Purves wrote: I just got the domainkeys plugin set up, but it's not working the way I expect. In messages from Yahoo I see: 0.0 DK_SIGNED Domain Keys: message has an unverified signature but I never see DK_VERIFIED. Is there something I need to configure? I didn't apply the patch, because I'm assuming it's been incorporated into 3.1.4. In the end, with the help of Mark Martinec, I was able to determine that the problem was that my ISP-provided DNS nameservers did not return full TXT records (they were truncated). I installed bind9 and used localhost as my primary nameserver, and now I can get DK_VERIFIED. Symptoms for this problem were: DK_VERIFIED does not fire for Yahoo! e-mails (multiple-part TXT record); DK_VERIFIED does fire for Gmail e-mail (single-part TXT record); Perl modules Mail::DomainKeys and Mail::DKIM fail during make test. -- Chris
Re: Scoring base64 blob messages
Theo Van Dinter wrote: On Thu, Oct 26, 2006 at 12:19:23PM -0400, Peter H. Lemieux wrote: No, because there are going to be a lot of mails that would hit that. Really? Maybe it's because I live in the US, but I can't think of a legitimate message I've ever received consisting only of a base64 blob. You look at a lot of raw messages? ;) Doesn't everybody? Seriously, I do look at a lot of raw messages; for instance, I review the full text of nearly every spam message that doesn't get caught by my filters and shows up in my inbox. Obviously I don't get much mail from Blackberry users or Ticketmaster! Rather than making anyone else do the work for me, is there something I can read about how to determine the frequency of different message features appearing in the corpus? Well, there isn't a SA corpus, so there's no answer to that question. Ah, I hadn't read this page before: http://wiki.apache.org/spamassassin/HandClassifiedCorpora My recollection was that 2.x used a centrally-defined corpus rather than a variety of developers' corpora (see, I read the wiki). Either things changed with the switch in scoring algorithms in 3.x, or my recollection is shoddy. Probably the latter. You can generate some rules and use mass-check to run against your own corpus to gather some statistics. I'm willing to run some rules for you against my corpus if you want. I just don't have time to come up with the rules right now. Thanks for the offer, Theo, but don't spend your valuable time on this. I'll give it shot some day when I've got some spare moments. If I do get some candidate rules, I'll pass them along to you for testing. Thanks again! Peter
Re: Scoring base64 blob messages
On Fri, Oct 27, 2006 at 05:24:58PM -0400, Peter H. Lemieux wrote: Well, there isn't a SA corpus, so there's no answer to that question. Ah, I hadn't read this page before: http://wiki.apache.org/spamassassin/HandClassifiedCorpora My recollection was that 2.x used a centrally-defined corpus rather than a variety of developers' corpora (see, I read the wiki). Either things changed with the switch in scoring algorithms in 3.x, or my recollection is shoddy. Probably the latter. Yeah, sorry. We've had separate corpora since I started with SA several years ago. There was a public corpus of mail made available which could be confusing your memory. :) -- Randomly Selected Tagline: I pity the shul that won't let Krusty in now. Spin me clown! - Mr. T, The Simpsons, Today, I Am a Klown
Re: domainkeys unverified - solved
Chris Purves wrote: In the end, with the help of Mark Martinec, I was able to determine that the problem was with my ISP-provided DNS nameservers not allowing full TXT records to be returned (they were truncated). Was this something that the ISP cooked up, or was it intrinsic to the DNS server software they are using? If the latter, it would be good to know which server they were running. It might be a useful addition to the FAQ/wiki. Peter
Re: High CPU running SA in a VMware VM
On Fri, Oct 27, 2006 at 09:10:28PM +, Mark wrote: I run my SMTP server entirely in a VMware VM, and have *never* seen a high CPU usage on that particular machine. I run Postfix, Amavis-new 2.4.3, SA 3.1.7 and quite some plug-ins. I would run any of the db_dump or db_upgrade utils for BerkeleyDB; or reinstall DB_File (and make darn sure it's compiled against the correct BerkeleyDB libs). At any rate, I myself would probably be more inclined to look into a BerkeleyDB issue than a VMware one. Yeah, I doubt there's an issue with VMware specifically (ESX++). My guess is that if you're seeing different behavior between a physical host and virtual host, there's something different in the virtual host -- different OS, libs, perl modules, etc. Obviously that won't be the case if you virtualized a physical machine, but I seem to recall from the start of the thread that you migrated the data but not the OS. -- Randomly Selected Tagline: My wife and I were happy for years. Then we met.
Re: High CPU running SA in a VMware VM
You are correct, this was a new build, with a later version of SA and migrated Bayes files. It could very well be the case that Berkeley DB needs to be patched, or the data converted in some fashion. I will say that in a VM environment, we tried to build gcc, and it took MUCH longer than on a physical box with the same processors. VMware analyzed our data and determined that we should disable NPTL and use LinuxThreads instead (kb 1470). This did help substantially, and though slower than the physical machine, it was acceptable. I have tried this for SA, and it does seem to cut down the CPU required, so there is some hope.

Theo Van Dinter [EMAIL PROTECTED] wrote:
On Fri, Oct 27, 2006 at 09:10:28PM +, Mark wrote: I run my SMTP server entirely in a VMware VM, and have *never* seen a high CPU usage on that particular machine. I run Postfix, Amavis-new 2.4.3, SA 3.1.7 and quite some plug-ins. I would run any of the "db_dump" or "db_upgrade" utils for BerkeleyDB; or reinstall DB_File (and make darn sure it's compiled against the correct BerkeleyDB libs). At any rate, I myself would probably be more inclined to look into a BerkeleyDB issue than a VMware one.
Yeah, I doubt there's an issue with VMware specifically (ESX++). My guess is that if you're seeing different behavior between a physical host and virtual host, there's something different in the virtual host -- different OS, libs, perl modules, etc. Obviously that won't be the case if you virtualized a physical machine, but I seem to recall from the start of the thread that you migrated the data but not the OS.
-- Randomly Selected Tagline: My wife and I were happy for years. Then we met.
Re: domainkeys unverified - solved
Peter H. Lemieux writes: Chris Purves wrote: In the end, with the help of Mark Martinec, I was able to determine that the problem was with my ISP-provided DNS nameservers not allowing full TXT records to be returned (they were truncated). Was this something that the ISP cooked up, or was it intrinsic to the DNS server software they are using? If the latter, it would be good to know which server they were running. It might be a useful addition to the FAQ/wiki. yes, definitely -- this is worth knowing about... --j.
Re: High CPU running SA in a VMware VM
I manually ran sa-learn --force-expire, and it hammered the box. Here is the debug and timing information (for just a 5 MB file!):

[18002] dbg: bayes: tie-ing to DB file R/O /home/ian/.spamassassin/bayes_toks
[18002] dbg: bayes: tie-ing to DB file R/O /home/ian/.spamassassin/bayes_seen
[18002] dbg: bayes: found bayes db version 3
[18002] dbg: bayes: DB journal sync: last sync: 1161899721
[18002] dbg: bayes: opportunistic call found journal sync due
[18002] dbg: bayes: bayes journal sync starting
[18002] dbg: bayes: tie-ing to DB file R/W /home/ian/.spamassassin/bayes_toks
[18002] dbg: bayes: tie-ing to DB file R/W /home/ian/.spamassassin/bayes_seen
[18002] dbg: bayes: found bayes db version 3
[18002] dbg: bayes: synced databases from journal in 0 seconds: 792 unique entries (974 total entries)
[18002] dbg: bayes: bayes journal sync completed
[18002] dbg: bayes: bayes journal sync starting
[18002] dbg: bayes: bayes journal sync completed
[18002] dbg: bayes: expiry starting
[18002] dbg: bayes: expiry check keep size, 0.75 * max: 112500
[18002] dbg: bayes: token count: 161725, final goal reduction size: 49225
[18002] dbg: bayes: first pass? current: 1161986180, Last: 1161862273, atime: 691200, count: 10015, newdelta: 140627, ratio: 4.91512730903645, period: 43200
[18002] dbg: bayes: can't use estimation method for expiry, unexpected result, calculating optimal atime delta (first pass)
[18002] dbg: bayes: expiry max exponent: 9
-- about 20 seconds elapsed
[18002] dbg: bayes: atime token reduction
[18002] dbg: bayes: ===
[18002] dbg: bayes: 43200 144256
[18002] dbg: bayes: 86400 133029
[18002] dbg: bayes: 172800 111350
[18002] dbg: bayes: 345600 72306
[18002] dbg: bayes: 691200 9457
[18002] dbg: bayes: 1382400 0
[18002] dbg: bayes: 2764800 0
[18002] dbg: bayes: 5529600 0
[18002] dbg: bayes: 11059200 0
[18002] dbg: bayes: 22118400 0
[18002] dbg: bayes: first pass decided on 691200 for atime delta
-- about 40 seconds elapsed [a sort going on here???]
[18002] dbg: bayes: untie-ing
[18002] dbg: bayes: untie-ing db_toks
[18002] dbg: bayes: untie-ing db_seen
[18002] dbg: bayes: files locked, now unlocking lock
expired old bayes database entries in 60 seconds = YIKES
152268 entries kept, 9457 deleted
token frequency: 1-occurrence tokens: 68.79%
token frequency: less than 8 occurrences: 18.63%
[18002] dbg: bayes: expiry completed

real 1m6.157s
user 0m56.044s = WOW!
sys 0m2.370s

Anders Norrbring [EMAIL PROTECTED] wrote:
Sorry about top-posting, but I just caught the topic, and found it a bit interesting... I run my SMTP server entirely in a VMware VM, and have *never* seen a high CPU usage on that particular machine. I run Postfix, Amavis-new 2.4.3, SA 3.1.7 and quite some plug-ins. Bayes and quarantine are all in a MySQL database stored on another VM, no big load there either... At peaks, I have a 2-4% CPU usage and 20-65% memory usage on each VM, all reported by Virtual Center 1.4. So, naturally I'm curious about why there would be a high CPU load from using SA. My guess is that it's something else causing it.
-- Anders Norrbring, Norrbring Consulting

Sammy Anderson wrote:
I'm pretty sure it is that, because when I turn off Bayes altogether, the spikes go away. I also ran sa-learn --force-expire and it PEGS the VM. With bayes debugging enabled, I see lines like this in my syslog: bayes: expired old bayes database entries in 236 seconds: 152268 entries kept, 9457 deleted. We have about 140 users, each with a 5 MB bayes_toks file, so somebody's database needs expiring all throughout the day. Each user is virtual; they don't really have an account on the box, but the directories correspond to each user address. And we do auto-learn, with opportunistic expiry. Good thought about --round-robin, I am willing to use a little more memory if it saves on CPU.
Ring, John C wrote:
From: Sammy Anderson [mailto:[EMAIL PROTECTED]
We recently migrated our SpamAssassin installation from a physical 3.6 GHz system running RHEL 4 and SA 3.0.4 to a VMware VM (ESX 2.5.4) with RHEL 4 as the guest OS and SA 3.1.7.
I just did the same thing last week, except we're using RHEL 3 and ESX 2.5.2, and the physical box it used to be on was far less powerful than yours.
Each user has their own Bayes files (Berkeley DB) and these were copied from the old to the new server. Now whenever an expiry process runs on a user's database, the CPU spikes, sometimes for a minute or longer.
Hmm. We're using ours as a site-wide MTA to be able to reject incoming mails at SMTP time, so no user DBs on the box, but we are running with Bayes checking on (Berkeley DB), autolearning off, and manual Bayes feeding only a few times a day. Because of that, I don't have practice with a heavy Bayes load, but how certain are you that it's Bayes hitting the CPU; did you
RE: domainkeys unverified - solved
-Original Message- From: Chris Purves [mailto:[EMAIL PROTECTED] Sent: Friday, 27 October 2006 23:20 To: users@spamassassin.apache.org Subject: Re: domainkeys unverified - solved
In the end, with the help of Mark Martinec, I was able to determine that the problem was with my ISP-provided DNS nameservers not allowing full TXT records to be returned (they were truncated). Symptoms for this problem were: DK_VERIFIED does not fire for Yahoo! e-mails (multiple part TXT record)

Interesting.

nslookup -q=txt lima._domainkey.yahoogroups.com
k=rsa; p=MHwwDQYJKoZIhvcNAQEBBQADawAwaAJhAL10WHRWMSb9Tnl+k4Kzpc18rDCTpDT1pbK0xwkdZIZkaP8NB75qa/S57xccZlIwbI22Ooy/IY+8WxQtvE2z4W
LLNOf9hkMeicUH48TGkEoCAcaSjJz/b3NMrOy9l1U7gQIDAP//

I get two parts, too. Is that their correct public key, when concatenated? Though I do not get both parts in random order, I wonder whether I might have the same issue.

- Mark
Re: High CPU running SA in a VMware VM
On Fri, Oct 27, 2006 at 03:01:45PM -0700, Sammy Anderson wrote:
I manually ran sa-learn --force-expire, and it hammered the box. Here is a debug and timing information (for just a 5 MB file!):
[18002] dbg: bayes: token count: 161725, final goal reduction size: 49225

It wants to get rid of (at most) 49225 tokens.

[18002] dbg: bayes: can't use estimation method for expiry, unexpected result, calculating optimal atime delta (first pass)

It has to do step 1 and can't estimate.

[18002] dbg: bayes: expiry max exponent: 9
-- about 20 seconds elapsed

It's going through every token in your db.

[18002] dbg: bayes: atime token reduction
[18002] dbg: bayes: ===
[18002] dbg: bayes: 43200 144256
[18002] dbg: bayes: 86400 133029
[18002] dbg: bayes: 172800 111350
[18002] dbg: bayes: 345600 72306
[18002] dbg: bayes: 691200 9457
[18002] dbg: bayes: 1382400 0
[...]
[18002] dbg: bayes: first pass decided on 691200 for atime delta

691200 wins the Price Is Right (9457 is the closest without going over).

-- about 40 seconds elapsed [a sort going on here???]

It's creating a new DB file, going back through every token in the original DB, and copying to the new DB every entry that is newer than the chosen 691200-second atime delta (the 9457 older tokens are left behind).

expired old bayes database entries in 60 seconds = YIKES

Yep. Expiry is relatively resource intensive and slow with DBMs, but there's no other good way to do it (or at least, no one has suggested a really better way to do it...)
-- Randomly Selected Tagline: I believe it's not butter, I just can't believe it's $1.59!
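For readers following the numbers in the log, here is a minimal Python sketch of the first-pass arithmetic as described above. It is not SpamAssassin's code: the function names are hypothetical, the maximum database size of 150000 is inferred from the logged "0.75 * max: 112500", and ignoring candidates that would remove zero tokens is an assumption.

```python
# Illustrative sketch only; names and structure are assumptions, not SpamAssassin's API.

def goal_reduction(token_count, max_db_size, keep_fraction=0.75):
    """Number of tokens expiry would like to remove (never negative)."""
    keep_size = int(max_db_size * keep_fraction)
    return max(token_count - keep_size, 0)


def pick_atime_delta(candidates, goal):
    """Pick the (atime_delta, tokens_removed) pair that removes the most
    tokens without exceeding the goal ("closest without going over").

    candidates: list of (atime_delta_seconds, tokens_removed) pairs,
    as printed in the "atime token reduction" table of the debug log.
    Returns None when no candidate removes between 1 and goal tokens.
    """
    viable = [(removed, delta) for delta, removed in candidates
              if 0 < removed <= goal]
    if not viable:
        return None
    removed, delta = max(viable)
    return delta, removed


# Values copied from the debug log in this thread.
table = [
    (43200, 144256), (86400, 133029), (172800, 111350), (345600, 72306),
    (691200, 9457), (1382400, 0), (2764800, 0), (5529600, 0),
    (11059200, 0), (22118400, 0),
]

goal = goal_reduction(token_count=161725, max_db_size=150000)
choice = pick_atime_delta(table, goal)
print(goal)    # 49225
print(choice)  # (691200, 9457)
```

Every candidate in the table requires a scan over all tokens, which is consistent with the roughly 20 seconds spent before the table is printed.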
Re: Rules to reject bounce messages for mail not sent by me
On Oct 27, 2006, at 3:58 AM, Justin Mason wrote: Nick Gilbert writes: PS. Will setting up SPF on my domain name have any effect for things like this? Will it discourage spammers from using my domain or reduce the number of bounce messages I/we get? nope. they don't bother checking, and the systems sending bounces aren't the ones that are being kept up-to-date enough to check SPF either. Umm... not in my experience. Every time we turn on SPF for a domain, the amount of backscatter goes to about a third of the previous amount. Every time I've been involved anyway. -- Jo Rhett Senior Network Engineer Network Consonance
Re: domainkeys unverified - solved
Mark wrote:
-Original Message- From: Chris Purves [mailto:[EMAIL PROTECTED] Sent: Friday, 27 October 2006 23:20 To: users@spamassassin.apache.org Subject: Re: domainkeys unverified - solved
In the end, with the help of Mark Martinec, I was able to determine that the problem was with my ISP-provided DNS nameservers not allowing full TXT records to be returned (they were truncated). Symptoms for this problem were: DK_VERIFIED does not fire for Yahoo! e-mails (multiple part TXT record)
Interesting.
nslookup -q=txt lima._domainkey.yahoogroups.com
k=rsa; p=MHwwDQYJKoZIhvcNAQEBBQADawAwaAJhAL10WHRWMSb9Tnl+k4Kzpc18rDCTpDT1pbK0xwkdZIZkaP8NB75qa/S57xccZlIwbI22Ooy/IY+8WxQtvE2z4W
LLNOf9hkMeicUH48TGkEoCAcaSjJz/b3NMrOy9l1U7gQIDAP//
I get two parts, too. Is that their correct public key, when concatenated? Though I do not get both parts in random order, I wonder whether I might have the same issue.

What you get is correct. In my case, when it's not working I get:

[EMAIL PROTECTED]:~$ nslookup -q=txt lima._domainkey.yahoogroups.com
Server: 64.59.184.13
Address: 64.59.184.13#53
Non-authoritative answer:
lima._domainkey.yahoogroups.com text = k=rsa\; p=MHwwDQYJKoZIhvcNAQEBBQADawAwaAJhAL10WHRWMSb9Tnl+k4Kzpc18rDCTpDT1pbK0xwkdZIZkaP8NB75qa/S57xccZlIwbI22Ooy/IY+8WxQtvE2z4W
Authoritative answers can be found from:
[EMAIL PROTECTED]:~$

I'm missing the second part of the Answer, and Authority is empty. Using dig -t txt ... the Additional section is also empty.

-- Chris
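To illustrate why a dropped second part breaks verification, here is a small Python sketch. A TXT record may be returned as several character-strings, and the verifier joins them before reading the key. The helper names below are hypothetical, the parser is a simplified tag=value splitter rather than the one in Mail::DomainKeys, and the two strings are copied from the nslookup output quoted in this thread.

```python
# Illustrative sketch; helper names are assumptions, not a real library API.

def join_txt_strings(chunks):
    """Concatenate the character-strings of one TXT record, in order."""
    return "".join(chunks)


def parse_key_record(record):
    """Split a 'tag=value; tag=value' key record into a dict."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")  # split on the first '=' only
        tags[name.strip()] = value.strip()
    return tags


# The two parts as quoted in the thread for lima._domainkey.yahoogroups.com.
part1 = ("k=rsa; p=MHwwDQYJKoZIhvcNAQEBBQADawAwaAJhAL10WHRWMSb9Tnl+k4Kzpc18"
         "rDCTpDT1pbK0xwkdZIZkaP8NB75qa/S57xccZlIwbI22Ooy/IY+8WxQtvE2z4W")
part2 = "LLNOf9hkMeicUH48TGkEoCAcaSjJz/b3NMrOy9l1U7gQIDAP//"

full = parse_key_record(join_txt_strings([part1, part2]))
truncated = parse_key_record(join_txt_strings([part1]))  # what the ISP resolver returned

print(full["k"])                              # rsa
print(len(full["p"]) - len(truncated["p"]))   # number of key characters lost
```

With only the first string, the p= value is a shortened key, so a signature check against it cannot succeed even though the k= tag still parses.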
Re: domainkeys unverified - solved
Peter H. Lemieux wrote: Chris Purves wrote: In the end, with the help of Mark Martinec, I was able to determine that the problem was with my ISP-provided DNS nameservers not allowing full TXT records to be returned (they were truncated). Was this something that the ISP cooked up, or was it intrinsic to the DNS server software they are using? If the latter, it would be good to know which server they were running. It might be a useful addition to the FAQ/wiki. I still have to contact them, but I'll post back with my results. -- Chris
Re: High CPU running SA in a VMware VM
And there is one of these for each user; this is just for one user. Sounds like we may have to abandon Bayes or possibly use MySQL. Not sure we are ready to invest in setting that all up...

Theo Van Dinter [EMAIL PROTECTED] wrote:
On Fri, Oct 27, 2006 at 03:01:45PM -0700, Sammy Anderson wrote: I manually ran sa-learn --force-expire, and it hammered the box. Here is a debug and timing information (for just a 5 MB file!):
[18002] dbg: bayes: token count: 161725, final goal reduction size: 49225
It wants to get rid of (at most) 49225 tokens.
[18002] dbg: bayes: can't use estimation method for expiry, unexpected result, calculating optimal atime delta (first pass)
It has to do step 1 and can't estimate.
[18002] dbg: bayes: expiry max exponent: 9
-- about 20 seconds elapsed
It's going through every token in your db.
[18002] dbg: bayes: atime token reduction
[18002] dbg: bayes: ===
[18002] dbg: bayes: 43200 144256
[18002] dbg: bayes: 86400 133029
[18002] dbg: bayes: 172800 111350
[18002] dbg: bayes: 345600 72306
[18002] dbg: bayes: 691200 9457
[18002] dbg: bayes: 1382400 0
[...]
[18002] dbg: bayes: first pass decided on 691200 for atime delta
691200 wins the Price Is Right (9457 is the closest without going over).
-- about 40 seconds elapsed [a sort going on here???]
It's creating a new DB file, going back through every token in the original DB, and copying to the new DB every entry that is newer than the chosen 691200-second atime delta (the 9457 older tokens are left behind).
expired old bayes database entries in 60 seconds = YIKES
Yep. Expiry is relatively resource intensive and slow with DBMs, but there's no other good way to do it (or at least, no one has suggested a really better way to do it...)
-- Randomly Selected Tagline: I believe it's not butter, I just can't believe it's $1.59!
RE: domainkeys unverified - solved
-Original Message- From: Chris Purves [mailto:[EMAIL PROTECTED] Sent: Saturday, 28 October 2006 0:49 To: users@spamassassin.apache.org Subject: Re: domainkeys unverified - solved
DK_VERIFIED does not fire for Yahoo! e-mails (multiple part TXT record)
Interesting.
nslookup -q=txt lima._domainkey.yahoogroups.com
k=rsa; p=MHwwDQYJKoZIhvcNAQEBBQADawAwaAJhAL10WHRWMSb9Tnl+k4Kzpc18rDCTpDT1pbK0xwkdZIZkaP8NB75qa/S57xccZlIwbI22Ooy/IY+8WxQtvE2z4W
LLNOf9hkMeicUH48TGkEoCAcaSjJz/b3NMrOy9l1U7gQIDAP//
I get two parts, too. Is that their correct public key, when concatenated?
What you get is correct. In my case, when it's not working I get:
[EMAIL PROTECTED]:~$ nslookup -q=txt lima._domainkey.yahoogroups.com
Server: 64.59.184.13
Address: 64.59.184.13#53
Non-authoritative answer:
lima._domainkey.yahoogroups.com text = k=rsa\; p=MHwwDQYJKoZIhvcNAQEBBQADawAwaAJhAL10WHRWMSb9Tnl+k4Kzpc18rDCTpDT1pbK0xwkdZIZkaP8NB75qa/S57xccZlIwbI22Ooy/IY+8WxQtvE2z4W
Authoritative answers can be found from:
[EMAIL PROTECTED]:~$
I'm missing the second part of the Answer, and Authority is empty.

Thanks. :) I was getting worried. I'm not quite ready to go to BIND 9 yet (don't all y'all shoot me now), so I'm happy to hear it's working.

- Mark
Re: High CPU running SA in a VMware VM
Sammy Anderson wrote: And there is one of these for each user, this is just for one user. Sounds like we may have to abandon Bayes or possibly use mysql. Not sure we are ready to invest in setting that all up... Bayes in MySQL is a snap to set up, and it really runs rings around the DBM setup in a real-world situation. I switched over two clients this morning and neither of them had MySQL installed. Installed from source (PHP 5 requirements etc.) and still had both installs done before lunch. Regards, Rick
RE: ImageInfo vs FuzzyOCR performance?
-Original Message- From: Jorge Valdes [mailto:[EMAIL PROTECTED] Sent: Friday, October 27, 2006 5:12 PM To: users@spamassassin.apache.org Subject: Re: ImageInfo vs FuzzyOCR performance?

SPAM Results: 3936 Message(s) 49.83% 19.399 Average Score
3343 Time(s)  7.50%  84.93%  Hit Rule: BAYES_99
3068 Time(s)  6.88%  77.95%  Hit Rule: HTML_MESSAGE
1655 Time(s)  3.71%  42.05%  Hit Rule: FUZZY_OCR
1527 Time(s)  3.42%  38.80%  Hit Rule: SARE_GIF_ATTACH
1411 Time(s)  3.16%  35.85%  Hit Rule: URIBL_BLACK
1274 Time(s)  2.86%  32.37%  Hit Rule: URIBL_BLACK_OVERLAP
1271 Time(s)  2.85%  32.29%  Hit Rule: MIME_HTML_ONLY
1215 Time(s)  2.72%  30.87%  Hit Rule: URIBL_JP_SURBL
1187 Time(s)  2.66%  30.16%  Hit Rule: RCVD_IN_BL_SPAMCOP_NET
1184 Time(s)  2.66%  30.08%  Hit Rule: SARE_GIF_STOX

What do you use to get those stats?
RE: ImageInfo vs FuzzyOCR performance?
Jeff Chan wrote: Does anyone have any recent feedback about the performance of ImageInfo versus FuzzyOCR at detecting stock image spams (or any others)? Does FuzzyOCR catch significantly more spams than ImageInfo?

One of the things ImageInfo does to avoid FPs is assign a higher score to image-only spam where the ratio of screen space to amount of text is high. But notice how more of this type of spam lately carries gibberish text at the bottom? That throws the formula off and produces a VERY small ImageInfo score. I know the spammers might have been doing this to get around Bayes, but I suspect they were really trying to get around ImageInfo, because this change-up seemed to happen soon after ImageInfo was introduced. Nevertheless, I've found that manually readjusting those ratios has helped to catch more spam. (And I'm reluctant to mention this in the first place, because if they are adjusted at the SARE site, the spammers will only readjust accordingly!)

Rob McEwen PowerView Systems
SA TIMED OUT
I upgraded to SA 3.1.4 last night and now I have two issues that I'm trying to resolve:

(1) spamassassin -D --lint is giving me an error:
[2533] warn: config: failed to parse line, skipping: dcc_timeout 18

(2) In the logs I'm seeing a good number of the following type of entry:
Oct 27 15:40:21 moe amavis[2548]: (02548-01-2) (!)SA TIMED OUT, backtrace:
at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/DnsResolver.pm line 363
eval {...} called at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/DnsResolver.pm line 363
Mail::SpamAssassin::DnsResolver::poll_responses('Mail::SpamAssassin::DnsResolver=HASH(0x4005820)', 72) called at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 710
Mail::SpamAssassin::Plugin::URIDNSBL::complete_lookups('Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0x1ff8200)', 'HASH(0x4cbdad0)', 72) called at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 412
Mail::SpamAssassin::Plugin::URIDNSBL::check_post_dnsbl('Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0x1ff8200)', 'HASH(0x6816dd0)') called at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/PluginHandler.pm line 159
eval {...} called at /usr/lib/perl5/vendor_perl/5.8.8/Mail/Sp...

I've checked the archives and maybe I missed something, but I wasn't able to find anything that seemed relevant. Thanks for any pointers. Mike

[EMAIL PROTECTED] ~]# spamassassin -V
SpamAssassin version 3.1.4 running on Perl version 5.8.8

-- Let the machine do the dirty work. - Elements of Programming Style
Re: MailScanner versus Amavisd-new with postfix
Jeff, Not to start any flamewars, but does anyone have strong opinions on MailScanner versus Amavisd-new for use with postfix (and of course SpamAssassin and ClamAV)? Of course I'm biased, but I'd be worried about running a program with about 400 cases of calling system routines (I/O, file system, etc.) without checking the resulting status, or failing to report errors. MailScanner works while everything is in order. When the unexpected happens (e.g. disk full, I/O or file system errors, depleted system resources), unpredictable things are bound to result, and may go unnoticed for some time or prove difficult to diagnose. Mark
Re: spamassassin --lint fails with rules in local.cf
On 26.10.2006 14:35, Dylan Bouterse wrote: I have added some rules in my local.cf file (for adding scores for some SARE rules) but when I run spamassassin --lint (or when I run rules_du_jour, which does the same) it says the rules in my local.cf file are non-existent, but spamassassin ultimately runs fine. What am I doing wrong? Dylan

Oops, just stumbled upon the release announcement of SpamAssassin 3.1.7: http://www.nabble.com/ANNOUNCE%3A-Apache-SpamAssassin-3.1.7-available%21-tf2415849.html

3.1.7 is a quick-fix release; it contains only a fix for one bug, introduced accidentally in 3.1.6:
- bug 5119: if admins had set rule scores in the site configuration in /etc, sa-update would fail. Back out this change

I don't know if Dylan is already using 3.1.7. We are on 3.1.6 because there is no updated FreeBSD port out yet. So I wait. Greetings, Alain
Re: SA TIMED OUT
M. Lewis wrote: I upgraded to SA 3.1.4 last night and now I have two issues that I'm trying to resolve: (1) spamassassin -D --lint is giving me an error: [2533] warn: config: failed to parse line, skipping: dcc_timeout 18

If you've not edited /etc/mail/spamassassin/v310.pre to load the DCC plugin, DCC is disabled by default. (It's not free for everyone to use, so it is disabled pending your decision that your use falls under DCC's license. Most folks' use does, but check the license.) Without any DCC support loaded, the dcc_timeout option is meaningless to SA.

(2) In the logs I'm seeing a good number of the following type of entry: Oct 27 15:40:21 moe amavis[2548]: (02548-01-2) (!)SA TIMED OUT, backtrace: at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/DnsResolver.pm line 363\n\teval {...} called at

Sounds like your DNS is slow, and you've got a short $sa_timeout in your amavis configs. But I'm no amavis expert.
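As a quick way to confirm the diagnosis above, the following Python sketch checks whether a .pre file contains an uncommented loadplugin line for the DCC plugin. The helper name is hypothetical and the sample file contents are illustrative rather than copied from a real v310.pre; only the loadplugin directive and the plugin class name Mail::SpamAssassin::Plugin::DCC are taken from SpamAssassin itself.

```python
# Illustrative sketch; plugin_loaded() is a hypothetical helper, not part of SpamAssassin.

def plugin_loaded(pre_text, plugin):
    """Return True if pre_text has an active (uncommented) loadplugin line
    for the given plugin class name."""
    for line in pre_text.splitlines():
        # Drop anything after '#', then split the remainder into words.
        words = line.split("#", 1)[0].split()
        if len(words) >= 2 and words[0] == "loadplugin" and words[1] == plugin:
            return True
    return False


DCC = "Mail::SpamAssassin::Plugin::DCC"

# Illustrative file contents (assumed, not a verbatim v310.pre).
disabled = "# DCC checks, disabled by default\n#loadplugin Mail::SpamAssassin::Plugin::DCC\n"
enabled = "# DCC checks\nloadplugin Mail::SpamAssassin::Plugin::DCC\n"

print(plugin_loaded(disabled, DCC))  # False: dcc_timeout would fail to parse
print(plugin_loaded(enabled, DCC))   # True: dcc_timeout is a known option
```

To run it against a live system, read the text from /etc/mail/spamassassin/v310.pre (the path mentioned above) and pass it to plugin_loaded.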
RE: SA TIMED OUT
I upgraded to SA 3.1.4 last night and now I have two issues that I'm trying to resolve: (1) spamassassin -D --lint is giving me an error: [2533] warn: config: failed to parse line, skipping: dcc_timeout 18

You need to enable (uncomment) the DCC plugin in v310.pre.

(2) In the logs I'm seeing a good number of the following type of entry: Oct 27 15:40:21 moe amavis[2548]: (02548-01-2) (!)SA TIMED OUT, backtrace: at ... I've checked the archives and maybe I missed something, but I wasn't able to find anything that seemed relevant. Thanks for any pointers. Mike

The newer version takes longer to scan (quite noticeable on a low-powered system). Newer versions of amavisd-new allow scans to take longer without timing out, whereas older versions have a default of $sa_timeout = 30; which should be included in amavisd.conf and raised to something like 60 seconds. I also suggest moving Bayes to SQL, and if not, then set lock_method flock in local.cf if appropriate. http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Conf.html#miscellaneous_options
RE: SA TIMED OUT
spamassassin -D --lint is giving me an error: [2533] warn: config: failed to parse line, skipping: dcc_timeout 18 BTW, as Matt says, your DNS may be slow. If DCC doesn't respond within 10 seconds, I would imagine it's unlikely it will respond - so I wouldn't waste time waiting around another 8 seconds. Many people find a local caching DNS server really helps on net tests. Gary V
Re: SA TIMED OUT
Matt Kettler wrote: M. Lewis wrote: I upgraded to SA 3.1.4 last night and now I have two issues that I'm trying to resolve: (1) spamassassin -D --lint is giving me an error: [2533] warn: config: failed to parse line, skipping: dcc_timeout 18 If you've not edited /etc/mail/spamassassin/v310.pre to load the DCC plugin, DCC is disabled by default. (It's not free for everyone to use, so it is disabled pending your decision that your use falls under DCC's license. Most folks' use does, but check the license.) Without any DCC support loaded, the dcc_timeout option is meaningless to SA.

This was indeed the problem. Error gone now.

(2) In the logs I'm seeing a good number of the following type of entry: Oct 27 15:40:21 moe amavis[2548]: (02548-01-2) (!)SA TIMED OUT, backtrace: at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/DnsResolver.pm line 363\n\teval {...} called at Sounds like your DNS is slow, and you've got a short $sa_timeout in your amavis configs. But I'm no amavis expert.

Actually I rebuilt this machine last night and forgot to turn on the caching NS. That made a difference! Thanks Matt!

-- May the bugs of many programs nest on your hard drive.
Re: SA TIMED OUT
Gary V wrote: I upgraded to SA 3.1.4 last night and now I have two issues that I'm trying to resolve: (1) spamassassin -D --lint is giving me an error: [2533] warn: config: failed to parse line, skipping: dcc_timeout 18 You need to enable (uncomment) the DCC plugin in v310.pre.

Done, and the error is gone now.

(2) In the logs I'm seeing a good number of the following type of entry: Oct 27 15:40:21 moe amavis[2548]: (02548-01-2) (!)SA TIMED OUT, backtrace: at ... I've checked the archives and maybe I missed something, but I wasn't able to find anything that seemed relevant. Thanks for any pointers. Mike The newer version takes longer to scan (quite noticeable on a low-powered system). Newer versions of amavisd-new allow scans to take longer without timing out, whereas older versions have a default of $sa_timeout = 30; which should be included in amavisd.conf and raised to something like 60 seconds. I also suggest moving Bayes to SQL, and if not, then set lock_method flock in local.cf if appropriate. http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Conf.html#miscellaneous_options

Thanks Gary for the explanation. I will check into all of these. Thanks, Mike

-- May the bugs of many programs nest on your hard drive.
Re: SA TIMED OUT
Gary V wrote: spamassassin -D --lint is giving me an error: [2533] warn: config: failed to parse line, skipping: dcc_timeout 18 BTW, as Matt says, your DNS may be slow. If DCC doesn't respond within 10 seconds, I would imagine it's unlikely it will respond - so I wouldn't waste time waiting around another 8 seconds. Many people find a local caching DNS server really helps on net tests. Gary V Yes, I have been using a caching NS prior to rebuilding the machine yesterday. I simply forgot to turn it on this time. Duh. Thanks, Mike -- IBM: Icons Bygones My Mom's 22:50:01 up 3:18, 0 users, load average: 0.53, 0.30, 0.22 Linux Registered User #241685 http://counter.li.org