RE: Ammount of the RAM used by spamd childs
Thanks, all! Now I have about 80MB per child.

Andrea
Re: Earthlink emails
On Fri, 2006-09-29 at 11:20 -0400, Michel Vaillancourt wrote: Ramprasad wrote: On Fri, 2006-09-29 at 08:12 -0400, Michel Vaillancourt wrote: Ramprasad wrote:

Why not SPF?

Over two thirds of the UCE/spam I receive has an SPF_PASS associated with it from SA. All SPF seems to do is make the stupid spammers look more stupid. The clever ones aren't affected.

I have a script that automatically blocks SPF-pass domains that consistently send spam. You could make good use of SPF_PASS too.

Care to share? This would be very handy.

This is a Perl script that is part of a larger module, and not exactly worth sharing. But the idea is very simple:

* A cron script on each machine parses the logs for SPF_PASS mails with an SA score above 15 and puts those messages' log lines in a file in the HTTP area.
* The rbldns server wgets the files from the different servers and finds the top sender domains that send spam.
* Delete all whitelisted domains from the list, along with those domains that are also sending a lot of ham to correct IDs (I get this from a MySQL query against my reports DB).
* Put the remaining domains into the rbldns blacklist and restart the rbldns server for Postfix to use.

What is the point of accepting the mail and the entire data and then scanning for DK, when it should ideally have been rejected after MAIL FROM?

That would be the exact point of DK at the Postfix/MTA level.

How? All the while I thought dkfilter helps me block after the end of DATA. Do I have to RTFM again?

My mistake; this one runs as a content filter. The same author is working on a DKIM proxy that would be your first point of contact and handle the MAIL FROM intercept. I got confused.

So I let SA do the testing, which catches the spam but eats my servers' resources. When you receive 3-5 million mails a day you tend to bother more about resources.

I would humbly submit to you that if you move that much traffic, you should be able to justify one more MX machine in the pool and implementing DK.

We have 8 dual Xeons already for this much traffic, and the servers are always loaded with all kinds of tests enabled in SA.

I'm curious... what is the RAM/MHz spec of your machines? 5M mail/day is 7 mail per second per machine... at a median 8 seconds mail handle time, that is 57 mail in the pipes at any one time... 50MB for SA or anti-virus per message works out to about 3GB of RAM in use. I can see your concern. However, again, I'd say that even two more machines in the pool would bring that down to ~2GB of RAM in use per machine, and that should give you the cycles and memory to run SPF queries as well as DK filters.

4GB RAM, 3GHz x 2 Xeon with HT. But I think you too would know mail never arrives uniformly at 7/s. There are peak times when my mail servers touch 43k/hour, while at night they may be sleeping with the rest of us. And at peak times the mail delay starts killing us. (That's exactly when I start sending 450 to bad domains.)

I do understand the notion your boss might not be willing to put another $5K down to deal with the problem. However, as anyone can attest, good customer service costs money to provide.
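The RAM estimate quoted above is arrival rate times handling time (Little's law), multiplied by memory per message. A minimal sketch of that arithmetic follows, using the figures from the thread (5M mail/day, 8 machines, 8 seconds per message, 50MB per message). The function names are made up for illustration, and treating the 43k/hour peak as a per-machine figure is an assumption, not something the thread states.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def in_flight(msgs_per_second: float, handle_seconds: float) -> float:
    """Average number of messages being processed at once (rate x time)."""
    return msgs_per_second * handle_seconds


def ram_mb(msgs_per_day: float, machines: int,
           handle_seconds: float = 8.0, mb_per_msg: float = 50.0) -> float:
    """Average RAM per machine, in MB, for an evenly spread daily load."""
    per_machine_rate = msgs_per_day / SECONDS_PER_DAY / machines
    return in_flight(per_machine_rate, handle_seconds) * mb_per_msg


def peak_ram_mb(msgs_per_hour: float,
                handle_seconds: float = 8.0, mb_per_msg: float = 50.0) -> float:
    """RAM on one machine at a given hourly rate (assumed per machine)."""
    return in_flight(msgs_per_hour / 3600, handle_seconds) * mb_per_msg


if __name__ == "__main__":
    for machines in (8, 10):
        print(f"{machines} machines: {ram_mb(5_000_000, machines):.0f} MB average")
    print(f"peak at 43k/hour: {peak_ram_mb(43_000):.0f} MB")
```

Under these assumptions the average load fits in 4GB, but the peak rate does not, which is consistent with the complaint that delays only become a problem at peak times.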
Re: Ammount of the RAM used by spamd childs
Balzi Andrea wrote:

Thanks All! Now I've about 80MB for child Andrea

You're distinctly NOT welcome. I don't help folks who outright blacklist whole ISPs with millions of legitimate users in order to prevent a portion of spam. Particularly when that ISP is one I'm using. Perhaps now that your spamd's are reasonable, you can ditch some of these absurdly ignorant approaches to spam control:

A message (from [EMAIL PROTECTED]) was received at 30 Sep 2006 3:14:30 +.
The following addresses had delivery problems:
[EMAIL PROTECTED]
Permanent Failure: 550-mail_drop_because_comcast.net_is_in_our_blacklist_/_mail_scartata_perche'
Delivery last attempted at Sat, 30 Sep 2006 03:14:46 -
Re: Ammount of the RAM used by spamd childs
From: Bowie Bailey [EMAIL PROTECTED]

Balzi Andrea wrote:

-Original Message- [...] every child occupies approximately 450MB of RAM. My server is GNU/Linux Debian 3.1r2 with SpamAssassin v3.1.5 and Perl v5.8.4. Isn't 450MB too much for a single child?

That is a bit excessive. My first guess is that you have WAY too many add-on rule sets (or you are using old ones that should not be used). Which rule sets are you currently using?

I'm using the default rules of SpamAssassin 3.1.5 with the following rules downloaded from rulesemporium:

ANTIDRUG
Antidrug is not needed with current versions of SA.

BLACKLIST_URI
You should use the ws.surbl.org version of this blacklist instead. See here for more info: http://wiki.apache.org/spamassassin/SURBL

BLACKLIST
This is a 16M rule file and probably a major contributor to your memory load.

SARE_SPAMCOP_TOP200
The current versions of SA already use this list as a network test. If you have network tests enabled, you don't need this.

Other than that, all I can say is that you have quite a few rules. You may want to try removing some of them and restarting spamd. Just do some trial and error and see which ones make the most difference.

You named the big ones. I use more rule sets than he quoted and only use about 66 megs.

28211 root 16 0 75368 66m 2400 S 0.0 6.6 0:24.43 spamd

{^_^}
Re: Non-blocklisted embedded URLs are getting hits on URIBL_AB_SURBL and URIBL_PH_SURBL in SpamAssassin 3.1.5
David Ulevitch writes:

From: Chris [EMAIL PROTECTED]
To: users@spamassassin.apache.org
Date: Friday, September 29, 2006, 3:59:03 PM
Subject: Non-blocklisted embedded URLs are getting hits on URIBL_AB_SURBL and URIBL_PH_SURBL in SpamAssassin 3.1.5

===8==Original message text===

On Thursday 28 September 2006 1:17 am, Donald Craig wrote:

And Theo Van Dinter pointed out:

You're not by chance using the opendns.{com,org} folks for DNS, are you?

Of course. I'm an idiot. I switched to OpenDNS a couple of weeks back. Time to return from whence I came. Thank you,

Donald,

We handle DNSBLs but not URIBLs, at the moment. Passing along to Noah to see what he can do. Sorry you had this happen to your SpamAssassin scoring. (Time to check mine... :-) )

You can resolve this behavior by turning off typo correction in your preferences page and it'll work again with us returning NXDOMAIN (RCODE=3) instead of doing the typo correction service. Hopefully we can get more granular with that in the future.

If you are on a dynamic IP, well, just sit tight for a couple more weeks or email me to start beta testing some code this week to handle dynamic IPs (and that offer is for anyone).

David -- Thanks for commenting, and good to hear it doesn't affect traditional DNSBL lookups. It sounds like we should probably add a temporary SpamAssassin FAQ entry for this?

--j.

Thanks,
David Ulevitch (from OpenDNS)

Don Craig

I'm getting matches whenever I have an embedded URL on URIBL_AB_SURBL and URIBL_PH_SURBL - unless the URL is actually in URIBL_SBL, in which case the logic for all the flavors of URIBL_XX_SURBL seems to work correctly. I have verified the absence of the incorrectly matching URLs from SURBL with lookups in http://www.rulesemporium.com/cgi-bin/uribl.cgi

This is SpamAssassin 3.1.5, all was fine in 3.1.2. For now I have set both those tests to 0.00.

Don Craig

Yes, OpenDNS definitely caused problems for me also:

Sep 1 21:51:25 localhost spamd[10939]: uridnsbl: bogus rr for domain=otwaloow.com, rule=URIBL_XS_SURBL, id=8880 rr=otwaloow.com.xs.surbl.org. 1 IN A 208.67.219.40 at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 626.

Theo pointed out the errors of my ways:

The error is saying that it's looking for a 127/8 result, but it gets 208.67.219.40 (which resolves to a *.opendns.com name btw). So I would say that yes, the problems are related to changing your nameservers.

--
Chris

===8===End of original message text===
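Theo's point about the "bogus rr" error can be illustrated with a small check: a DNS blocklist answer is expected to be an address inside 127.0.0.0/8, so an address such as 208.67.219.40 (returned by a resolver that rewrites NXDOMAIN) should not count as a listing. The sketch below is only an illustration of that rule; it is not SpamAssassin's code, and the function name is hypothetical.

```python
import ipaddress

# Blocklist answers are expected to fall inside this range.
LOOPBACK_NET = ipaddress.ip_network("127.0.0.0/8")


def is_valid_dnsbl_answer(address: str) -> bool:
    """Return True only if the A record value lies in 127.0.0.0/8."""
    try:
        return ipaddress.ip_address(address) in LOOPBACK_NET
    except ValueError:
        # Not a parseable IP address at all.
        return False


if __name__ == "__main__":
    for answer in ("127.0.0.2", "208.67.219.40"):
        print(answer, is_valid_dnsbl_answer(answer))
```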
Re: Non-blocklisted embedded URLs are getting hits on URIBL_AB_SURBL and URIBL_PH_SURBL in SpamAssassin 3.1.5
On Sep 30, 2006, at 3:30 AM, Justin Mason wrote:

David Ulevitch writes:

Donald,

We handle DNSBLs but not URIBLs, at the moment. Passing along to Noah to see what he can do. Sorry you had this happen to your SpamAssassin scoring. (Time to check mine... :-) )

You can resolve this behavior by turning off typo correction in your preferences page and it'll work again with us returning NXDOMAIN (RCODE=3) instead of doing the typo correction service. Hopefully we can get more granular with that in the future.

If you are on a dynamic IP, well, just sit tight for a couple more weeks or email me to start beta testing some code this week to handle dynamic IPs (and that offer is for anyone).

David -- Thanks for commenting, and good to hear it doesn't affect traditional DNSBL lookups. It sounds like we should probably add a temporary SpamAssassin FAQ entry for this?

Justin,

That sounds like a good idea. Want me to write one up for you in the style of the SA FAQ, or is there enough in my post above to toss one in until we are better able to address URIBLs?

-david
Re: SA gone mad, times out and stucks
Jürgen Herz wrote:

Bowie Bailey wrote:

If your --force-expire only took 19 seconds, I would guess that you are not talking to the same database. Make sure you are logged in as the same user that is having the problem when you run the --force-expire.

Uh, that's a very good point. You may be right: --force-expire as that actual user took 641 secs. I have re-enabled bayes_auto_expire now and will see.

The manual --force-expire seems to have helped. I only get one timeout per day since then, apparently when multiple mails come in at the same time.

What I still get and don't understand is:

warn: bayes: cannot open bayes databases /var/spool/exim4/.spamassassin/bayes_* R/W: lock failed: File exists

Thanks for all your help so far.

Regards,
Jürgen
Re: SA gone mad, times out and stucks
Jürgen Herz wrote:

What I still get and not understand is warn: bayes: cannot open bayes databases /var/spool/exim4/.spamassassin/bayes_* R/W: lock failed: File exists

Make sure the file permissions haven't changed when you ran the manual expire.

Regards,
Andreas
Re: TQMcube Geo Zone config files
Andreas Pettersson wrote:

In case anybody is interested, I've compiled a config file for the geo zone at TQM http://tqmcube.com/worldzone.php

It might not be of great use, but it is interesting to gather some statistics on where the mails come from. Files found here: http://anp.ath.cx/tqmcube/

I have updated tqmcube_world.cf with the -lastexternal setting on the set name, so that only the connecting IP address is checked instead of the whole chain of relays.

Regards,
Andreas
SpamAssassin MX Gateway Server
I have a unique but interesting problem:

I have a farm of servers that use Sendmail/Procmail/SpamAssassin. Due to their very heavy loads and my custom rules, I have built a dual-processor, dual-core FreeBSD AMD64 server to do nothing but my major spam knockdowns and processing before sending mail back to the Sendmail/Procmail/SpamAssassin server farm.

On my gateway MX server, I'm using Postfix/AmavisD and SpamAssassin, and it works great. It flat-out rejects anything with a spam score over 150, and it tags mail as spam and adds the spam headers if the score is over 15. If it scores UNDER 15, it is neither tagged nor given headers.

Then, on the Sendmail farm, I use this recipe, which works great:

    :0:
    * ! ^X-Spam-Status: YES
    {
        :0fw
        * < 256000
        |/usr/local/bin/spamc -f
    }

    :0:
    * ^X-Spam-Status: Yes
    $HOME/mail/Caught-Spam

Basically, anything that arrives with a score over 15 will have that X-Spam-Status header embedded, so it does NOT run SpamAssassin on this server, and just goes into Caught-Spam. If it has a score LOWER than 15 from the MX, then the MX server didn't put a header on it, so it's processed here and filed here.

Why do that? Because my users on the sendmail server farm have a whole variety of score choices they are using, so I want their specific score to be utilized - but by making the score on the MX 15, I'm saving the sendmail server from a WHOLE LOT of processing, and nobody's going to have a default score over 15... so that's a safe number? Make sense?

This works great. The MX gets the mail, knocks down the really bad spam, tags the medium spam and lets the end servers re-score the questionable stuff to the user preferences.

Ok - my question/problem is this: Is there a way I can run spamc (or spamassassin) so that it doesn't actually RESCORE/REPROCESS the mail (the large amount of work), but instead just looks at the user's required score (required_score 6.0) and only re-tags the X-Spam-Status flag to Yes or No?

See, in my current setup (as explained above):

MX server scores it as spam score 205 -- sendmail farm nukes it.

MX server scores it as spam score 16 -- MX tags it as spam -- sendmail farm just files it in the user's Caught-Spam folder.

MX server scores it as score 7, which is below the questionable threshold of 15, so it doesn't tag it -- sendmail then runs SpamAssassin on it, rescores it and then files it according to the user's settings.
Re: SpamAssassin MX Gateway Server
Fix to the last lines of the above post:

MX server scores it as spam score 200 -- MX server just nukes it.

MX server scores it as spam score 16 -- MX tags it as spam -- sendmail farm just files it in the user's Caught-Spam folder.

MX server scores it as score 7, which is below the questionable threshold (set to 15), so it doesn't tag it (nor give it any spam headers) -- sendmail then runs SpamAssassin on it, rescores it and then files it according to the user's settings.
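The three cases described for the gateway can be summarised as a small decision function. This is only a sketch of the policy as described in this thread (reject over 150, tag over 15, otherwise pass through without headers); the real behaviour lives in the Postfix/AmavisD configuration, and the function and action names here are hypothetical.

```python
REJECT_SCORE = 150.0  # gateway rejects outright above this score
TAG_SCORE = 15.0      # gateway tags and adds spam headers above this score


def gateway_action(score: float) -> str:
    """Return what the MX gateway does with a message of this score."""
    if score > REJECT_SCORE:
        return "reject"  # never reaches the sendmail farm
    if score > TAG_SCORE:
        return "tag"     # farm files it in Caught-Spam without rescanning
    return "pass"        # no headers added; farm rescans with user settings


if __name__ == "__main__":
    for score in (200, 16, 7):
        print(score, gateway_action(score))
```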
Re: Setting up DKIM and DomainKeys mail signing and verification
At 12:32 28-09-2006, Henrik Ostergaard wrote:

This sounds promising! But I have distributed, moving users and therefore use pop-before-smtp for authentication, which means that my IP list is in a hash table, which is not in CIDR format. :-(

dk-filter and dkim-filter support pop-before-smtp.

Regards,
-sm
Re: sa-learn and Caught spams
On Wed, 2006-09-27 at 21:00 -0400, Matt Kettler wrote:

Bill Horne wrote:

I have a follow-on question, so I'll add it to this thread: Assuming that it's a good idea to feed Caught spams through sa-learn in order to reinforce the tokens that might not have been autolearned, how do I tell SA to ignore the SPAM notice in the subject? I have ignore-header commands in local.cf for the X-Spam-Status: Yes and other spam headers, but how do I skip only a portion of the subject?

Provided it's a markup your SpamAssassin generated, SA will automatically ignore it when learning.

Matt,

Thanks for your reply. I apologize for not writing more clearly. I'm running Exim4 with Exiscan, so the SPAM prefix is being added to the Subject header by Exim4, not SA.

The question is: can sa-learn be taught to ignore specific parts of the subject line, or do I need to filter its input with a separate process?

TIA.

Bill Horne
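If the "separate process" route is taken, the filter could be as small as the sketch below, which removes a tag from the start of the Subject header before the message is handed to sa-learn. The tag text "*SPAM*" is a placeholder, since the thread does not say what prefix Exim4 adds, and the function name is made up for illustration. A Subject encoded per RFC 2047 would not be matched by this simple version.

```python
import email
import re


def strip_subject_tag(raw: bytes, tag: str = "*SPAM*") -> bytes:
    """Return the message with a leading `tag` removed from its Subject."""
    msg = email.message_from_bytes(raw)
    subject = msg.get("Subject")
    if subject is not None:
        # Remove the tag only when it appears at the start of the subject.
        cleaned = re.sub(r"^\s*" + re.escape(tag) + r"\s*", "", subject)
        msg.replace_header("Subject", cleaned)
    return msg.as_bytes()


if __name__ == "__main__":
    sample = b"Subject: *SPAM* Cheap pills\nFrom: a@example.com\n\nbody\n"
    print(strip_subject_tag(sample).decode())
```

The output of such a filter would then be what gets passed to sa-learn in place of the stored message.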
Re: SpamAssassin MX Gateway Server
I'm not the authority on such things, but I don't believe it's possible without some customization. I really wanted to ask you, though, how you handle mail rejection on the inner layer of mail servers? If mail gets through your front end SA box and needs to be rejected because it's to an invalid address or some other reason, how do you handle that?

Jerry
Re: SpamAssassin MX Gateway Server
--As of September 30, 2006 12:32:41 PM -0500, Russ B. is alleged to have said:

Basically, anything that arrives over 15 in score, will have that SPAM-STATUS header embedded, so it does NOT run SpamAssassin on this server, and just puts it in the Caught-Spam. If it has LOWER than a score of 15 from the MX, then the MX server didn't put a header on it, so it's processed here and filed here.

Why do that? Because my users on the sendmail server farm have a whole variety of score choices they are using, so I want their specific score to be utilized - but by making the score on the MX 15, I'm saving the sendmail server from a WHOLE LOT of processing, and nobody's going to have a default score over 15... so that's a safe number?

--As for the rest, it is mine.

Just as a thought: Since you are running procmail on them anyway, it should be possible to have a script in there that reads the desired score and uses the score count SpamAssassin embeds in the 'X-Spam-Level:' header to filter. It wouldn't reformat the mail (at least not without a lot of work), but you could at least file it differently...

Daniel T. Staal

---
This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law.
---
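Daniel's idea could look roughly like the sketch below: count the marks in the 'X-Spam-Level:' header and compare the count with the user's required score, without rescanning the message. This assumes the gateway adds the header to every message (the setup described in this thread only adds headers above 15) and that the header is a run of '*' characters, one per whole point of score. The function names are hypothetical, and because the star count drops the fractional part of the score, thresholds such as 6.5 can only be approximated.

```python
import email
from typing import Optional


def spam_level(raw: bytes) -> Optional[int]:
    """Return the number of '*' in X-Spam-Level, or None if it is absent."""
    msg = email.message_from_bytes(raw)
    level = msg.get("X-Spam-Level")
    if level is None:
        return None
    return level.count("*")


def is_spam_for_user(raw: bytes, required_score: float) -> Optional[bool]:
    """Compare the gateway's star count with one user's threshold."""
    level = spam_level(raw)
    if level is None:
        return None  # no gateway score available; a full scan is still needed
    return level >= required_score


if __name__ == "__main__":
    sample = b"X-Spam-Level: *******\nSubject: test\n\nbody\n"
    print(spam_level(sample), is_spam_for_user(sample, 6.0))
```

A procmail recipe could call such a script as a condition and file the message accordingly, leaving the full spamc run for messages that carry no gateway score.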
Re: SpamAssassin MX Gateway Server
From: Daniel Staal [EMAIL PROTECTED]

--As of September 30, 2006 12:32:41 PM -0500, Russ B. is alleged to have said:

Basically, anything that arrives over 15 in score, will have that SPAM-STATUS header embedded, so it does NOT run SpamAssassin on this server, and just puts it in the Caught-Spam. If it has LOWER than a score of 15 from the MX, then the MX server didn't put a header on it, so it's processed here and filed here.

Why do that? Because my users on the sendmail server farm have a whole variety of score choices they are using, so I want their specific score to be utilized - but by making the score on the MX 15, I'm saving the sendmail server from a WHOLE LOT of processing, and nobody's going to have a default score over 15... so that's a safe number?

--As for the rest, it is mine.

Just as a thought: Since you are running procmail on them anyway, it should be possible to have a script in there that reads the desired score and uses the score count SpamAssassin embeds in the 'X-Spam-Level:' header to filter. It wouldn't reformat the mail (at least not without a lot of work), but you could at least file it differently...

If you can have per-user rules and system-wide Bayes, it becomes real easy to have the per-user rules be one line: their spam threshold. Of course, with per-user Bayes you can have far better anti-spam, because you are not dealing with "one person's ham is another person's spam". But it gets to be a maintenance nightmare as the number of users goes up and the users' computer sophistication goes down.

{^_^}