Re: Are spammers finally feeling some pain? (update)
On Mon, 03 Jan 2005 11:06:22 -0800, [EMAIL PROTECTED] said: Over the past month I've seen a ~25% dropoff in the amount of spam we're receiving on a daily basis. Anyone else seeing a significant drop in spam recently?

(Replying to self) The amount of spam we're receiving did go back up a little after the holidays, as some of you suggested it would. But it didn't go back up to pre-December levels. We are seeing a consistent decrease in the amount of spam addressed to recipients in our domain. Back in March, when we started rejecting high-scoring messages with a 550 at our internet gateway, we were getting 20 messages per week on average (over 90% spam), including rejected messages. After we started rejecting, over a period of about a month it rapidly dropped to 13, and for most of the summer it stayed around 11-13. Through October and early November it was steady at 11 messages per week. Then it started dropping again:

Week ending   Total received messages
Dec 5         104217
Dec 12        103839
Dec 19        103748
Dec 26         89378
Jan 2          80315
Jan 9          89175

I don't know whether the 550s are getting our users removed from spammer lists, or if there's some other factor. I was thinking maybe overall Internet spam was dropping, but from your replies last week, that doesn't seem to be the case. Anyway, I guess we're doing *something* right. :-) -- snowjack(a)fastmail.fm
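As a rough check on the weekly figures in this post, the size of the drop can be computed directly from the posted totals. This is only a sketch: the names `weekly_totals` and `percent_drop` are illustrative and not from the original message.

```python
# Weekly totals exactly as posted (week ending, total received messages).
weekly_totals = [
    ("Dec 5", 104217),
    ("Dec 12", 103839),
    ("Dec 19", 103748),
    ("Dec 26", 89378),
    ("Jan 2", 80315),
    ("Jan 9", 89175),
]


def percent_drop(before, after):
    """Return the percentage decrease going from `before` to `after`."""
    return (before - after) / before * 100.0


# Compare every later week against the first week in the list.
baseline = weekly_totals[0][1]
for week, total in weekly_totals[1:]:
    print(f"{week}: {total} ({percent_drop(baseline, total):.1f}% below Dec 5)")
```

By this arithmetic the week ending Jan 2 is about 23% below the week ending Dec 5, and the week ending Jan 9 is about 14% below it.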
Re: maintaining the 2.6 branch (was: [2.64] FORGED_MUA_OUTLOOK buggy)
Whoops, forgot to cc the list. Sorry for the dupe, Per. On Thu, 06 Jan 2005 09:54:32 +0100, Per Jessen [EMAIL PROTECTED] said: Ron Johnson wrote: Per Jessen wrote: Show of hands, who's still on 2.64 with no exact plans to upgrade? Alright, so far I've seen 4-5, maybe 6 people saying they intend to stick to 2.64 for the foreseeable future. Is that really all? I'm quite willing myself to put an effort in in maintaining 2.64, and I'll probably be doing it on a personal level anyway, but to work to produce actual releases for others, I think a bit more of an interest is needed. Me too. I'm a Debian user, so I'm sticking with 2.64 as long as it's working well. Unless 3.X goes into Sarge, which I suspect is unlikely. -- snowjack(a)fastmail.fm
Re: maintaining the 2.6 branch (was: [2.64] FORGED_MUA_OUTLOOK buggy)
On Thu, 6 Jan 2005 21:33:34 -0700, Bob Proulx [EMAIL PROTECTED] said: [EMAIL PROTECTED] wrote: Per Jessen wrote: who's still on 2.64 with no exact plans to upgrade? Me too. I'm a Debian user, so I'm sticking with 2.64 as long as it's working well. Unless 3.X goes into Sarge, which I suspect is unlikely. I am also a Debian user, running Debian woody stable, running the www.backports.org spamassassin-3.0.2 version and am very happy with it. Running Debian stable is not a good reason to avoid upgrading spamassassin to the best available version. Thus my conditional, as long as it's working well. 2.64 is working for me, and VERY well: ~99% spam hits. I see no reason to upgrade unless the spammers start getting around it somehow. What makes you say 3.0.2 is the best version? Will I suddenly get an accuracy boost to 99.999%? Running stable systems with unchanging versions of software is fine when you are behind firewalls and isolated from the changing internet. It is okay to run appliances there. But I would go so far as to claim that if you are interacting with the quite hostile Internet then you must keep the software that is doing the interacting up to date. You must keep on top of security vulnerabilities, yes. Asserting that new software == more secure software is a fallacy. Remember that security problems can be caused both by problems with the code, and problems with your configuration. If you keep up with the security patches, then changing your configuration all the time as the upstream source changes can only increase your chances of introducing a configuration error. Many times people are simply thinking security updates only. But when talking email it also includes virus checking filters and spam checking filters too. Your system may be stable but the Internet is not. Which is why good spam filtration and virus checking software gets dynamic information from pattern update servers, RBLs, SURBL, Razor, DCC, etc. etc. etc. 
In a nutshell: if it ain't broke, don't fix it. -- snowjack(a)fastmail.fm
Are spammers finally feeling some pain?
Over the past month I've seen a ~25% dropoff in the amount of spam we're receiving on a daily basis. Anyone else seeing a significant drop in spam recently? -- snowjack(a)fastmail.fm
Re: Any way to block really bad SPAMs?
On Mon, 03 Jan 2005 13:09:02 -0800, [EMAIL PROTECTED] said: On Mon, 3 Jan 2005 15:49:33 -0500, Gustafson, Tim [EMAIL PROTECTED] said: Hello I know that it's generally frowned upon to actually block SPAMs (as opposed to marking them as SPAM and letting the user decide) but my company has some instances where we get things that are blatantly, absolutely, unequivocally SPAM (think scores in excess of 100 points without BAYES or any white/blacklisting) and I wonder if there is a way I can configure SpamAssassin to actually block (as in, return a 550 SMTP error code) SPAMs that exceed some ludicrous SPAM score? By the way, we reject messages that score above 10 with a 550. We found that almost 95% of spam scores over 10, and almost zero ham scores above five. Messages scoring between 5 and 10 are accepted, tagged, and relayed to their recipient. We definitely don't frown on rejecting messages with high scores. I believe rejecting most spam with a 550 at the internet gateway has been responsible for the amount of spam addressed to our domain dropping by over 50% since we started rejecting in March 2004. We have not had a single complaint from our users, even about any false positives in the 5-10 range tagged by SA. I posted earlier today about a recent 25% drop over the past month. In January 2004 we averaged more than 200,000 messages a week, over 90% spam. Last week we got 80,000 messages. Some people suggested that it was due to the holidays, but it was a fairly steady decline starting in late November, so it may have been something else. I hope it doesn't ramp up again, but who knows, maybe it was the holidays after all. I'll post an update next week unless anyone objects. -- snowjack(a)fastmail.fm
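The gateway policy described in this post (reject above 10, tag from 5 to 10, deliver the rest untouched) can be sketched as a small decision function. This is a minimal illustration, not the poster's actual MTA configuration; `gateway_action` and the two constants are hypothetical names.

```python
REJECT_ABOVE = 10.0  # post: messages scoring above 10 get a 550
TAG_FROM = 5.0       # post: messages between 5 and 10 are tagged and relayed


def gateway_action(score):
    """Map a SpamAssassin score to the action described in the post."""
    if score > REJECT_ABOVE:
        # Answer 550 during the SMTP session, so no bounce message is generated.
        return "reject"
    if score >= TAG_FROM:
        # Accept, mark as spam, and relay to the recipient.
        return "tag"
    return "accept"


for score in (1.2, 7.0, 10.0, 23.5):
    print(score, gateway_action(score))
```

Keeping the two thresholds separate is what lets users still see borderline mail (5 to 10) while the bulk of spam never enters the mail system.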
RE: Any way to block really bad SPAMs?
On Mon, 3 Jan 2005 16:45:41 -0500, Gustafson, Tim [EMAIL PROTECTED] said: Thanks for all the help everyone. I guess the real question for me is how do I make spamass-milter block e-mails of a certain score, because that's how I integrate SpamAssassin into Sendmail. I don't use spamass-milter, so I can't vouch for how well it works, but a quick Google search revealed that using the -r command-line option to spamass-milter in the /etc/init.d script will set the rejection level. -- snowjack(a)fastmail.fm
Re: Unsubscribe?
William Holman wrote: I've been over-ruled by those who pay the bills, so I can't use SpamAssassin since it's open source How do I unsubscribe from the lists? Thank-you! I have an anti-spam product I, ah, we, ah, my COMPANY, (yeah, that's the ticket) call Snowjack Scanner which is at least as effective as any other... commercial solution. It has a Bayes filter, score averaging by sender, whitelisting capabilities, many effective individual rules which are weighted by genetic algorithm, and the ability to incorporate information from RBL and SURBL lookups. For only $5000, much less than many competing commercial solutions, I will send you this amazing package as soon as I can gimp a logo and slap it on a CD. I'll even set it up to auto-install on a bare-bones PC, with automated security updates, and for only $200 per hour I... uh, one of our CONSULTANTS... will help you integrate it into your mail systems. -- Snowjack Consulting Services Inc. LLC. TBS. OMFG.
Re: Unsubscribe?
snowjack wrote: I have an anti-spam product I, ah, we, ah, my COMPANY, (yeah, that's the ticket) call Snowjack Scanner which is at least as effective as any other... commercial solution. It has a Bayes filter, score averaging by sender, whitelisting capabilities, many effective individual rules which are weighted by genetic algorithm, and the ability to incorporate information from RBL and SURBL lookups. For only $5000, much less than many competing commercial solutions, I will send you this amazing package as soon as I can gimp a logo and slap it on a CD. I'll even set it up to auto-install on a bare-bones PC, with automated security updates, and for only $200 per hour I... uh, one of our CONSULTANTS... will help you integrate it into your mail systems. -- Snowjack Consulting Services Inc. LLC. TBS. OMFG.

<reply to self> <alert>For the humor-impaired, that was a joke. OMFG.</alert> </reply>
Re: can any body help me understand this
Kang, Joseph S. wrote: As for the dump output.. 0.000 0108 1103190407 N:H*i:sk:NNfNNNc [snipped for brevity] The fourth is the token itself. SA uses some prefix characters for encoding things, but without any prefix, a token is a word in the body of the message. I think you meant the FIFTH column is the token itself, right? -Joe K. Agreed, I think the fourth field is the timestamp at which the token was last seen in a message, and the fifth field is the token. The timestamp's used for auto-expiry runs where tokens that haven't been seen in a while are removed from the db if other expiration requirements are met.
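To make the column layout concrete, here is a small sketch that splits one dump line into five fields and converts the fourth into a readable date. Assumptions: the sample line uses the values quoted in this thread with the two count columns separated as "0 108" (the quoted text runs them together), and reading columns two and three as spam and ham counts is my interpretation, since the thread only discusses columns four and five. `TokenRow` and `parse_dump_line` are hypothetical names.

```python
from collections import namedtuple
from datetime import datetime, timezone

TokenRow = namedtuple(
    "TokenRow", "probability spam_count ham_count last_seen token"
)


def parse_dump_line(line):
    """Split a five-column dump line; the token is everything after column 4."""
    prob, nspam, nham, atime, token = line.split(None, 4)
    return TokenRow(
        probability=float(prob),
        spam_count=int(nspam),
        ham_count=int(nham),
        # Fourth column: Unix timestamp of when the token was last seen.
        last_seen=datetime.fromtimestamp(int(atime), tz=timezone.utc),
        token=token.strip(),
    )


# Hypothetical column split of the line quoted in the thread.
sample = "0.000 0 108 1103190407 N:H*i:sk:NNfNNNc"
row = parse_dump_line(sample)
print(row.token, row.last_seen.date())
```

The timestamp in the sample corresponds to 16 December 2004 (UTC), which is the kind of value an expiry run would compare against its cutoff.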
Re: sa-learn on a 15,000 email mbox file?
On Mon, 29 Nov 2004 12:01:02 -0900, Andy Firman [EMAIL PROTECTED] said: I just started using Spamassasin 3.0 and am very impressed with it. Recently, on an old server that I just started to manage, I just found a spam infested mbox spool file with 15,000 spams in it. (52MB) Nobody had checked the mailbox in about 10 months. Is it a good idea to run sa-learn on this giant spam mbox file on other servers that I get SA 3.0 installed on? Or no? Unless the address has never been used by a real person, you should manually check each message to see whether it's spam. Personally, I never have the endurance to check more than about 500 messages at a shot. So I'd just cut it into files of a size I could manually verify without bleeding from the eyes, delete any hammy-looking stuff I find in each file as I go through it, and then save the verified files and use those for bayes training. It would be safe to do what you propose if the account is one that you are certain will never receive legit mail, but old mail accounts *will* still get the occasional legit message. Hey Bob, why haven't I heard from you in the past eight months? Here's all our new customer info... For ongoing Bayes training, I have two IMAP folders that I copy messages into, one for ham and one for spam. Any spams scoring less than 10 get manually copied into the spam folder (the rest of the spam is rejected at the mail gateway). Periodically I run through a bunch of recent ham and copy it into the ham folder. A nightly script cleans out those IMAP folders, runs sa-learn on the messages, and copies them into ham/spam folders on the server, so I can use those if I need a corpus of manually verified messages. -- snowjack(a)fastmail.fm
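The "cut it into files of a size I could manually verify" step can be sketched with Python's standard `mailbox` module. This is one possible way to do it, not the poster's script; `split_mbox`, the output naming scheme, and the default chunk size of 500 (taken from the poster's stated limit) are illustrative assumptions.

```python
import mailbox


def split_mbox(src_path, dest_prefix, chunk_size=500):
    """Copy messages from one mbox into numbered mbox files.

    Each output file holds at most `chunk_size` messages and is named
    `<dest_prefix>.000`, `<dest_prefix>.001`, and so on.
    Returns the list of files written.
    """
    paths = []
    chunk = None
    source = mailbox.mbox(src_path, create=False)
    try:
        for index, message in enumerate(source):
            if index % chunk_size == 0:
                # Start a new output file every `chunk_size` messages.
                if chunk is not None:
                    chunk.close()
                path = f"{dest_prefix}.{index // chunk_size:03d}"
                chunk = mailbox.mbox(path)
                paths.append(path)
            chunk.add(message)
    finally:
        if chunk is not None:
            chunk.close()
        source.close()
    return paths
```

After hand-checking a chunk and deleting anything hammy, it could be fed to training with something like `sa-learn --mbox --spam chunk.000`, using the same `--mbox` flag that appears elsewhere in this archive.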
Re: feeding spam messages for training
Richard Harding wrote: I am looking at getting messages together to train spamassassin and told users to forward me messages that are spam that still get through. Is this an ok method of collecting or will the fact that so many are forwarded messages throw off the training? In short, yes, it will not work as well as if you trained using the original messages, because forwarding a message usually blows away all the header goodness and replaces with new headers. But there are ways. This is a FAQ: http://wiki.apache.org/spamassassin/ResendingMailWithHeaders http://wiki.apache.org/spamassassin/SiteWideBayesFeedback
Re: Rules List
On Tue, 9 Nov 2004 11:53:13 -0800, Greg Earle [EMAIL PROTECTED] said: SNIP Ergo, are there 2.63-friendly .cf files out there with SURBL/SpamCopURI functionality in them?

Here's my surbl.cf. Edit the scores at the bottom to your taste.

# checks to do network lookups of URLs found within spam messages
# using the most excellent DNS database at surbl.org

uri       SPAMCOP_URI_RBL  eval:check_spamcop_uri_rbl('multi.surbl.org','127.0.0.0+2')
describe  SPAMCOP_URI_RBL  URI's domain appears in spamcop database at sc.surbl.org
tflags    SPAMCOP_URI_RBL  net

uri       WS_URI_RBL  eval:check_spamcop_uri_rbl('multi.surbl.org','127.0.0.0+4')
describe  WS_URI_RBL  URI's domain appears in ws.surbl.org
tflags    WS_URI_RBL  net

uri       PH_URI_RBL  eval:check_spamcop_uri_rbl('multi.surbl.org','127.0.0.0+8')
describe  PH_URI_RBL  URI's domain appears in ph.surbl.org
tflags    PH_URI_RBL  net

uri       OB_URI_RBL  eval:check_spamcop_uri_rbl('multi.surbl.org','127.0.0.0+16')
describe  OB_URI_RBL  URI's domain appears in ob.surbl.org
tflags    OB_URI_RBL  net

uri       AB_URI_RBL  eval:check_spamcop_uri_rbl('multi.surbl.org','127.0.0.0+32')
describe  AB_URI_RBL  URI's domain appears in ab.surbl.org
tflags    AB_URI_RBL  net

uri       JP_URI_RBL  eval:check_spamcop_uri_rbl('multi.surbl.org','127.0.0.0+64')
describe  JP_URI_RBL  URI's domain appears in jp.surbl.org
tflags    JP_URI_RBL  net

score SPAMCOP_URI_RBL  2.4
score WS_URI_RBL       2.0
score PH_URI_RBL       2.4
score OB_URI_RBL       2.0
score AB_URI_RBL       2.4
score JP_URI_RBL       2.4

-- snowjack(a)fastmail.fm
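The `'127.0.0.0+N'` arguments in these rules appear to select one bit in the last octet of the answer returned by multi.surbl.org, with each bit standing for one source list. A minimal sketch of that decoding follows; the bit-to-list mapping is copied from the rules in this post, while `SURBL_BITS` and `decode_surbl` are hypothetical names and no DNS lookup is performed.

```python
# Bit value -> SURBL source list, as used by the rules in this post.
SURBL_BITS = {2: "sc", 4: "ws", 8: "ph", 16: "ob", 32: "ab", 64: "jp"}


def decode_surbl(answer):
    """Return the lists flagged in a 127.0.0.X answer from the combined zone."""
    octets = answer.split(".")
    if len(octets) != 4 or octets[:3] != ["127", "0", "0"]:
        raise ValueError(f"unexpected answer: {answer!r}")
    last = int(octets[3])
    # A list is flagged when its bit is set in the last octet.
    return [name for bit, name in SURBL_BITS.items() if last & bit]


print(decode_surbl("127.0.0.6"))   # bits 2 and 4 set
print(decode_surbl("127.0.0.64"))  # bit 64 set
```

Under this reading, a single lookup against the combined zone can trigger several of the rules at once, which is why each rule tests its own bit rather than querying a separate zone.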
Re: Rules List
On Tue, 09 Nov 2004 15:23:08 -0500, Kris Deugau [EMAIL PROTECTED] said: SNIP Snag mine from http://www.deepnet.cx/~kdeugau/spamtools/ Nice meta rules (BAYES_vs_SURBL) -- I like those a lot!!! -- snowjack(a)fastmail.fm
Re: Frustration...
On Fri, 05 Nov 2004 13:31:27 +0100, Kai Schaetzl [EMAIL PROTECTED] said: Doing this after a spamassassin scan is useless, nevertheless. See, my other reply of today. People with high accuracy requirements would disagree with you. For some people, including me, false positive rates for straight RBL rejection are unacceptable. I simply can't use straight RBL rejection. Not an option. Rejecting the mail after the spamassassin scan is MUCH more accurate than rejecting based on RBLs alone. And it's definitely a lot better than letting all the spam through. Sure, you don't get any bandwidth advantage, but when a false positive could cost you $thousands, the bandwidth is a lot less important than the accuracy. So, it's not useless. -- snowjack(a)fastmail.fm
Re: Frustration...
On Fri, 05 Nov 2004 22:31:26 +0100, Kai Schaetzl [EMAIL PROTECTED] said: wrote on Fri, 05 Nov 2004 10:09:53 -0800: People with high accuracy requirements would disagree with you. For some people, including me, false positive rates for straight RBL rejection are unacceptable. I simply can't use straight RBL rejection. Not an option. You didn't get my point. It's useless if not harmful to bounce a message after you already got it in full. Full stop. You can reject with 5XX without bouncing even after you receive the message in full. I'm not talking about bouncing. Rejecting! Rejecting! For me, the cost of a false positive FAR exceeds what the extra bandwidth costs me to download the body of the message before making the decision of whether to accept or reject the message. I agree that combining the both mathematically should be more accurate, but it needs a *multitude* of system ressources and traffic in exchange for an accuracy increase which is almost not measurable. No good deal in my eyes. The accuracy increase is very measurable. If a false positive isn't that big a deal for you, fine. All I'm saying is that for some people, it really is that important, and for us the accuracy increase is worth far more than the cost of bandwidth and processing the extra data. Anyway, if you like to go this way, fine, but that doesn't change the fact that bouncing stuff you already have taken in full is bad. It's *no* advantage to you but maybe a burden to others. I don't see any rectification for that. 1) It's a *huge* advantage for me. Our e-mail filtering accuracy is very important to us, and using SA to scan the message body makes a big difference. 2) I don't see how the difference between sending a 5XX after receiving the message body is more of a burden on others than sending a 5XX before receiving the message body. Please enlighten me. I'm paying for the extra bandwidth that's used. My ISP pays their upstream for the extra bandwidth used, and so on. 
It's just extra business for them, I'm sure they're happy to see it. -- snowjack(a)fastmail.fm
Re: Frustration...
On Thu, 04 Nov 2004 17:39:43 -0500, Rick Macdougall [EMAIL PROTECTED] said: If you don't bounce, what do you do ? /dev/nulling the message is not a real option since mail should never just vanish, and in the case of false positives, the sender would never get the rejection message.

Some definitions relating to MTA behavior:

Bounce: Your MTA accepts the message, then generates a Delivery Status Notification message (aka DSN, aka bounce message) explaining why the message was not delivered, and sends it to the sender address of the undelivered message, which in the case of spam is almost certainly not the real sender in any case.

Reject: Your MTA does not accept the message, sending a 5XX to the sending MTA, and generates no DSN.

-- snowjack(a)fastmail.fm
Re: ver 3.0 opinions
On Thu, 28 Oct 2004 16:19:13 -0700, Bart Schaefer [EMAIL PROTECTED] said: On Thu, 28 Oct 2004 15:21:59 -0700, Jeff Ramsey [EMAIL PROTECTED] wrote: Is version 3 really any better at stopping spam that 2.63? Version 3 stops different spam than 2.63, in my experience so far. E.g. it's better at catching the drug spam but not as good at the earn cash for making phone calls spam. I would say that the *default* config of 3.X is significantly more effective than the default config of 2.63 or 2.64. But I think after some tweaking of 2.64 it's probably just as effective as 3.0, once you've added in the SpamCopURI patch, antidrug.cf, some of the other SARE custom rulesets. Both of them depend to a large extent on how much care you put into setting it up, training BAYES properly, etc. Using it in local only mode, though, I've found it not very different. The spams that get through 3.x that do not get through 2.6x are generally (a) those that match BAYES_99, which by itself in the default configuration is no longer a large enough score to make me happy, or (b) would have been tagged as spam except that the AWL smoothed them down to just below the threshhold. Yup. But I wouldn't turn off AWL if I were you. I think it's a very nice feature, and has probably prevented a few false positives for me. Yes, occasionally it will pull the score the wrong way across the threshold, but if it's doing that, you're better off figuring out why this person's messages get an *average* score on the wrong side of your threshold anyway. If you get that fixed, then the AWL will stop pulling messages the wrong way across your threshold. I do customize my BAYES scores because I'm not very happy with the defaults. I find that a significant portion of spam manages to reduce its Bayes probability to 40-60% by including large chunks of innocent text at the end of the message. 
So I add the lines below to my local.cf, because most of my ham scores below 10%, while the messages that hit Bayes' 40-60% range are more than 95% spam. This still catches those spams which hit BAYES_99, and for that rare ham that hits BAYES_99 (I've never seen one, but I suppose they must be out there), the AWL or another whitelist rule will hopefully pull it back under five points.

score BAYES_00 -4.9
score BAYES_01 -2.0
score BAYES_10 -1.5
score BAYES_20 -1.0
score BAYES_30 -0.5
score BAYES_40 0.1
score BAYES_44 0.7
score BAYES_50 1.0
score BAYES_56 1.5
score BAYES_60 2.1
score BAYES_70 3.1
score BAYES_80 4.2
score BAYES_90 4.9
score BAYES_99 5.4

-- snowjack(a)fastmail.fm
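To show how these scores play out, the sketch below maps a Bayes probability to a rule name and the score listed in this post. One assumption is labeled loudly: the bucket boundaries are inferred from the rule names (BAYES_44 starting at 0.44, and so on) and are not stated in the post. `BAYES_SCORES` and `bayes_rule` are hypothetical names.

```python
from bisect import bisect_right

# (assumed lower bound of the bucket, rule name, score from the post)
BAYES_SCORES = [
    (0.00, "BAYES_00", -4.9),
    (0.01, "BAYES_01", -2.0),
    (0.10, "BAYES_10", -1.5),
    (0.20, "BAYES_20", -1.0),
    (0.30, "BAYES_30", -0.5),
    (0.40, "BAYES_40", 0.1),
    (0.44, "BAYES_44", 0.7),
    (0.50, "BAYES_50", 1.0),
    (0.56, "BAYES_56", 1.5),
    (0.60, "BAYES_60", 2.1),
    (0.70, "BAYES_70", 3.1),
    (0.80, "BAYES_80", 4.2),
    (0.90, "BAYES_90", 4.9),
    (0.99, "BAYES_99", 5.4),
]
LOWER_BOUNDS = [bound for bound, _, _ in BAYES_SCORES]


def bayes_rule(probability):
    """Return (rule name, score) for a Bayes probability in [0, 1]."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Pick the last bucket whose lower bound does not exceed the probability.
    index = bisect_right(LOWER_BOUNDS, probability) - 1
    _, name, score = BAYES_SCORES[index]
    return name, score


for p in (0.005, 0.45, 0.50, 0.995):
    print(p, bayes_rule(p))
```

With this table, a spam that pads itself down to about 50% still picks up a full point, while typical ham near 0% gets almost five points of credit, which is the asymmetry the post argues for.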
Re: [sa-list] Re: slightly OT: sudden rise in Rumplestiltskin attacks?
On Wed, 27 Oct 2004 15:05:44 -0400 (EDT), Dan Mahoney, System Admin [EMAIL PROTECTED] said: On Tue, 26 Oct 2004, Jeff Chan wrote: Pedantic nit-pick of the day: I'm sure you meant reject instead of bounce, right? Bounce to some means reject. Bounce to others means forward. The distinction between forwarding and bouncing for me is the same as in pine -- bounce means to simply re-email the mail to the final destination, whereas forward means to encapsulate the original email as an attachment, or enclose the email inside a new email with a (fwd) subject line, and additional headers. -Dan Well, to some, Bounce means to hit a hard surface and then spring back the other way. But I don't think there was any ambiguity in the original discussion, because we were talking about MTA behavior. If you want to talk about basketball or mail clients or Majordomo or forwarding, then the term 'bounce' might have another meaning. In MTA context, a bounce is well defined: http://www.wordiq.com/definition/Bounce_message I'm picky about this one because it can make a big difference for joe-job victims. If your server bounces messages addressed to invalid users (which was suggested by the person I originally replied to), then you're creating a problem by generating DSN messages which flood joe-job victims. If your server's rejecting the messages, then the remote MTA software might still create a DSN, but hopefully most ratware running through a proxy doesn't bother to try notifying the bogus sender address that the spam wasn't delivered. It's easy to set up a mail system that bounces instead of rejects messages with invalid addressees, especially if you have SpamAssassin installed on a system which relays the mail to another server for final delivery. Not an uncommon setup, especially on this list, and there are people who aren't aware of the distinction. 
Apologies for the pedantry, hopefully someone will find this educational and configure their gateway MTA to reject mail addressed to invalid users, instead of trying to relay to the final delivery server and then bouncing when it is refused. -- snowjack(a)fastmail.fm
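The recommendation to reject mail for invalid users at the gateway, rather than relaying and bouncing later, can be sketched as a check made when the recipient address is presented. This is an illustration only: `rcpt_response` and the `valid_recipients` set are hypothetical, and a real gateway would need some way to learn the valid addresses from the final delivery server.

```python
def rcpt_response(recipient, valid_recipients):
    """Return the SMTP reply a gateway could give for one recipient address."""
    if recipient.lower() in valid_recipients:
        return 250, "OK"
    # Refusing here means this server never accepts the message, so it never
    # has to generate a DSN addressed to a possibly forged sender.
    return 550, "User unknown"


valid_recipients = {"alice@example.com", "bob@example.com"}
print(rcpt_response("Alice@example.com", valid_recipients))
print(rcpt_response("nosuchuser@example.com", valid_recipients))
```

The design point is where the refusal happens: a 5XX during the SMTP session leaves any notification to the sending side, while accepting and failing later forces the gateway to send a bounce itself.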
Re: [OFFTOPIC] Opinions on DSPAM
On Mon, 18 Oct 2004 14:32:10 -0400, Mathieu Nantel said: As I've read a few articles on DSPAM claiming that it's better/faster/sexier than spamassassin, I would appreciate having this list's comment on DSPAM. I'm sure quite a few of you have tried it and might have some interesting experiences to share. My understanding is that DSPAM relies solely on algorithms (Bayes, CHI^2), and that complications arise when you have to teach your users on how to train the system (which SA doesn't require as it's based on other things aside from Bayes). Hi Mathieu, I haven't done any carefully controlled studies, but I've been very successful with SpamAssassin. I think SA has the better approach. Spammers have gotten quite good at fooling the pure algorithm methods. Some get their messages' spam probabilities down to 50% or so if not lower, mainly by including a lot of innocuous text in their messages. If you can also use DNS-based databases to look up IP addresses and domains associated with known spammers, plus other rules such as known ratware patterns in headers, keywords like 'viagra', and use that information in combination with the Bayes and other algorithms to determine messages' spamminess, you will have better accuracy. Blacklisting domains that are included in the content of spam messages has been a very successful technique for us. Unfortunately, SA is a real memory hog and has higher hardware requirements than DSPAM to handle the same message load. But that's not a big issue for us. We average about 25,000 messages per day from the Internet with ~400 users. SpamAssassin is running on a dedicated Athlon 1.5 GHz machine with about 750MB of RAM, and we haven't had any problems. Peak RAM usage is about 600MB. -- snowjack(a)fastmail.fm
Re: Antidrug.cf
How completely rude. What are you, twelve years old? jdow wrote: It seems anabolic steroids are flat out missed by antidrug.cf. Of course, I observe the idiot Apache spam trap on the spamassassin list does catch the message sample when I attach it. Somebody needs to apply a clue bat to the Apache mail manager to get it to have this and the dev lists bypass his antispam bazoola. What's the mail manager got, spit for brains? {^_^}
Re: AWL auto_expire?
Nate Schindler wrote: Just a curiosity question for now - is auto-expiring the AWL a planned feature? My auto-whitelist is about 3x the size of bayes_toks. I imagine it'll become problematic eventually, since it's only growing. ...or is there already some way to expire old entries from the AWL, and i'm just a 'tard? or both? I use this successfully with SA 2.64. I run it automatically once per month. (Thanks, Kris!) http://www.deepnet.cx/~kdeugau/spamtools/trim_whitelist
Re: score changes in local.cf not recognized.
[EMAIL PROTECTED] wrote: I am running Postfix 2.1 with a content_filter (latest amavisd-new) which sends all mail through SA 2.64. I understand *some* variables that are defined explicitly in amavisd-new are special, and thus have no effect when defined (differently) in local.cf. AFAIK, scores are not included in this restriction. I will ask on the amavis list, but in case I'm experiencing another pitfall or screwing up the syntax, here goes: I set score RCVD_IN_BL_SPAMCOP_NET 5.000 in local.cf and noticed incoming spam still tags such emails with a score of 2.2. Is there anything else I should check before assuming this is an external, non-SA issue? ...Did you restart amavisd-new after making the change?
Re: score changes in local.cf not recognized.
[EMAIL PROTECTED] wrote: Quoting snowjack [EMAIL PROTECTED]: ...Did you restart amavisd-new after making the change? Yes. :-) Is there any evidence that local.cf is getting read at all?
Re: Memory usage spikes ...
On Mon, 04 Oct 2004 10:29:38 -0700, Justin Mason [EMAIL PROTECTED] said: Please note that pretty much *all* our documentation notes that this is the case. You should NOT scan very large messages. We configure our spamd client to only pass spamd up to the first 50KB of a message. Definitely helps keep memory usage under control, and doesn't seem to hurt effectiveness at all. -- [EMAIL PROTECTED]
Re: SA 3.0 is eating up all my memory!!!
Loren Wilton wrote: 80M doesn't strike me as unusual for spamd if you have any of the addon rulesets. [EMAIL PROTECTED]@#sputter...! Yes, that is too unusual unless you're using ALL the addon rulesets, including BigEvil, which, I hear, eats pets and small children when nobody's looking, and should be avoided. And also probably several non-SARE rulesets too.
Re: 2.6 - 3.0 migration questions
Kelson wrote: How about ROSS: Real Open Source Software? Bitchin' Open Source Software: BOSS :-)
Re: Preferred DNSBL
On Tue, 28 Sep 2004 08:57:28 -0500, Bob Apthorpe [EMAIL PROTECTED] said: Hi, Hello. On Mon, 27 Sep 2004 15:10:30 -0700 [EMAIL PROTECTED] wrote: On Mon, 27 Sep 2004 12:52:41 -0400 (EDT), Dan Mahoney, System Admin [EMAIL PROTECTED] said: Hey guys, as a quick survey, if you're blocking ips at the MTA level, which are you using? I think it's a bad idea and don't do it at all. Much better to configure your MTA to reject mail based on a SpamAssassin score which nicely combines the RBLs and other spam indicators. Our MTA returns a 550 after the DATA is received on any message that SpamAssassin scores higher than 10, which blocks about 90% of all spam we get (that's about 70% of all incoming mail, lately). I'll counter that rejecting before DATA saves on bandwidth and CPU, and can be done safely with a judicious choice of DNSBLs. I like your choice of RBL's, but your definition of 'safely' doesn't match up with what my users consider an acceptable number of false positives. -- [EMAIL PROTECTED]
Re: Bayes keeps forgetting learned messages
On Tue, 28 Sep 2004 12:46:17 -0700, Erik Wickstrom [EMAIL PROTECTED] said: Hi all, 2 problems. First, when I train SA on ham or spam, it seems to forget the counterpart. Example: sa-learn --mbox --showdots --ham inbox Would add say 300 hams to the Bayes DB, but turns the spam count to 0 or a very small number and vice versa (sa-learn --dump magic) You're not somehow accidentally using the same messages with --ham as with --spam, are you? If SA has learned a message with --ham, then you feed it the same message with --spam, it will un-learn the --ham tokens it got from the first 'learning experience'... -- [EMAIL PROTECTED]
Re: [sa-list] Re: DSPAM-plugin for SpamAssassin 3.* ?
Juhapekka Tolvanen wrote: Myth 4: PERL is designed for language processing, so SpamAssassin is written in a more appropriate language. Let me preface this with the fact that I've had about 10 years of experience coding PERL. While PERL is very useful for language processing and web applications, it is also an extremely slow, interpreted language. Process startup is slow. Perl is pretty efficient once the process is running, and a well-set-up SpamAssassin 3 configuration will already have the processes started before a spam is even received. The average overhead for a single PERL process is around 2MB of RAM. Yeah, and it is true that SpamAssassin uses lots of RAM (20M per process?) So what, RAM is cheap! I really don't care about attitudes of author of DSPAM. I just want to know, how much faster SpamAssassin will be, if its Bayesian engine is replaced with something else, for example with DSPAM. It does not hurt, if we try it out and see what happens. And it does not hurt, if people have more alternatives. I had a little single-processor 1GHz Athlon machine with 256MB RAM using SpamAssassin to scan about 30,000 e-mails per day for a while. That was pushing the RAM usage a little, but it worked fine. I've since upgraded to about 750MB RAM just to be safe, and our load has dropped to about 25,000 mails per day since I started rejecting (550) the high-scoring messages. The DSPAM authors are making it sound like SpamAssassin is more of a performance problem than it really is. If you want to know, what kind of computer I used, here are its specs: http://iki.fi/juhtolv/eng/tietokone.eng.html Your biggest problem on this computer is only having 64M RAM and having all kinds of other software (Gnome? Enlightenment? Those will use a lot of your 64M all by themselves!) running when you're trying to load SpamAssassin. Your problem is that you need more RAM, not that there's something wrong with SpamAssassin! 
Yes, DSPAM will possibly use quite a bit less RAM, so it might be a decent choice for you. But I doubt that it's really as effective as SpamAssassin. BTW Creating SA-plugin that runs crm114 may be good thing to try out, too. And I don't mind, if some people create bogofilter- and SpamProbe-plugins for SA. Just do it, if you feel so. But DSPAM seems more interesting for me. I haven't been able to try it out, because it is not yet available as Debian-package and I haven't yet bothered to compile it myself. SpamAssassin is packaged in Debian already, but version 3.0 is not yet available as Debian-package. I reiterate: It does not hurt, if we try out and see what happens. Having SpamAssassin call some other program like DSPAM will make your performance much worse, because you will already have SA loaded, taking up a chunk of RAM, and then it is trying to load another program, which will use even _more_ RAM. Your options are: 1) buy more RAM -or- 2) quit using Gnome, Enlightenment, and SpamAssassin on that box, find a nice thin window manager (IceWM?) and use some low-memory-friendly spam scanner. There are mail clients out there that have Bayes filters built in. Bogofilter may also be an option.
Re: Auto White List
Rick Macdougall wrote: Yup, I understand how the whole AWL works but my problem is that border line spam is being dropped to ham. Example: A normal markup of 5.6 and an AWL score of -0.8 drops it below the average user required_hits of 5 and does not get marked as spam.

Right, but it's an averaging system, so if you're seeing negative AWL scores, that means that future mails from the same sender will be averaged higher, eventually auto-blacklisting that address. I think a longer example would make it clearer. Let's say the first message from that sender scored 4.8, and the second and third messages scored 5.6, and so on. Let's see what happens:

Message  1:  4.8               final score: 4.8   (first message from this sender IP, no AWL hit)
Message  2:  5.6  AWL: -0.8    final score: 4.8
Message  3:  5.6  AWL: -0.4    final score: 5.2
Message  4:  5.6  AWL: -0.27   final score: 5.33
Message  5:  9.8  AWL: -4.4    final score: 5.4
Message  6:  3.5  AWL: +2.78   final score: 6.28
Message  7:  4.8  AWL: +1.02   final score: 5.82
Message  8:  3.5  AWL: +2.17   final score: 5.67
Message  9: 15.5  AWL: -10.1   final score: 5.4
Message 10:  3.5  AWL: +3.02   final score: 6.52

Note how every time you see a large negative number in the AWL score, the very next message has a significantly higher final score. The AWL always sets the message's final score to exactly match the average score of all the messages received from that sender in the past. Note how message #8 originally scored 3.5, but AWL gives it +2.17, while message #10 (also originally 3.5) gets an AWL adjustment of +3.02 because of the high score message #9 received. This is great for reducing false positives from a person you correspond with often, who sends you something spammy-looking once in a while. Also note how the resulting final scores are much less erratic than they would be without AWL. In this example, we would have half the messages scoring below the spam threshold without AWL, but with AWL enabled, only the first two get through.
SA developers: feel free to add this example to the wiki if you think it would be helpful.
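For anyone who wants to play with the numbers, here is a rough Python sketch of the averaging described above. It is not SpamAssassin's code. The function name awl_adjust is made up for illustration, and it assumes the sender's history stores each message's score before the AWL adjustment. The factor argument is modelled on the auto_whitelist_factor setting: as far as I know the stock default is 0.5, which moves a score only halfway toward the sender's mean, while 1.0 gives the "final score equals the past average" behaviour used in the example.

```python
def awl_adjust(scores, factor=1.0):
    """Return (raw, adjustment, final) for each message from one sender.

    mean  = average raw score of the sender's earlier messages
    final = raw + (mean - raw) * factor
    factor=1.0 is the simplified rule from the example; it is an
    assumption here, not necessarily what a given install uses.
    """
    results = []
    total = 0.0  # running sum of raw (pre-AWL) scores seen so far
    for count, raw in enumerate(scores):
        if count == 0:
            adjustment = 0.0  # no history yet, so no AWL hit
        else:
            mean = total / count
            adjustment = (mean - raw) * factor
        results.append((raw, adjustment, raw + adjustment))
        total += raw
    return results


# Raw scores of the first four messages in the example.
rows = awl_adjust([4.8, 5.6, 5.6, 5.6])
for number, (raw, adjustment, final) in enumerate(rows, start=1):
    print(f"Message {number}: {raw:.1f}  AWL: {adjustment:+.2f}  final: {final:.2f}")
```

With factor=1.0 this gives adjustments of -0.80, -0.40 and -0.27 for messages 2 to 4, the same as the first rows of the table. Calling awl_adjust([4.8, 5.6], factor=0.5) instead puts the second message at 5.2 rather than 4.8, so a borderline message is pulled down less.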
Re: [sa-list] Re: DSPAM-plugin for SpamAssassin 3.* ?
David Brodbeck wrote: On Wed, 22 Sep 2004 17:26:12 -0700, snowjack wrote

Yeah, and it is true that SpamAssassin uses lots of RAM (20M per process?) So what, RAM is cheap!

If I'm not mistaken, some of that 20M is actually shared amongst all the spamd processes, so it's not as much memory usage as you'd think. Five spamd processes that each claim to be using 20M may not actually be consuming a total of 100M. *nix is tricky that way. ;)

Hmm

# top
  PID USER     PRI NI  SIZE  RSS SHARE STAT %CPU %MEM  TIME COMMAND
31162 spamdata   9  0 25108  17M  6680 S    17.5  2.3  0:00 spamd
31163 spamdata  14  0 24596  16M  7284 S     7.7  2.2  0:00 spamd

25108 - 6680 = 18428 KB physical RAM usage
24596 - 7284 = 17312 KB physical RAM usage

Am I missing something?
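The same arithmetic as a small Python sketch, in case it helps the discussion. It follows the SIZE minus SHARE subtraction from the post, and it assumes (this is only an assumption, and really the open question here) that the SHARE pages are the same pages in every spamd process, so they only need to be counted once. The function name estimate_total_kb is made up for illustration.

```python
def estimate_total_kb(processes):
    """Estimate combined memory use for a group of similar processes.

    processes: list of (size_kb, share_kb) pairs, taken from the SIZE
    and SHARE columns of top.
    Assumption: shared pages are common to all processes, so the largest
    SHARE figure is counted once instead of once per process.
    """
    private = [size - share for size, share in processes]
    shared_once = max(share for _, share in processes)
    return private, sum(private) + shared_once


# The two spamd processes from the top output above.
spamd = [(25108, 6680), (24596, 7284)]
private, total = estimate_total_kb(spamd)
naive = sum(size for size, _ in spamd)

print(private)  # [18428, 17312]
print(total)    # 43024
print(naive)    # 49704
```

Under that assumption the two processes come to about 43 MB together instead of the 49.7 MB you get by adding the SIZE column, so the saving from sharing is real but modest for these two.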
Re: What the Hell? Fw: Mail delivery failed: returning message to sender
jdow wrote: My understanding is that Earthlink servers are open so that people who are mobile can still send mail through their Earthlink accounts. The way they handle the spam issue is a tarpit operation. The more mails you send in a given interval the slower the mail processes. So Earthlink mailers can be used for spam - in very small quantities. And you're blaming DSBL? If what you say is true, do you suppose Earthlink has ever heard of SMTP-AUTH for their mobile users? I wonder how Earthlink would prevent a spammer from spoofing his/her IP address to a different address for each message. I have a hard time believing that Earthlink doesn't use SMTP-AUTH. It's more likely that some legitimate (if somewhat clueless) Earthlink customer accidentally used DSBL software to list their own SMTP relay.
Re: Speak to me of Bayes and scoring in SA 3.0
On 16 Sep 2004 13:39:30 -0700, Daniel Quinlan [EMAIL PROTECTED] said: Bart Schaefer [EMAIL PROTECTED] writes: Feeding the Bayes rules through the scoring algorithm seems to imply a lack of trust in the accuracy of the classifier.

Mostly not. It's needed to map from the 0 to 1.0 probability to the SpamAssassin threshold-based scoring method. Even in more pure Bayesian systems, users still have to figure out where to put stuff into the spam bucket and it's often not at 0.50. Our technique avoids the problem of people having two different calibrations. Plus, there's the lack of trust thing, but that's a lesser factor. I think we could use a better way to merge Bayesian results into the SpamAssassin score, though.

I thought so too... I added the following to my local.cf based on Bayes scores of spam we receive. Spammers are really trying hard to make their spams look hammy, but regular users are (hopefully) not trying to make their hams look spammy. So I weighted the scores in that direction since my Bayes engine seems much more likely to give my ham a very low score than to give my spam a very high score. Spammers can fairly easily get their Bayes scores down to about 50% probability, but it's much more difficult to get them down below 40% probability since they would have to know your particular organization's 'hammy' tokens (which would not remain hammy for long if you're training regularly).

score BAYES_00 -4.9
score BAYES_01 -2.1
score BAYES_10 -1.5
score BAYES_20 -1.0
score BAYES_30 -0.5
score BAYES_40 0.1
score BAYES_44 0.7
score BAYES_50 1.0
score BAYES_56 1.5
score BAYES_60 2.1
score BAYES_70 3.1
score BAYES_80 4.2
score BAYES_90 4.9
score BAYES_99 5.4

-- [EMAIL PROTECTED]
RE: Subject line
On Tue, 14 Sep 2004 14:14:17 -0700, Bret Miller [EMAIL PROTECTED] said: Maybe it wouldn't make *your* life easier. But because it's visual, it allows me to more easily discern relevance when I put more than one list together in a mailbox. A certain subject in one list would be more relevant to me than the same subject in another.

First of all, not only does it NOT make my life easier, I'm actively annoyed by that 'feature', especially on lists with [StupidlyLongIdentifiers]. Even the shorter names are just visual clutter; they're the same on every message in a given thread. So, on behalf of the mostly-silent group of people who don't care enough about such trivialities to add to this already overlong thread, I want to thank the list admins for keeping Subject line clutter to a minimum.

I use Outlook so I don't have a lot of options for sorting like some other apps do. The question was asked; I answered it.

I don't use Outlook but I'm pretty sure I've helped one of my users define a rule to filter messages based on a header that wasn't pre-defined in Outlook...

-- [EMAIL PROTECTED]