I don't understand the Bayes scoring logic
Hi folks, can anyone explain the logic behind this? Various spam gets tagged with the Bayes check, but as follows:

* 0.4 BAYES_60 BODY: Bayesian spam probability is 60 to 80%
*      [score: 0.6343]
* 2.1 BAYES_80 BODY: Bayesian spam probability is 80 to 95%
*      [score: 0.8695]
* 1.9 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
*      [score: 1.]

So, a 60-80% probability scores 0.4, an 80-95% probability scores 2.1, and a 99-100% probability scores 1.9. Why does a 99-100% probability score less than an 80-95% probability??? Answers on a postcard please to..

Cheers
Nigel
Re: mailx vs pine local mail scan times
Rob Fantini wrote: Is there a way to disable spamassassin from processing mail sent to our local network from our local network?

How are you calling spamassassin? Are you calling it through procmail? If so then you can use procmail to avoid calling spamassassin in those cases. The easiest thing would be to avoid processing through spamassassin if the from address were on your network:

  :0fw
  * !^From: .*@([^.]+\.)?example.com
  | spamassassin

This runs the risk that you will see spam sent with forged addresses from your own domain. Those are called joe-jobs. But if that does not bother you too much then this works. To improve the accuracy you need to avoid whitelists. If you had a mail filter program that checked that the message originated on your network then you could use that as part of the procmail check instead of just the From: address. I don't happen to have one handy to post. But if someone else did I would be interested in something like that myself.

Bob
Re: I don't understand the Bayes scoring logic
Nigel Wilkinson wrote: Why does a 99-100% probability score less than an 80-95% probability???

Because the Bayes engine is not the only factor in classifying a message as spam. All of the other rules are factored in along with it. A message with a 99-100% probability is going to trigger many of the other SA rules, and the total is enough to push the message over the 5-point threshold. The scoring program therefore did not need to make the BAYES_99 score any higher than it did. I also believe the SA development team holds the principle that no single rule's score should be too large, since that can lead to false positives. It is better to be conservative and avoid false positives for the masses.

However, *I* don't like seeing the same spam again and again. With the default values I would see a spam, train for it, and still see the same spam again and again because it would only score BAYES_99 and be below the threshold. Often this is before it is reported and before network tests such as RBLs and SURBL can tag the sender. So I increase the BAYES_95 and BAYES_99 points to 4.0 and 5.0 for my own personal use. That way if the same spam comes through again, as I know it will, it will get tagged. But I can't say with any authority that this won't generate false positives. I can only say that I have only myself to blame in that case, and also that since I know what it is doing I won't be surprised by it.

Bob
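For reference, Bob's personal tweak is just two lines in local.cf (his stated values; as he says himself, raising them risks false positives):

```
# Personal-use overrides described above -- not recommended defaults
score BAYES_95 4.0
score BAYES_99 5.0
```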
Re: Quinlan interviewed about SA
On Saturday, March 5, 2005, 11:24:25 AM, Eric Hall wrote: On 3/4/2005 1:57 PM, Rob McEwen (PowerView Systems) wrote: Quinlan: Any technique that tries to identify good mail without authentication backing it up, or some form of personalized training. It worked well for a while, but it's definitely not an effective technique today.

I kind of disagree with this, but only partly. Generally speaking, you want as many good indicators as you have bad indicators. If you have hundreds of indicators that flag every possible spam-sign, then sooner or later every piece of good mail will also get flagged by one rule or another. In order to offset this, you want to have a collection of good indicators, so that you can cancel out the everything-looks-like-spam effect. Unfortunately, these rules will also hit some kinds of spam, so sooner or later a large enough set of good rules will just make everything some shade of grey, or worse will make marginal spam appear to be good. Now then, in order to avoid that, you really should limit the positive indicators to stuff that you can verify (which is only slightly different than authenticate).

All the rules are verified by testing against spam and ham corpora before being deployed. Ones that have high false positives are given a low score or not used at all. Folks don't just make up rules and deploy them. The usefulness of the official rules is checked before they're released. YMMV on homemade rules.

That said, as the Internet moves towards more usable identification and authentication schemes for mail, they will probably get positive rules in SA. SPF or DomainKeys may (or may not) be examples, but the nice thing is that SA lets us give them relative goodness scores and not an outright pass or fail, so they don't need to be perfect out of the box. That may actually help their adoption, as it arguably has with SURBLs.

Jeff C.
-- Jeff Chan mailto:[EMAIL PROTECTED] http://www.surbl.org/
Re: mailx vs pine local mail scan times
Bob Proulx wrote: How are you calling spamassassin? Are you calling it through procmail?

Yes.

If so then you can use procmail to avoid calling spamassassin in those cases. The easiest thing would be to avoid processing through spamassassin if the from address were on your network:

  :0fw
  * !^From: .*@([^.]+\.)?example.com
  | spamassassin

Thank you, that is just what I was looking for.

To improve the accuracy you need to avoid whitelists.

Should I avoid whitelists altogether, or just for local-network checking?
Re: [SPAM-TAG] SURBL missing this spam
On Sat, Mar 05, 2005 at 11:07:22AM +0100, Raymond Dijkxhoorn wrote: Any ETA on 3.1? Nothing official. We're planning a bug fix fest (or whatever you want to call it) later this coming week, and we'll have to figure out what is left for 3.1 versus what can get punted to 3.2. There's also the whole score generation thing as well as a week or so of 3.1 release candidates. So I'd say a minimum of 1 month if we go gung ho for the next week or two and get it all together. I, and several other people, have been dogfooding the 3.1 code for a while though, and it's pretty stable already. FWIW. -- Randomly Generated Tagline: Disappearing Tagline! (Just hit Enter. Try it now!)
Re: I don't understand the Bayes scoring logic
At 07:08 PM 3/5/2005, Nigel Wilkinson wrote: Why does a 99-100% probability score less than an 80-95% probability???

This is more-or-less a FAQ in SA now. Rule scores in SA are not in any way linear. The scores are not assigned based on individual rule performance; they're based on tuning the scores of ALL of the rules together in such a way as to minimize the total of FPs and FNs with a 1:100 ratio (i.e. find the lowest FP + 100*FN). Because of this, a rule's score is not assigned based on the performance of that one rule in isolation, but on its interactions with every other rule in the ruleset.

In the case of BAYES_99, it would appear that most spam messages that hit it also hit a lot of other rules, thus SA's score optimizer could sacrifice the score slightly to reduce the FPs without introducing a significant number of FNs. However, the story may be different for BAYES_80: here the spams are likely to be more evasive, and might need a higher score from this rule to avoid large numbers of FNs.

The other off-chance possibility is there may be some misplaced spams in the corpus the devs used. Actually, there's almost certainly one or two in the lot, but if there's a decent number of them they can really screw up the scores.
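As a toy sketch of that objective (the rule names and three-message corpus are invented for illustration; the real optimizer tuned thousands of scores against large corpora), the FP + 100*FN cost Matt describes looks like:

```python
# Toy cost function for a candidate score set, illustrating the
# FP + 100*FN objective described above (not the actual SA optimizer).
def cost(score_set, messages, threshold=5.0, fn_weight=100):
    """messages: list of (rules_hit, is_spam) pairs."""
    fp = fn = 0
    for hits, is_spam in messages:
        flagged = sum(score_set[r] for r in hits) >= threshold
        if flagged and not is_spam:
            fp += 1                 # ham tagged as spam
        elif not flagged and is_spam:
            fn += 1                 # spam that slipped through
    return fp + fn_weight * fn

# Spam that co-fires other rules crosses the threshold even with a
# modest BAYES_99 score; a lone BAYES_99 hit stays under it.
corpus = [({"BAYES_99", "OTHER_RULE"}, True),   # 1.9 + 3.5 = 5.4: caught
          ({"BAYES_99"}, True),                 # 1.9: missed (one FN)
          ({"OTHER_RULE"}, False)]              # 3.5 ham: not flagged
print(cost({"BAYES_99": 1.9, "OTHER_RULE": 3.5}, corpus))  # 100
```

The optimizer's job is to pick the score set minimizing this cost over the whole corpus, which is why an individual rule's score can end up lower than its solo accuracy would suggest.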
Re: scores too low - neural network problem?
What is the output of this on your messages?

  spamassassin -tD 2>&1 | pager

What value does it show for BAYES_99 in the content analysis section? If it says something other than 4.07 then it confirms that you are not running with the column-four values (network tests off). It sounds instead like you are running with network tests enabled. Are network tests enabled in the debugging output?

Thank you, this was correct. I thought I had disabled the network tests, but I hadn't. I've disabled them now, and the scoring has returned to what I thought it should be.

Regards, Andrew.
Re: scores too low - neural network problem?
I understand that the individual test scores are fed through a neural network to derive the final score. So it seems that this network has started to behave badly. You misunderstand. The neural network (or whatever they're using these days - it at least used to be a genetic algorithm) is used to assign the default scores, not to adjust the scores after the fact. Thank you, you're right. I had misunderstood that. More likely one of two things is happening: that header was added by another system running SpamAssassin, or you aren't running with the configuration you think you are. You're right-- I thought I had disabled the network tests, but I hadn't, so I wasn't getting the scores I thought I was. I disabled the network tests, and the problem is solved now. Regards, Andrew.
Re: mailx vs pine local mail scan times
Rob Fantini wrote: Bob Proulx wrote: To improve the accuracy you need to avoid whitelists. Should I avoid whitelists altogether, or just for local-network checking?

The real problem is forgeries and spoofs. Anyone can put any from address they want on a mail message. Viruses especially do this routinely. Any whitelist based only on the From: address will be fooled by these. You whitelist your network and those will pass right through the checks. If you can ensure that mail on your network is not forged then whitelists for your network will be fine. But if not, then some viruses will undoubtedly forge your address and fool your whitelists.

On my network I try hard to make sure that spoofed mail claiming to be from my own domain cannot enter my domain. But it is hard. I really can't do it. For example this message to the mailing list leaves my network, goes to the mailing list, then comes back into my network. The message contains my From: address. Any whitelist I would have on my domain would be fooled if that were spoofed.

Because of this problem I don't like any algorithm that by design trusts the user. Who goes there, friend or foe? Friend! Well, okay fine, you may pass. Therefore I don't like simple From: name whitelists. They have that fundamental flaw. I always try to avoid them.

So then you ask what is the alternative? In spamassassin it follows the chain of hosts through the trusted_networks variable, backtracking through the Received: headers. When it finds the point where mail entered your network it can use that foreign machine's IP address and perform network checks. If the mail never left the network it sets ALL_TRUSTED, which is good for negative points, pushing the message toward the non-spam classification. It would be great to have that capability available as a standalone script outside of the full spamassassin check. It was a check like that I was suggesting, to really know if the mail came from your network.
But as far as I know it is not available outside of spamassassin at this time. If someone had the inclination they could write that check in a standalone form. Bob
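A rough standalone sketch of the check Bob wishes for (assumptions: a hard-coded placeholder trusted network, and Received: headers that carry the relay IP in square brackets; real Received parsing, as SA does it, is considerably messier):

```python
import re
from ipaddress import ip_address, ip_network

TRUSTED = [ip_network("192.0.2.0/24")]   # placeholder for your networks

def first_untrusted_ip(received_headers):
    """Walk Received: headers newest-first; return the IP of the first
    relay outside TRUSTED, or None if the mail never left the network
    (roughly what SA's ALL_TRUSTED means)."""
    for hdr in received_headers:
        m = re.search(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]", hdr)
        if m is None:
            continue
        ip = ip_address(m.group(1))
        if not any(ip in net for net in TRUSTED):
            return str(ip)
    return None
```

A None result would map to the "skip spamassassin in procmail" case, while a returned IP is the one you would hand to network checks.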
Re: Quinlan interviewed about SA
On 3/5/2005 9:00 PM, Jeff Chan wrote: On Saturday, March 5, 2005, 11:24:25 AM, Eric Hall wrote: On 3/4/2005 1:57 PM, Rob McEwen (PowerView Systems) wrote: Quinlan: Any technique that tries to identify good mail without authentication backing it up, or some form of personalized training. It worked well for a while, but it's definitely not an effective technique today. Ones that have high false positives are given a low score or not used at all. Folks don't just make up rules and deploy them. The usefulness of the official rules is checked before they're released.

Yes, but we don't have very many of them. I don't mean validate by passing it through pre-release testing either (although that's certainly important), but instead mean that the message itself has to contain enough data for the marker to be validated. Whether this is an external agent that will validate some hash (as in the probable case of DK), or something in the message itself (a trusted relay says that a cert is good), or whatever, the important thing is the verification part (this is still different from authentication).

nice thing is that SA lets us give them relative goodness scores and not an outright pass or fail, so they don't need to be perfect out of the box.

Yes, my point being that rather than saying they are not useful we really ought to be working hard on finding ways to add more of them, because it is their volume that makes them useful (otoh, having too many of them, such that the bar is lowered, is indeed bad).

-- Eric A. Hall  http://www.ehsco.com/  Internet Core Protocols  http://www.oreilly.com/catalog/coreprot/
basic set of L_RCVD tests
Here's a starter set of strict-SMTP rules, using data from the Received headers as sucked into the X-Spam-Relays-Untrusted pseudo-header. There are tests for unqualified hostname in HELO, domain literal in HELO (SA tests for addresses, but not literals), lack of reverse DNS, mismatched HELO and RDNS, too many hops, not enough hops, and some more stuff that is common spam-sign.

A couple of these tests can hit very often against legitimate mail (in particular, there are *A LOT* of SMTP clients that have mismatched HELO/RDNS--including this mailing list's server...) so they all have a default score of 0.1 for safety. OTOH, some of these rules also hit very frequently against spam. I need to monitor them for a while, do some tweaks (see the notes), and otherwise bump it along a bit. I guess I should figure out the submission rules for SARE and go that route, but I wanted to post this so there'd be visible feedback for what else I'd like to do with more Received data. Be careful with deployment and copy me on feedback please.

-- Eric A. Hall  http://www.ehsco.com/  Internet Core Protocols  http://www.oreilly.com/catalog/coreprot/

#
# LOCAL SPAMASSASSIN RECEIVED HEADER TESTS
# v0.1, March 5, 2005
# Eric A. Hall [EMAIL PROTECTED]
#
# these rules look at data from the Received headers and make some kind
# of judgement call
#

#
# NOTE TO SELF:
#
# See if capitalization checks are needed for comparisons
#

#
# This rule checks for the presence of a domain-literal in the HELO field
# of the first X-Spam-Relays-Untrusted pseudo-header. Note SpamAssassin
# already tests for raw numeric HELO value, but does not currently test
# for domain-literals (eg, '[192.168.0.1]').
#
# NOTE TO SELF: Should also add a rule checking for IPv6 literals
#
describe L_RCVD_HELO_LITERAL    HELO used a domain literal
header   L_RCVD_HELO_LITERAL    X-Spam-Relays-Untrusted =~ /^[^\]]+ helo=\!((\d{1,3})\.){3}(\d{1,3})\!/
score    L_RCVD_HELO_LITERAL    0.1

#
# Most messages do not need more than three hops to reach your local server
# (eg, client-server-relay-local). Therefore, this rule looks for four
# or more entries in the X-Spam-Relays-Untrusted pseudo-header and scores
# appropriately if a match is returned. Note that this rule will often hit
# legitimate mail messages, so be very careful.
#
describe L_RCVD_TOO_MANY_HOPS   Four or more external hops is suspicious
header   L_RCVD_TOO_MANY_HOPS   X-Spam-Relays-Untrusted =~ /\[ ((.*?) \]){4}/
score    L_RCVD_TOO_MANY_HOPS   0.1

#
# If there's only one entry in the X-Spam-Relays-Untrusted pseudo-header,
# the sending system is likely to be the message originator, which is
# somewhat unusual but not impossible (eg, an ecommerce server might
# generate a message locally, a mailing-list program might generate an
# administrative message locally, or an email network might even be hidden
# behind a firewall or gateway, and so forth).
#
describe L_RCVD_TOO_FEW_HOPS    Just one external hop means direct client
header   L_RCVD_TOO_FEW_HOPS    X-Spam-Relays-Untrusted =~ /^(?!\[.*\[).*\[/
score    L_RCVD_TOO_FEW_HOPS    0.1

#
# This rule looks for an unqualified hostname in the first helo field
# of the X-Spam-Relays-Untrusted pseudo-header, and adds a score if a
# match is found.
#
describe L_RCVD_NO_FQDN_HELO    Unqualified hostname used in HELO greeting
header   L_RCVD_NO_FQDN_HELO    X-Spam-Relays-Untrusted =~ /^[^\]]+ helo=((.(?!\.))*)\s/
score    L_RCVD_NO_FQDN_HELO    0.1

#
# This rule looks at the top 'rdns' field of the X-Spam-Relays-Untrusted
# pseudo-header, and if the field value is null, it assumes that the
# server was unable to resolve the reverse-DNS lookup. Note that this
# can happen due to resolver difficulties with the server, delegation
# errors at the provider, or any number of other reasons, and should
# only be enabled judiciously.
#
# NOTE TO SELF: this should check if HELO was numeric and skip if so,
# but I'm not up to the regexp coding...
#
describe L_RCVD_HOST_NO_RDNS    Received from host without reverse DNS
header   L_RCVD_HOST_NO_RDNS    X-Spam-Relays-Untrusted =~ /^[^\]]+ rdns=\s/
score    L_RCVD_HOST_NO_RDNS    0.1

#
# This rule looks at the reverse DNS hostname and checks to see if the same
# hostname was used in the HELO identifier. If they are different, score
# appropriately. Note that this happens *A LOT* with legitimate mail that
# happens to come from poorly-configured networks; be very cautious with
# this rule. Also note that an empty 'rdns' field does not cause this
# rule to hit.
#
# NOTE: See below for tests that score extra for mismatched AOL, etc.
#
describe L_RCVD_HELO_WRONG      HELO name and reverse DNS mis-match
header   L_RCVD_HELO_WRONG      X-Spam-Relays-Untrusted =~ /^[^\]]+ rdns=(\S*)(.*?) helo=(?!\1)/
score    L_RCVD_HELO_WRONG      0.1

#
# The following rules look in the top-most X-Spam-Relays-Untrusted
# pseudo-header for HELO identifiers associated with commonly-forged
# domains,
Re: basic set of L_RCVD tests
Eric A. Hall wrote: Here's a starter set of strict-SMTP rules, using data from the Received headers as sucked into the X-Spam-Relays-Untrusted pseudo-header. There are tests for unqualified hostname in HELO, domain literal in HELO (SA tests for addresses, but not literals), lack of reverse DNS, mismatched HELO and RDNS, too many hops, not enough hops, and some more stuff that is common spam-sign. A couple of these tests can hit very often against legitimate mail (in particular, there are *A LOT* of SMTP clients that have mismatched HELO/RDNS--including this mailing list's server...) so they all have a default score of 0.1 for safety. OTOH, some of these rules also hit very frequently against spam. I need to monitor them for a while, do some tweaks (see the notes), and otherwise bump it along a bit. I guess I should figure out the submission rules for SARE and go that route, but I wanted to post this so there'd be visible feedback for what else I'd like to do with more Received data. Be careful with deployment and copy me on feedback please.

FWIW, the L_RCVD_TOO_MANY_HOPS rule will hit on *a lot* of corporate mail. Off the top of my head I can think of at least two dozen large companies that this would hit. I wouldn't be surprised if it hit more ham than spam.

Daryl
Re: basic set of L_RCVD tests
On 3/6/2005 1:34 AM, Daryl C. W. O'Shea wrote: FWIW, the L_RCVD_TOO_MANY_HOPS rule will hit on *a lot* of corporate mail. Off the top of my head I can think of at least two dozen large companies that this would hit. I wouldn't be surprised if it hit more ham than spam.

That was my fear too, but that's not the case here yet.

-- Eric A. Hall  http://www.ehsco.com/  Internet Core Protocols  http://www.oreilly.com/catalog/coreprot/
Re: learn_with_whitelist?
At 02:50 PM 3/5/2005, Barrie Slaymaker wrote: Personally, I find this dangerous. I strongly disagree with the practice of seeding the AWL. You really shouldn't have to, nor want to. Would you mind explaining why you find this to be dangerous?

Because it creates a false method for fixing false-positive problems, one that could be easily over-relied upon as a cure-all for FP problems. However:

1) AWL-based whitelisting decays over time, becoming less effective as the number of messages received increases (i.e.: on the first message the seed is worth -(100/1)/2 = -50; after 50 messages it's worth (-100/50)/2 = -1).

2) An admin using this method may react to spam by increasing scores, or do other things to the config which would ordinarily cause false positives. The AWL seeding would make the problems non-obvious until a new sender comes along, or the decay from (1) occurs.

Using the AWL as a manual whitelist is a crutch and a hack at absolute best. If you really need a whitelist, use whitelist_from_rcvd entries, or even better, procmail around SA for them.
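The decay in point (1) is just an average over message count; a two-line sketch reproducing Matt's numbers (this approximates the seeded entry's contribution, not the full AWL formula):

```python
def awl_seed_effect(seed=-100.0, msgs_seen=1):
    """Approximate AWL adjustment from a single seeded score after
    msgs_seen messages: the seed's share of the running mean, halved."""
    return (seed / msgs_seen) / 2

print(awl_seed_effect(msgs_seen=1))    # -50.0
print(awl_seed_effect(msgs_seen=50))   # -1.0
```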
Re: basic set of L_RCVD tests
At 01:34 AM 3/6/2005, Daryl C. W. O'Shea wrote: FWIW, the L_RCVD_TOO_MANY_HOPS rule will hit on *a lot* of corporate mail. Off the top of my head I can think of at least two dozen large companies that this would hit. I wouldn't be surprised if it hit more ham than spam. Heck, forget corp mail, what about sourceforge.net mailing lists? SF adds 3 hops just by itself.
Re: Quinlan interviewed about SA
On Saturday 05 March 2005 9:54 pm, Eric A. Hall wrote: Yes, my point being that rather than saying they are not useful we really ought to be working hard on finding ways to add more of them, because it is their volume that makes them useful (otoh, having too many of them, such that the bar is lowered, is indeed bad). Ah, but from experience, they *haven't* been useful. SA used to have quite a few negative-scoring rules, and as a result spammers started tailoring their spam to hit them. A rather extreme example would be the series of rules that targeted mail programs that spammers rarely used -- things like Pine, Mutt, Mozilla, etc. The result: spam came through with headers for all three and got a base score of -10. This particular case could be mitigated by adding meta-rules (if it hits more than one UA test, it's obviously forged), but as this sort of thing started happening regularly, the devs began taking out any negative-scoring rules that could be gamed like this. That left the default whitelist, Habeas (after some refinements), Bonded Sender, Hashcash, and Bayes (since it's different for each target system). From what I hear a DomainKeys plugin is in the works. It's not that the SpamAssassin team hasn't thought of the idea, it's that they tried it and, for the most part, it didn't work. -- Kelson Vibber SpeedGate Communications www.speed.net
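The meta-rule mitigation Kelson mentions could look something like this in SA config (rule names and patterns invented for illustration; real client headers vary):

```
# Invented example: single mail-client hints score nothing on their own...
header __L_UA_PINE  User-Agent =~ /\bPine\b/
header __L_UA_MUTT  User-Agent =~ /\bMutt\b/
header __L_UA_MOZ   User-Agent =~ /\bMozilla\b/

# ...but claiming more than one mail client at once is a forgery sign.
meta     L_MULTI_UA  (__L_UA_PINE + __L_UA_MUTT + __L_UA_MOZ) > 1
describe L_MULTI_UA  Message claims more than one mail client
score    L_MULTI_UA  2.0
```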
Re: Quinlan interviewed about SA
At 02:58 AM 3/6/2005, Kelson Vibber wrote: Yes, my point being that rather than saying they are not useful we really ought to be working hard on finding ways to add more of them, because it is their volume that makes them useful (otoh, having too many of them, such that the bar is lowered, is indeed bad). Ah, but from experience, they *haven't* been useful. SA used to have quite a few negative-scoring rules, and as a result spammers started tailoring their spam to hit them.

I agree entirely; Kelson speaks true here. Any rule based on simple message content alone can be forged trivially and abused by spammers. I think the big point to get across is that we aren't just saying they aren't useful; it's "We've been there, done that, and got screwed by the spammers for it." 2.50 shipped with a bunch of negative-scoring rules, and it resulted in the completely infamous bug 1589 breaking out: http://bugzilla.spamassassin.org/show_bug.cgi?id=1589

That said, I do personally favor having lots of very small-scoring negative rules (ie: -0.01 each) and setting the ham autolearn threshold to -0.01. This prevents a lot of "low-scoring spam learned as ham" problems, as now in order to be learned as ham a message must hit at least one of the ham rules. Learning as ham any message with a small positive score, as per the default, is just asking for trouble. Keeping the scores of the rules small means they are too trivial to be abused for any significant gain by spammers.
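A minimal local.cf sketch of Matt's scheme (the ham-hint rule is invented for illustration; the threshold option is the real SA 3.x setting name):

```
# One of many tiny, un-gameable ham hints at -0.01 each
header L_HAM_HINT  X-Mailer =~ /\bPine\b/
score  L_HAM_HINT  -0.01

# Require a net-negative score before autolearning a message as ham
bayes_auto_learn_threshold_nonspam -0.01
```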
Re: Quinlan interviewed about SA
At 03:16 AM 3/6/2005, Eric A. Hall wrote: But, compare this to something like scoring against TLS encryption strength. Spammers are motivated to send as fast as possible, and strong encryption is counter-productive to that mission (increasingly so), and they can't fake it because it can be validated by a trusted relay.

Bah, spammers may be motivated by speed, but they are also opportunists who abuse the resources of others. These days spamming is done via botnets and is almost entirely limited by the bandwidth of the node, not its CPU time. Adding TLS shouldn't slow them down much, as it's mostly a CPU hit to do so... besides, they can always make up for it by grabbing more infected hosts.
Re: Quinlan interviewed about SA
On Sunday, March 6, 2005, 12:16:50 AM, Eric Hall wrote: But, compare this to something like scoring against TLS encryption strength. Spammers are motivated to send as fast as possible, and strong encryption is counter-productive to that mission (increasingly so), and they can't fake it because it can be validated by a trusted relay. Spammers have access to hundreds of thousands of zombies. They probably have all the computing power they need to calculate a few hashes. Jeff C. -- Jeff Chan mailto:[EMAIL PROTECTED] http://www.surbl.org/
Spam not working. How to train Bayes and make it work in the beginning.
But my question, which now becomes a problem, is: if a new user has installed SpamAssassin 3.0.2 and wants it to identify spam, how to do it quickly? I have been working on it since last week to make it work, but things don't seem to click. I have run these commands manually on the corpus available at spamassassin.org:

  sa-learn --spam spam_2/*
  sa-learn --ham easy_ham/*

then sa-learn --dump magic shows this:

-bash-2.05b$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       1387          0  non-token data: nspam
0.000          0       1412          0  non-token data: nham
0.000          0     142988          0  non-token data: ntokens
0.000          0 1017036637          0  non-token data: oldest atime
0.000          0 1109068323          0  non-token data: newest atime
0.000          0 1109064441          0  non-token data: last journal sync atime
0.000          0 1109059988          0  non-token data: last expiry atime
0.000          0   22118400          0  non-token data: last expire atime delta
0.000          0      10864          0  non-token data: last expire reduction count

But still, after sending around 500 spam mails, it's not identifying incoming mail as spam. Moreover, even when I decrease required_score to 1, it doesn't work. The same thing works absolutely fine on the 2.63 version, but not here. Any or some help welcome.

Crisppy f.

Output of spamassassin -D --lint is:
---
-bash-2.05b$ spamassassin -D --lint
debug: SpamAssassin version 3.0.2
debug: Score set 0 chosen.
debug: running in taint mode? yes
debug: Running in taint mode, removing unsafe env vars, and resetting PATH
debug: PATH included '/usr/local/bin', keeping.
debug: PATH included '/bin', keeping.
debug: PATH included '/usr/bin', keeping.
debug: PATH included '/home/admin17/bin', which doesn't exist, dropping.
debug: Final PATH set to: /usr/local/bin:/bin:/usr/bin
debug: diag: module installed: DBI, version 1.37
debug: diag: module installed: DB_File, version 1.808
debug: diag: module installed: Digest::SHA1, version 2.01
debug: diag: module installed: IO::Socket::UNIX, version 1.21
debug: diag: module installed: MIME::Base64, version 2.21
debug: diag: module installed: Net::DNS, version 0.45
debug: diag: module not installed: Net::LDAP ('require' failed)
debug: diag: module not installed: Razor2::Client::Agent ('require' failed)
debug: diag: module installed: Storable, version 2.09
debug: diag: module installed: URI, version 1.21
debug: ignore: using a test message to lint rules
debug: using /etc/mail/spamassassin/init.pre for site rules init.pre
debug: config: read file /etc/mail/spamassassin/init.pre
debug: using /usr/share/spamassassin for default rules dir
debug: config: read file /usr/share/spamassassin/10_misc.cf
debug: config: read file /usr/share/spamassassin/20_anti_ratware.cf
debug: config: read file /usr/share/spamassassin/20_body_tests.cf
debug: config: read file /usr/share/spamassassin/20_compensate.cf
debug: config: read file /usr/share/spamassassin/20_dnsbl_tests.cf
debug: config: read file /usr/share/spamassassin/20_drugs.cf
debug: config: read file /usr/share/spamassassin/20_fake_helo_tests.cf
debug: config: read file /usr/share/spamassassin/20_head_tests.cf
debug: config: read file /usr/share/spamassassin/20_html_tests.cf
debug: config: read file /usr/share/spamassassin/20_meta_tests.cf
debug: config: read file /usr/share/spamassassin/20_phrases.cf
debug: config: read file /usr/share/spamassassin/20_porn.cf
debug: config: read file /usr/share/spamassassin/20_ratware.cf
debug: config: read file /usr/share/spamassassin/20_uri_tests.cf
debug: config: read file /usr/share/spamassassin/23_bayes.cf
debug: config: read file /usr/share/spamassassin/25_body_tests_es.cf
debug: config: read file /usr/share/spamassassin/25_hashcash.cf
debug: config: read file /usr/share/spamassassin/25_spf.cf
debug: config: read file /usr/share/spamassassin/25_uribl.cf
debug: config: read file /usr/share/spamassassin/30_text_de.cf
debug: config: read file /usr/share/spamassassin/30_text_fr.cf
debug: config: read file /usr/share/spamassassin/30_text_nl.cf
debug: config: read file /usr/share/spamassassin/30_text_pl.cf
debug: config: read file /usr/share/spamassassin/50_scores.cf
debug: config: read file /usr/share/spamassassin/60_whitelist.cf
debug: using /etc/mail/spamassassin for site rules dir
debug: config: read file /etc/mail/spamassassin/local.cf
debug: using /home/admin17/.spamassassin for user state dir
debug: using /home/admin17/.spamassassin/user_prefs for user prefs file
debug: config: read file /home/admin17/.spamassassin/user_prefs
debug: plugin: loading Mail::SpamAssassin::Plugin::URIDNSBL from @INC
debug: plugin: registered Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0x8c0f7dc)
debug: plugin: loading Mail::SpamAssassin::Plugin::Hashcash from
Re: mailx vs pine local mail scan times
Bob Proulx wrote: If you can ensure that mail on your network is not forged

We use postfix, procmail and spamassassin. I wonder if a header could be added to a mail by postfix when this part of /etc/postfix/main.cf sees a mail as local?

  smtpd_recipient_restrictions = permit_mynetworks,

The new header could be checked in procmailrc..
Re: Quinlan interviewed about SA
On 3/6/2005 3:25 AM, Matt Kettler wrote: These days spamming is done via botnets

That's already trapped by sbl+xbl.

Adding TLS shouldn't slow them down much, as it's mostly a CPU hit to do so...

There's a lot of stuff involved, and there are lots of things to score on. Here's a couple of samples from DNSOps and Namedroppers:

Received: from darkwing.uoregon.edu (darkwing.uoregon.edu [128.223.142.13])
    (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits))
    (Client CN darkwing.uoregon.edu, Issuer Thawte Server CA (verified OK))
    by goose.ehsco.com (Postfix) with ESMTP
    for [EMAIL PROTECTED]; Fri, 4 Mar 2005 02:18:17 -0600 (CST)

Received: from psg.com (psg.com [147.28.0.62])
    (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
    (Client did not present a certificate)
    by goose.ehsco.com (Postfix) with ESMTP
    for [EMAIL PROTECTED]; Sat, 5 Mar 2005 16:16:15 -0600 (CST)

Did the client present its own cert? Is it in a trusted path (not self-signed)? Was there a revocation lookup? How tough was the key, and how many bits were used (and scale the score accordingly)? So getting to the higher cumulative scores wouldn't be very simple, and it would also provide a clear path of responsibility, etc. The same thing could be done with user certs too, if a plug-in to SA wants to do the verification testing.

There's also a possibility of having a generic GOOD_BOY set of meta SMTP tests that give a bonus score if they all successfully match against good administrative practices (such as HELO=RDNS). I'm still thinking about this one; the 'professional marketers' would hit this a lot, and there are too many poorly-run networks, so it might be counter-productive.

-- Eric A. Hall  http://www.ehsco.com/  Internet Core Protocols  http://www.oreilly.com/catalog/coreprot/
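As a sketch of scoring on that header data (a hypothetical local rule; the pattern is tuned to the Postfix Received format shown above, and since Received headers from untrusted hops can be forged, it would need the trusted-path care Eric describes):

```
# Small bonus when a hop used TLS and the client cert verified OK
header   L_TLS_CLIENT_VERIFIED  Received =~ /using TLSv1 with cipher .+ \(verified OK\)/
describe L_TLS_CLIENT_VERIFIED  Relay presented a verified TLS client cert
score    L_TLS_CLIENT_VERIFIED  -0.5
```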
Re: basic set of L_RCVD tests
On 3/6/2005 1:26 AM, Eric A. Hall wrote: I need to monitor them for a while, do some tweaks (see the notes), and otherwise bump it along a bit. I guess I should figure out the submission rules for SARE and go that route http://www.rulesemporium.com/forums/showthread.php?s=threadid=105 I've posted a new version that fixes a problem with the L_RCVD_FAKE_* set of rules, which were erroneously matching across boundaries. Future discussion should probably go there, although email is still fine for folks who prefer it. -- Eric A. Hall  http://www.ehsco.com/  Internet Core Protocols  http://www.oreilly.com/catalog/coreprot/
Re: mailx vs pine local mail scan times
Rob Fantini wrote: I wonder if a header could be added to a mail from postfix when this part of /etc/postfix/main.cf sees a mail as from local? smtpd_recipient_restrictions = permit_mynetworks, The new header could be checked in procmailrc..

Hmm... I just thought of this on the fly and so there may be something I am not thinking about. But it seems easy enough to do this using postfix's PREPEND action. This requires postfix 2.1 or later. Use a PREPEND to place a new mail header of your choosing. If that header exists then run the message through spamassassin. If not then you know it is local and can bypass spamassassin.

$ cat /etc/postfix/ext-access.regexp
/./  PREPEND X-External-Message: yes

Then in the postfix main.cf file:

smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination,
    reject_invalid_hostname,
    reject_non_fqdn_hostname,
    reject_non_fqdn_sender,
    reject_non_fqdn_recipient,
    reject_unknown_sender_domain,
    reject_unknown_recipient_domain,
    check_helo_access hash:/etc/postfix/helo-access,
    check_recipient_access regexp:/etc/postfix/ext-access.regexp,
    check_sender_access hash:/etc/postfix/client-access,
    ...
    reject_rbl_client ...your list here...,
    ...
    warn_if_reject reject_rbl_client ...your list here...

Then modify the procmail rule to call spamassassin whenever this header is present in the mail. Since all external mail has this header, all external mail goes through SA.

:0fw
* ^X-External-Message: yes
| spamassassin

This header would not need to be secret. If someone forged it then their mail would simply be checked by spamassassin. Only the absence of the header could bypass the check, and external mail can't avoid it because the header is placed there by your external mail relay. The danger would be that someday you modify the postfix rules and this header gets lost; at that point a lot of spam would pass through, but I am sure your users would let you know about that soon enough. Once again let me warn that I have not thought the above through in any great detail.
Your suggestion just made me think of this as a way to do what you were wanting. It is not something I care about greatly because I run local mail through spamassassin and splitting out local mail is not really something I will be pursuing. But I did test the above configuration and it worked for me. Bob
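Before wiring the table into main.cf, it can be sanity-checked from the command line — postmap(1) can query regexp: tables directly (the address below is just a placeholder; any string matches the /./ pattern):

```
$ postmap -q "anyone@example.org" regexp:/etc/postfix/ext-access.regexp
PREPEND X-External-Message: yes
```

If postmap prints nothing, the table isn't matching and the header would never be prepended.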
Rule_Du_Jour.sh
Hi all, I'm trying to get the Rule_Du_Jour.sh spamassassin rule-update script working. I have successfully installed spamassassin 3.0.2 but Rule_Du_Jour.sh doesn't want to work. This is the debug output produced by the shell:

bash ./Rule_Du_Jour.sh exec: /usr/local/curl/bin/curl -w %{http_code} --compressed -O -R -s -S -z /etc/mail/spamassassin/RulesDuJour/rules_du_jour http://sandgnat.com/rdj/rules_du_jour 21 curl_output: 304

-- TRIPWIRE -- RULESET_NAME=TRIPWIRE INDEX=0 CF_URL=http://www.rulesemporium.com/rules/99_FVGT_Tripwire.cf CF_FILE=tripwire.cf CF_NAME=TripWire PARSE_NEW_VER_SCRIPT=perl -ne 'print if /^\s*#.*(vers?|version|rev|revision)[:\.\s]*[0-9]/i;' | sort | tail -1 CF_MUNGE_SCRIPT= Old 99_FVGT_Tripwire.cf already existed in /etc/mail/spamassassin/RulesDuJour... Retrieving file from http://www.rulesemporium.com/rules/99_FVGT_Tripwire.cf... exec: /usr/local/curl/bin/curl -w %{http_code} --compressed -O -R -s -S -z /etc/mail/spamassassin/RulesDuJour/99_FVGT_Tripwire.cf http://www.rulesemporium.com/rules/99_FVGT_Tripwire.cf 21 curl_output: 304 99_FVGT_Tripwire.cf was up to date [skipped downloading of http://www.rulesemporium.com/rules/99_FVGT_Tripwire.cf ] ... Installing new ruleset from /etc/mail/spamassassin/RulesDuJour/99_FVGT_Tripwire.cf.2 Installing new version... TripWire has changed on ABXSmtp1.dsi.abxlogistics.fr. Version line: # Version 1.18 More Typo's fixed.

-- EVILNUMBERS -- RULESET_NAME=EVILNUMBERS INDEX=8 CF_URL=http://www.rulesemporium.com/rules/evilnumbers.cf CF_FILE=evilnumbers.cf CF_NAME=EvilNumber PARSE_NEW_VER_SCRIPT=perl -ne 'print if /^\s*#.*(vers?|version|rev|revision)[:\.\s]*[0-9]/i;' | sort | tail -1 CF_MUNGE_SCRIPT= Old evilnumbers.cf already existed in /etc/mail/spamassassin/RulesDuJour... Retrieving file from http://www.rulesemporium.com/rules/evilnumbers.cf...
exec: /usr/local/curl/bin/curl -w %{http_code} --compressed -O -R -s -S -z /etc/mail/spamassassin/RulesDuJour/evilnumbers.cf http://www.rulesemporium.com/rules/evilnumbers.cf 21 curl_output: 304 evilnumbers.cf was up to date [skipped downloading of http://www.rulesemporium.com/rules/evilnumbers.cf ] ... Installing new ruleset from /etc/mail/spamassassin/RulesDuJour/evilnumbers.cf.2 Installing new version... EvilNumber has changed on ABXSmtp1.dsi.abxlogistics.fr. Version line: # Version: 1.12s -- SARE_RANDOM -- RULESET_NAME=SARE_RANDOM INDEX=23 CF_URL=http://www.rulesemporium.com/rules/70_sare_random.cf CF_FILE=70_sare_random.cf CF_NAME=SARE Random Ruleset for SpamAssassin 2.5x and higher PARSE_NEW_VER_SCRIPT=perl -ne 'print if /^\s*#.*(vers?|version|rev|revision)[:\.\s]*[0-9]/i;' | sort | tail -1 CF_MUNGE_SCRIPT= Old 70_sare_random.cf already existed in /etc/mail/spamassassin/RulesDuJour... Retrieving file from http://www.rulesemporium.com/rules/70_sare_random.cf... exec: /usr/local/curl/bin/curl -w %{http_code} --compressed -O -R -s -S -z /etc/mail/spamassassin/RulesDuJour/70_sare_random.cf http://www.rulesemporium.com/rules/70_sare_random.cf 21 curl_output: 304 70_sare_random.cf was up to date [skipped downloading of http://www.rulesemporium.com/rules/70_sare_random.cf ] ... Installing new ruleset from /etc/mail/spamassassin/RulesDuJour/70_sare_random.cf.2 Installing new version... SARE Random Ruleset for SpamAssassin 2.5x and higher has changed on ABXSmtp1.dsi.abxlogistics.fr. Version line: # Version: 1.30.14 Attempting to --lint the rules. No files updated; No restart required. Rules Du Jour Run Summary:RulesDuJour Run Summary on ABXSmtp1.dsi.abxlogistics.fr: TripWire has changed on ABXSmtp1.dsi.abxlogistics.fr. Version line: # Version 1.18 More Typo's fixed. EvilNumber has changed on ABXSmtp1.dsi.abxlogistics.fr. Version line: # Version: 1.12s SARE Random Ruleset for SpamAssassin 2.5x and higher has changed on ABXSmtp1.dsi.abxlogistics.fr. 
Version line: # Version: 1.30.14 ***WARNING***: spamassassin -D --lint failed. Rolling configuration files back, not restarting SpamAssassin. Rollback command is: mv -f /etc/mail/spamassassin/tripwire.cf /etc/mail/spamassassin/RulesDuJour/99_FVGT_Tripwire.cf.2; rm -f /etc/mail/spamassassin/tripwire.cf; mv -f /etc/mail/spamassassin/evilnumbers.cf /etc/mail/spamassassin/RulesDuJour/evilnumbers.cf.2; rm -f /etc/mail/spamassassin/evilnumbers.cf; mv -f /etc/mail/spamassassin/70_sare_random.cf /etc/mail/spamassassin/RulesDuJour/70_sare_random.cf.2; rm -f /etc/mail/spamassassin/70_sare_random.cf; Lint output: debug: SpamAssassin version 3.0.2 debug: Score set 0 chosen. debug: running in taint mode? yes debug: Running in taint mode, removing unsafe env vars, and resetting PATH debug: PATH included '/sbin', keeping. debug: PATH included '/bin', keeping. debug: PATH included '/usr/sbin', keeping. debug: PATH included '/usr/bin', keeping. debug: PATH included '/usr/games', keeping. debug: PATH included '/usr/local/sbin', keeping. debug: PATH included '/usr/local/bin', keeping. debug: PATH included '/usr/X11R6/bin', keeping.
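Since RDJ rolls the new files back as soon as the lint check fails, the actual parse error never makes it into the summary. A suggested debugging step (not part of the RDJ script itself) is to copy one downloaded .cf.2 file at a time into the site config directory and lint by hand, filtering for the interesting lines:

```
spamassassin -D --lint 2>&1 | grep -iE 'fail|warn|error'
```

Repeating this per ruleset narrows the failure down to a single .cf file, which can then be reported to its SARE maintainer or excluded from the RDJ list.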
Spamassassin headers. 1. What you can do with Spamassassin headers and 2. how.
How might a provider pull together better, easier ways to explain and instruct end users — a. using less of what might be perceived as jargon, and b. using fewer of what might be perceived as arcane references? That is, how might a provider more easily explain these questions to end users whose areas of expertise are not directly applicable?... 1. What can you do with the spamassassin headers? Spamassassin headers have appeared in the headers at the top of messages in emacs rmail. 2. And how do you do the things you can do with the spamassassin headers? Around the web the instructive material about spamassassin didn't appear to meet the needs of end users, given the usual explanations deferring the matter to another department, to the provider, to the isp, to "someone else, not me", and so on.
Re: Spamassassin headers. 1. What you can do with Spamassassin headers and 2. how.
1. You can delete/move to folder/mark based upon the X-Spam: Yes header 2. RTFM for your email client ;-) each one is different, and this is a mailing list for spamassassin, not for X e-mail reader (where X = your email client). The reason you didn't find anything to help the end user for spamassassin is that spamassassin runs on a server, and end users don't deal with servers other than to receive data, in which case they don't need to understand HOW the server works, just that IT does work. At the end of the day, they pay an admin to understand why and how it works; they need not be worried about the semantics of how... Thanks, JamesDR Don Saklad wrote: How might a provider pull together better, easier ways to explain, instruct a. using less of what might be perceived as jargon and b. using less what might be perceived as arcane references, how might a provider more easily explain, more easily instruct end users with these questions?... end users with areas of expertise not directly applicable to... 1. What can you do with the spamassassin headers?... spamassassin headers have appeared in the headers at the top of messages in emacs rmail. 2. And how do you do the things you can do with the spamassassin headers?... Around the web the instructive material about spamassassin didn't appear to meet the needs of end users, given the usual explanations deferring the matter to another department, deferring the matter to the provider, to the isp, to someone else not me and so on. smime.p7s Description: S/MIME Cryptographic Signature
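For users whose mail is delivered through procmail, the classic end-user recipe for point 1 (the folder name here is just an example) files everything SpamAssassin has tagged:

```
:0:
* ^X-Spam-Status: Yes
caughtspam
```

Graphical clients and readers like emacs rmail can do the equivalent with their own filtering features, keyed on the same X-Spam-Status (or X-Spam-Flag) header.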
How to classify mail by numeric IP address of sender or relay?
How to classify (as spam or ham) mail sent from or relayed by a specified numeric IP address of a host or a subnet? Example: Received: from finklfan.com (unknown [222.111.110.107]) by rushmore.scorpionshops.com (Postfix) with SMTP id 1D8207355A for [EMAIL PROTECTED]; Sat, 5 Mar 2005 16:11:57 +0100 (CET) Received: from wamu.com (mtav004.erms-02.wamu.com [167.88.201.35]) by finklfan.com (Postfix) with ESMTP id 13F3E5F8C6 All mail from or relayed by hosts on 167.88.0.0/255.255.0.0 network should be classified (as ham or spam depending on the needs). Also, independently from previous, all mail from or relayed by hosts on 222.111.110.0/255.255.255.0 network should also be classified. Thanks/Mikael
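One way to do this is with plain SpamAssassin header rules that match the relay IPs in the Received lines. The rule names and scores below are invented for illustration — flip the sign of each score depending on whether that network should count as ham or spam:

```
header   L_RELAY_167_88   Received =~ /\[167\.88\.\d{1,3}\.\d{1,3}\]/
describe L_RELAY_167_88   Relayed via 167.88.0.0/16
score    L_RELAY_167_88   -5.0

header   L_RELAY_222_111  Received =~ /\[222\.111\.110\.\d{1,3}\]/
describe L_RELAY_222_111  Relayed via 222.111.110.0/24
score    L_RELAY_222_111  5.0
```

For the trusted (ham) direction, trusted_networks together with whitelist_from_rcvd may be the more robust route, since a bare Received regexp can be fooled by forged headers.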
Something new to fool SURBL
As received (relevant snippet): a hrefthrivedhref=http://Taiwanese.com href= http://pickup-card.com;pickup-card.com/a Now here is the SA report on it: X-Spam-Checker-Version: SpamAssassin 3.0.0 (2004-09-13) on nova.terranovum.com X-Spam-Level: * X-Spam-Status: No, score=1.2 required=4.0 tests=BAYES_50,HTML_20_30, HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,MIME_HTML_ONLY autolearn=no version=3.0.0 I can post the original email in its entirety if anyone needs it. Tom smime.p7s Description: S/MIME Cryptographic Signature
rbl checks, do in postfix or spamassassin
Hello, We use postfix, procmail and spamassassin. Generally, which is the better place to do RBL checks, postfix or spamassassin? We're using Gentoo. Software versions: mail-filter/spamassassin-ruledujour-20050106 mail-filter/spamassassin-3.0.2-r1 mail-mta/postfix-2.1.5-r2 thanks, Rob
Re: rbl checks, do in postfix or spamassassin
On Sunday 06 March 2005 19:14, Rob Fantini wrote: Hellom We use postfix , procmail and spamassassin. Generally which is the better place to do RBL checks, postfix or spamassassin? Do you want to score on RBLs, or reject on RBLs? If you want to score, SA. If you want to reject outright, Postfix. Moreover, different RBLs can be applied at each stage, so you can reject based on SBL-XBL listings, but just score on SORBS listings.
Re: Spamassassin headers. 1. What you can do with Spamassassin headers and 2. how.
thank you ! i guess... 1. You can delete/move to folder/mark based upon the X-Spam: Yes header What other example or examples are there by way of individual spamassassin hints, tips, pointers, features?... Unrelated, but here's another's example of providing hints, tips, pointers, features http://www.apple.com/pro/tips/
Re: Something new to fool SURBL
A good rule for catching these is: rawbody XMBSHREFv2 /(?!\ba?href=.)(?:\b\w{2,}ref=.)/i I score it at 4.0 for our installation; granted it doesn't help with the SURBL tagging, but it works well enough at catching these. -Rocky On Sun, Mar 06, 2005 at 02:12:57PM -0500, Thomas Bolioli wrote: As received (relevant snippet): a hrefthrivedhref=http://Taiwanese.com href= http://pickup-card.com;pickup-card.com/a Now here is the SA report on it: X-Spam-Checker-Version: SpamAssassin 3.0.0 (2004-09-13) on nova.terranovum.com X-Spam-Level: * X-Spam-Status: No, score=1.2 required=4.0 tests=BAYES_50,HTML_20_30, HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,MIME_HTML_ONLY autolearn=no version=3.0.0 I can post the original email in it's entirety if anyone needs it. Tom -- __ what's with today, today? Email: [EMAIL PROTECTED] PGP:http://rocky.mindphone.org/rocky_mindphone.org.gpg signature.asc Description: Digital signature
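Rocky's pattern holds up against the sample. Here is a rough re-creation in Python (the snippet strings are abbreviated from the posted spam; this only demonstrates the regex logic, not SA's rawbody handling):

```python
import re

# Rocky's rawbody pattern, re-created: match a run-together "<word>ref="
# that is NOT a legitimate "href=" (legitimate ones are excluded by the
# negative lookahead at the match start)
xmbs_href = re.compile(r'(?!\ba?href=.)(?:\b\w{2,}ref=.)', re.IGNORECASE)

# abbreviated from the snippet in the post
spam_snippet = 'a hrefthrivedhref=http://Taiwanese.com href= http://pickup-card.com'
# an ordinary link, for comparison
ham_snippet = '<a href="http://example.com/">a normal link</a>'

print(bool(xmbs_href.search(spam_snippet)))  # True  ("hrefthrivedhref=" trips it)
print(bool(xmbs_href.search(ham_snippet)))   # False (plain href= is skipped)
```

The run-together "hrefthrivedhref=" is exactly the kind of token the obfuscated markup produces, which is why the rule catches these while leaving normal HTML alone.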
Re: Webmail and IP rules
Tony Finch writes: On Wed, 2 Mar 2005, Justin Mason wrote: Shane Williams writes: I noticed the HELO_DYNAMIC_* thread and the conclusion that IMP adding a Received header may be a source of problems. I think the problem is being caused by IMP being too good at generating a Received header that looks like a normal one added by an MTA. How is this different from authenticated SMTP submission? Could someone open a bug about this? we may indeed be able to look for the with HTTP and ignore that. That was already added along with the esmtpa/esmtpsa/asmtp protocol tokens. both correct, good points - thx Tony! In fact, HELO_DYNAMIC* et al shouldn't fire for authenticated handovers, and yep, with HTTP implies an auth handover. here's an idea of the patch -- I'm on a plane right now, but someone remind me when they see this mail and I can post the patch to a bug ;)

header HELO_DYNAMIC_IPADDR X-Spam-Relays-Untrusted =~ /^[^\]]+ helo=[a-z]\S*\d+[^\d\s]\d+[^\d\s]\d+[^\d\s]\d+[^\d\s][^\.]*\.\S+\.\S+/i

becomes:

header HELO_DYNAMIC_IPADDR X-Spam-Relays-Untrusted =~ /^[^\]]+ helo=[a-z]\S*\d+[^\d\s]\d+[^\d\s]\d+[^\d\s]\d+[^\d\s][^\.]*\.\S+\.\S+[^\]]+ auth= /i

so that auth=HTTP, auth=esmtpa, etc. handovers are ignored. --j.
Re: What does this mean?
[EMAIL PROTECTED] writes: spamd[29973]: Attempt to free unreferenced scalar at /usr/lib/perl5/site_perl/5.6.1/Mail/SpamAssassin/Plugin/SPF.pm line 207, GEN1460 line 48. Never seen it before and have been running SA3 for a while now. That sounds a lot like a bug in the build of perl you're running, or alternatively in an XS module used by it. However, no XS modules are used in the SPF code, so I'd say perl bug. --j.
Re: learn_with_whitelist?
Thanks for the informative reply. We have a communal, automatic [1] email-address whitelist and manual keyword-based corporate-wide whitelists that predate our use of SA; I was thinking of factoring out the email-address subsystem and replacing it with the AWL to simplify the overall system. I'm now planning to use our email keyword whitelist scanners to notify us of any mail that SA generates an FP for. - Barrie [1] it scans our employees' mbox files and whitelists addresses in email we've sent and that we've read and kept; I've only had to remove one or two addresses from it ever.
Spam Report
Hi, Is it possible to enable the full X-Spam-Report field to be added to the message even when the message is not detected as spam? At the moment all my messages detected as spam have X-Spam-Status fields like this: X-Spam-Checker-Version: SpamAssassin 3.0.0 (2004-09-13) on coolserver X-Spam-Level: * X-Spam-Status: Yes, score=13.3 required=3.0 tests=BAYES_99, HELO_DYNAMIC_IPADDR,HTML_20_30,HTML_MESSAGE,MIME_HTML_ONLY, RAZOR2_CF_RANGE_51_100,RAZOR2_CHECK,RCVD_IN_NJABL_SPAM, TO_ADDRESS_EQ_REAL,URIBL_OB_SURBL autolearn=no version=3.0.0 X-Spam-Report: * 0.0 TO_ADDRESS_EQ_REAL To: repeats address as real name * 4.4 HELO_DYNAMIC_IPADDR Relay HELO'd using suspicious hostname (IP addr 1) * 0.0 HTML_MESSAGE BODY: HTML included in message * 0.2 HTML_20_30 BODY: Message is 20% to 30% HTML * 0.1 RAZOR2_CF_RANGE_51_100 BODY: Razor2 gives confidence level above 50% * [cf: 100] * 1.9 BAYES_99 BODY: Bayesian spam probability is 99 to 100% * [score: 1.] * 0.2 MIME_HTML_ONLY BODY: Message only has text/html MIME parts * 1.5 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/) but my ham messages only have stuff like this: X-Spam-Checker-Version: SpamAssassin 3.0.0 (2004-09-13) on coolserver X-Spam-Level: X-Spam-Status: No, score=-2.6 required=3.0 tests=AWL,BAYES_00,NO_REAL_NAME autolearn=no version=3.0.0 I want to be able to see the full report so I can see why my rules aren't picking up some spams. Tim Edwards
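SpamAssassin 3.0's add_header option controls this; the stock configuration only adds the report to spam, but changing the scope to "all" in the site config should attach it to ham as well (see the Mail::SpamAssassin::Conf man page for the header-template syntax):

```
# in local.cf (path varies by install): add the full report
# to every scanned message, not just ones tagged as spam
add_header all Report _REPORT_
```

Bear in mind this makes every ham message larger, so it is usually something to enable only while debugging rules.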
Re: rbl checks, do in postfix or spamassassin
Thank you for the reply. Can someone suggest which RBL checks should result in rejecting mail in postfix? I'll also check on a postfix mailing list, but would be interested in some replies from this list.. Duncan Hill wrote: Do you want to score on RBLs, or reject on RBLs? If you want to score, SA. If you want to reject outright, Postfix. Moreover, different RBLs can be applied at each stage, so you can reject based on SBL-XBL listings, but just score on SORBs listings.
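As a sketch only — which zones are conservative enough for outright rejection is a local policy call — the combined SBL-XBL zone Duncan mentioned was widely regarded as safe to reject on at the MTA, leaving the more aggressive lists to SpamAssassin's scored network tests:

```
# main.cf (illustrative ordering; keep permit_mynetworks and
# reject_unauth_destination ahead of any RBL rejection)
smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination,
    reject_rbl_client sbl-xbl.spamhaus.org
```

Anything rejected here never reaches procmail or SA, which also saves the scanning CPU time.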