Re: Bayes always reject.
> From: Pierluigi Frullani
> Date: Wed, 13 Dec 2023 07:49:24 +0100
>
> Hello all,
> I'm facing a strange problem.
...
> tests=BAYES_95,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS,T_SCC_BODY_TEXT_LINE

How did you feed this message into SpamAssassin? Did you do something to strip off all of the email headers?

For the BAYES_99, as already mentioned you probably need to retrain bayes, making sure to correct any incorrectly trained email messages.

-jeff
Re: BAYES scores
> From: joe a
> Date: Tue, 28 Feb 2023 11:37:34 -0500
>
> Curious as to why these scores, apparently "stock" are what they are.
> I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
>
> Noted in a header this morning:
>
> * 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
> * [score: 1.]
> * 0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
> * [score: 1.]
>
> Was this discussed recently? I added a local score to mollify my sense
> of propriety.

Those two rules overlap. A message with bayes >= 99.9% hits both rules, so both scores are added. BAYES_99 ends at 1.00, not .999.

-jeff
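The overlap can be sketched in a few lines of Python. This is only an illustration, assuming the two ranges and scores shown in the quoted header and assuming both bounds are inclusive; the names RULES, bayes_hits and bayes_score are made up for the example and are not SpamAssassin code.

```python
# Ranges and scores copied from the quoted header; inclusive bounds are an assumption.
RULES = [
    ("BAYES_99", 0.99, 1.00, 3.5),
    ("BAYES_999", 0.999, 1.00, 0.2),
]

def bayes_hits(probability):
    """Names of the rules whose range contains the Bayes probability."""
    return [name for name, low, high, _ in RULES if low <= probability <= high]

def bayes_score(probability):
    """Sum of the scores of every rule that hits (rule scores are additive)."""
    total = sum(score for _, low, high, score in RULES if low <= probability <= high)
    return round(total, 3)

if __name__ == "__main__":
    for p in (0.995, 0.9995, 1.0):
        print(p, bayes_hits(p), bayes_score(p))
```

Read this way, BAYES_999 is a small bonus on top of BAYES_99 rather than a replacement for it.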
Re: Hits on item with " No description available"
Greg Troxel writes:
> From: Greg Troxel
> Date: Thu, 20 Jan 2022 16:32:53 -0500
>
> I followed my own advice about egrep -R and found this immediately
>
> it's in
>
> 3.004006/updates_spamassassin_org/72_active.cf
>
> and it is
>
> ##{ FSL_HELO_NON_FQDN_1
> header FSL_HELO_NON_FQDN_1 X-Spam-Relays-External =~ /^[^\]]+
> helo=[a-zA-Z0-9-_]+ /i
> ##} FSL_HELO_NON_FQDN_1
>
> with score
>
> score FSL_HELO_NON_FQDN_1 2.361 0.001 1.783 0.001

BTW: You can create tags (using Exuberant ctags) for spamassassin rules.

I create the tags using:

  ctags -f SPAMASSASSIN_TAGS --langdef=CF --langmap=CF:.cf --languages=CF \
    --regex-CF='/^[ \t]*(header|mimeheader|describe|body|rawbody|full|meta|uri|urirhssub|uridnsbl|urirhsbl|tflags|score|replace_rules)[ \t]+([^ \t]+)/\2/' \
    ~/.spamassassin /var/lib/spamassassin /usr/share/spamassassin

So, I can do Meta-. in Emacs and it goes directly to the 'header FSL_HELO_NON_FQDN_1' definition.

-jeff
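To see what that --regex-CF pattern picks up, here is a rough Python sketch that applies the same expression line by line and records the rule name (the second capture group, which ctags uses as the tag). The function name index_rules and the SAMPLE text are made up for illustration; this is not a replacement for ctags.

```python
import re

# Same expression as the --regex-CF option: a rule keyword, whitespace, then the rule name.
RULE_RE = re.compile(
    r"^[ \t]*(header|mimeheader|describe|body|rawbody|full|meta|uri|urirhssub|"
    r"uridnsbl|urirhsbl|tflags|score|replace_rules)[ \t]+([^ \t]+)"
)

def index_rules(text):
    """Return (rule_name, keyword, line_number) for every matching line."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = RULE_RE.match(line)
        if m:
            found.append((m.group(2), m.group(1), lineno))
    return found

# Hypothetical sample built from the rule quoted in this thread.
SAMPLE = "\n".join([
    "##{ FSL_HELO_NON_FQDN_1",
    r"header FSL_HELO_NON_FQDN_1 X-Spam-Relays-External =~ /^[^\]]+ helo=[a-zA-Z0-9-_]+ /i",
    "##} FSL_HELO_NON_FQDN_1",
    "score FSL_HELO_NON_FQDN_1 2.361 0.001 1.783 0.001",
])

if __name__ == "__main__":
    for name, keyword, lineno in index_rules(SAMPLE):
        print(f"{name}\t{keyword}\tline {lineno}")
```

The ##{ and ##} marker lines are skipped because they do not start with one of the listed keywords.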
Re: DCC whitelisting
From: sha...@shanew.net
Date: Thu, 11 Jun 2015 10:02:59 -0500 (CDT)

On Wed, 10 Jun 2015, John Hardin wrote:
> On Wed, 10 Jun 2015, Shane Williams wrote:
>
>> Two examples that I know are legitimate senders, but get caught by DCC
>> (and pyzor in some cases) and other rules that push them over the
>> threshold are the SourceForge.net Project of the Month list and
>> various Netflix emails to customers (New Arrivals or "we just added a
>> show you might like"). In both those cases, the user part of the
>> env_from changes, and as I understand it, the DCC Whitelist doesn't
>> allow wildcards, so I can't have an entry that matches the server
>> part. Maybe I could be using the "substitute List-ID:" syntax, but
>> neither of those has List-ID as a specific header.
>
> Can you reliably identify those at the MTA level and tell the SA glue to skip them entirely?

I probably could, but that also seems kludgy. DCC has a whitelisting capability, so why not use it? Am I misunderstanding what DCC's whitelist is intended for?

There are numerous ways to whitelist messages in DCC. The easiest is to whitelist by mail_host, e.g.:

  ok substitute mail_host ecerts.americanexpress.com

You put the entries in /var/dcc/whiteclnt (or wherever you have the files installed). The mail_host is the stuff after the @ in the return-path header.

You can test the entry by calling dccproc with the full email message, e.g.:

  /usr/local/bin/dccproc -d -H -Q -S mail_host -S Sender -S List-ID -S From -l ~/.dcc -w /var/dcc/whiteclnt -R < put_your_email_message_filename_here

You may need to change dcc_conf to make sure that mail_host is included at startup:

  DCCIFD_ARGS="-SHELO -Smail_host -SSender -SList-ID -SFrom"

You can also look at the proof of concept dcc scripts on http://www.rhyolite.com/dcc/

  CGI Demonstration
  There is a demonstration of the proof of concept CGI scripts that allow users to maintain individual whitelists and monitor individual logs of rejected mail at http://www.rhyolite.com/dcc-demo-cgi-bin/ or http://cgi-demo:cgi-d...@www.rhyolite.com/dcc-demo-cgi-bin/. It requires a user name of cgi-demo and a password of cgi-demo the same as the user name.

-jeff
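As a small helper for building such entries, the sketch below pulls the part after the @ out of a saved message's Return-Path header and prints a whiteclnt line in the "ok substitute mail_host" form shown above. The function names mail_host and whiteclnt_entry and the sample message are made up; the sketch only formats text and does not talk to DCC.

```python
from email import message_from_string
from email.utils import parseaddr

def mail_host(raw_message):
    """Return the domain after the @ in Return-Path, or None if there is none."""
    msg = message_from_string(raw_message)
    _, addr = parseaddr(msg.get("Return-Path", ""))
    if "@" not in addr:
        return None  # missing header or null return path <>
    return addr.rpartition("@")[2].lower()

def whiteclnt_entry(raw_message):
    """Format a whiteclnt line in the style used in this thread."""
    host = mail_host(raw_message)
    return f"ok substitute mail_host {host}" if host else None

if __name__ == "__main__":
    # Hypothetical message; the local part changes per mailing, the host does not.
    sample = (
        "Return-Path: <bounce-12345@ecerts.americanexpress.com>\n"
        "Subject: statement\n"
        "\n"
        "body\n"
    )
    print(whiteclnt_entry(sample))
```

Because only the host is used, the changing user part of the envelope sender does not matter, which is the point of whitelisting by mail_host.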
Re: effectiveness of DCC checks?
From: Quanah Gibson-Mount
Date: Tue, 14 Apr 2015 10:59:28 -0700

I've noticed that DCC_CHECK is flagging on tons of items that are clearly not spam. The most recent hit for me today was a release announcement from the mariadb folks. Overall, it's a trend I'm routinely seeing where it is flagging a lot of email that clearly isn't spam. Are others who use DCC seeing similar issues?

--Quanah

You need to whitelist bulk senders in DCC. See the DCC manpage, dcc(8):

  Whitelists are the responsibility of DCC clients, since only they know which bulk mail they solicited. The only false positives (mail marked as "bulk" by a DCC ...

-jeff
Re: SpamRATS RBL?
From: "Kevin A. McGrail"
Date: Wed, 18 Mar 2015 10:21:39 -0400

Anyone use this RBL or familiar with it? Pros/cons? Efficacy data?

regards,
KAM

I get 5% spam hits on DYNA and 10% on NOPTR. The SPAM list isn't that great (< 1% spam and some false hits).

-jeff
Re: Rule to match a blacklist of email addresses.
From: Steve
Date: Sat, 10 Jan 2015 14:23:36 +

I have a domain for which (for historic reasons) I want a catch-all rule to accept email. Until recently, Spamassassin has done a great job of separating the ham from the spam. Recently, I've been receiving a large number of spam emails which have been misclassified as ham. These annoying spam emails tend to be addressed to a relatively small number of email addresses at my domain - addresses which have never been used/provided, so should be a very strong indicator of spam.

If I were to have a list of a few dozen email addresses of the form:

  bogus_us...@mydomain.com
  onlyspample...@mydomain.com
  ...
  unwantedrubb...@mydomain.com

What is the easiest way to implement a rule that checks against such a list - and ups the spam-score if matched? Would I have to implement a separate rule for each address?

Use blacklist_to:

  blacklist_to bogus_us...@mydomain.com
  ...

This will lead to hits on USER_IN_BLACKLIST_TO.

-jeff
Re: Spam messages bypassing SA
From: Bob Proulx
Date: Mon, 27 Oct 2014 18:37:35 -0600

In the first email:

  # The lock file ensures that only 1 spamassassin invocation happens
  # at 1 time, to keep the load down.
  #
  :0fw: spamassassin.lock
  * < 40
  | spamc -x

Kevin A. McGrail wrote:
> geoff.spamassassin140903 wrote:
> > Kevin A. McGrail wrote:
> > > Using procmail without MTA glue is OK for many uses. I am wondering how
> > > many spamd connections you allow and if you have checked your logs?
> > >
> > > I also cannot remember but the uses of a lock file seem odd for
> > > something that can thread. Any one know if that is a good idea to
> > > remove?
> >
> > I wonder if you could explain in simple terms what the lockfile achieves
> > in this situation? Is it even possible that it could cause messages to
> > bypass SA?
>
> I don't think a lockfile achieves anything because it's a call to a program.
> Procmail has some weird syntax so hopefully someone with some procmail-fu
> can tell us if a lock on a procmail system call does anything.

Well... The comment in the example explains what the lock is attempting to do. I think that comment got missed in the follow-ups. The lock will restrict spamassassin invocations to one at a time to prevent a high system load average running too many spamassassin processes all at once. It will serialize spamassassin invocations to one at a time instead of many in parallel.

Normally the MTA will receive incoming messages and will fork a process for each incoming connection. If the outside world connects and sends 100 messages all at once then there will be 100 MTA processes running in parallel. If 10,000 all at once then probably some MTA process limit will prevent forking that many depending upon your configuration. Each of those will try to send the message through procmail and spamassassin in parallel too. Running 10,000 procmail processes in parallel probably won't be a problem since it is lightweight. However running perl spamassassin 100 or 1,000 times in parallel all at once can be quite a resource hit to a moderate system!

By putting the lock in the procmail rule it prevents more than one perl spamassassin process from running at a time. This keeps the system from being overloaded due to a spike from the outside world. I want to emphasize that the outside world impacts the system and can have an effect of a DDoS just by overwhelming the system with external connections. The MTA has limits to prevent this but while those are tuned for normal delivery the MTA maintainers won't know if you are running each message through spamassassin and causing a higher load because of it. The default MTA limits are probably too high when considering running the message through spamassassin too.

The procmail example comes from the wiki page example:

  http://wiki.apache.org/spamassassin/UsedViaProcmail

The wiki page example is launching "spamassassin" not "spamc". That is an important difference to this case. Someone has changed that to spamc in the above and preserved all else including the serialization lock. The spamc talks to a spamd and so the number of parallel processes spamd can handle depends upon the spamd configuration.

In the spamc use I would be inclined to remove the serialization lock. Let it be throttled at the spamd side of things instead. That would make the most sense to me. Then tune spamd's limits as needed.

In summary I suggest removing the serialization lock from the spamc recipe. Give it a try and monitor system resource utilization. Start tuning at spamd. Tune other things as needed afterward.

  :0fw
  | spamc -x

  :0e
  {
    EXITCODE=$?
  }

Bob

I agree with everything you wrote but only when bayes autolearning is turned off. Bayes learning holds an exclusive lock to the bayes database, particularly during expiration. If spamc does bayes autolearning and starts an expiration then other spamc runs for that user will be locked out of bayes. At some point you start getting timeouts at different points in the email delivery chain.

I have a separate sa-learn (or spamc -L) procmail recipe that has a serialization lock.

-jeff
Re: Philosophical question on Bayes (was Re: 23_bayes_ignore_header.cf)
From: Axb
Date: Tue, 14 Oct 2014 23:37:36 +0200

On 10/14/2014 11:08 PM, Adam Katz wrote:
>> On Tue, 14 Oct 2014 16:10:52 +0200 Axb wrote:
>>> and to avoid further discussions of what header may pollute bayes or
>>> not, I've removed all header entries which are not directly related
>>> to AV/filter products.
>
> On 10/14/2014 07:17 AM, David F. Skoll wrote:
>> I'm not sure I agree with being too clever about Bayes. Surely by its
>> very nature, the Bayes algorithm will itself indicate which tokens
>> are relevant and which are not? Isn't that the whole point of Bayes?
>>
>> I think being too clever about massaging the data that gets fed to
>> Bayes may be counter-productive. For sure, *some* massaging is in order;
>> a token should be a semantic unit, so something like "www.example.com"
>> should probably be one token rather than three, but beyond that I wonder
>> if it's good or not to massage the data?
>
> The purpose of bayes_ignore_header is twofold:
>
> 1. Prevent inheriting other systems' false positives (ensure better
>    independence)
> 2. Prevent relying upon headers that won't exist at delivery time (e.g.
>    added by the mailbox server)
>
> This is why it's so important to ignore other spam engines, which
> basically fit into both of those categories.

I'd love to have the option (switch) to use Bayes on msg bodies ONLY, though I doubt anybody would be a taker for such a project. (I'd even be willing to "$pon$or" such an addition to SA)

Wouldn't that be fairly easy to implement by intercepting the call to _tokenize_headers in Plugin/Bayes.pm?

  # Tokenize the headers
  my %hdrs = $self->_tokenize_headers ($msg);
  while( my($prefix, $value) = each %hdrs ) {
    push(@tokens, $self->_tokenize_line ($value, "H$prefix:", 0));
  }

-jeff
Re: Bayes Problem
From: Julian Brown
Date: Thu, 28 Aug 2014 10:46:55 -0500

I work for a company that has lots of mail users. We use Exim with Spamassassin. My job is to track down this problem.

We are getting complaints of too much spam and have tracked it down, using Google, to our bayes files not working correctly. I do not know if they are poisoned or just not working.

When bad spam gets through it is always the same, BAYES_00 -1.9 in the headers. According to what I have googled there is only one thing we can do and that is to clear the bayes filters and either allow it to start again and possibly retrain.

Each individual has their own bayes filters, /home/user/.spamassassin/bayes_*.

  Exim version 4.82 #2 built 17-Jul-2014 13:21:53
  SpamAssassin Server version 3.3.2
  CentOS 6.5 64bit

But we are getting a lot of it, not all accounts, so I think this means we are getting poisoned or something they are doing is rendering the bayes filters non functional.

Here is from one of them from a week or 2 ago:

  sa-learn --dump magic
  0.000 0 476 0 non-token data: nspam
  0.000 0 40270 0 non-token data: nham
  ...

I don't know the significance of the above readout, but all the discussions talk about this.

Julian

You need to learn way more spam messages. You will get the best results by learning from essentially all messages, as long as the messages are learned correctly.

In addition to not having enough spam messages you probably have learned various spam messages as ham.

-jeff
Re: New at SpamAssassin - how to not get headers
From: RobertGrimes
Date: Tue, 5 Aug 2014 08:50:44 -0700 (PDT)

I don't know if this is fair to ask, but would you (or anyone) care to see if the message I am posting should be rated higher than 1.9? I apologize if this is not appropriate.

The message is at http://pastebin.com/UZeDtLWZ

You need to save the complete original message. Many of the headers are missing:

  MISSING_DATE=0.1,MISSING_MID=0.497,NO_RECEIVED=-0.001,NO_RELAYS=-0.25

With sufficient training you should be able to get BAYES_99 + BAYES_999.

-jeff
Re: getting tons of SPAM
From: John Hardin
Date: Wed, 2 Jul 2014 14:45:07 -0700 (PDT)

On Wed, 2 Jul 2014, motty cruz wrote:
> bayan filter is not running: according to header,
>
> X-Virus-Scanned: amavisd-new at fqdn.com
> X-Spam-Flag: NO
> X-Spam-Score: -0.009
> X-Spam-Level:
> X-Spam-Status: No, score=-0.009 tagged_above=-999 required=5.3
>    tests=[HTML_MESSAGE=0.001, T_RP_MATCHES_RCVD=-0.01]
>    autolearn=unavailable
> Received: from
>
> # sa-learn --dump magic
> Error Opening file /usr/local/share/GeoIP/GeoIPv6.dat
> 0.000 0 3 0 non-token data: bayes db version
> 0.000 0 3338 0 non-token data: nspam
> 0.000 0 784 0 non-token data: nham
>
> any ideas?

Note the "autolearn=unavailable" part. The Bayes database is probably locked doing an expire. Also, the GeoIP data file should be fixed:

  Error Opening file /usr/local/share/GeoIP/GeoIPv6.dat

You need to post samples (to pastebin). We can't make comments on what *should* be hitting unless we can see the message itself.

Yep.

-jeff
Re: whitelist_from_spf dbg
From: Matus UHLAR - fantomas
Date: Mon, 19 May 2014 15:44:30 +0200

> On 17.05.14 14:11, Jeff Mincy wrote:
> >It would have been easier to figure out why it was matching if the
> >matching spf entry was printed out, for example something like this:
> >
> >May 8 18:21:27.859 [22058] dbg: spf: whitelist_from_spf: amandarodriq...@odysseyshop.ribsbuy.com matches ^.*\@.*buy\.com$ entry
> >May 8 18:21:27.859 [22058] dbg: spf: whitelist_from_spf: amandarodriq...@odysseyshop.ribsbuy.com is in user's WHITELIST_FROM_SPF and passed SPF check

> From: Matus UHLAR - fantomas
> Date: Sun, 18 May 2014 18:22:49 +0200
> According to the documentation, they are not regexp's (as one could/should
> expect):
> >Whitelist and blacklist addresses are now file-glob-style patterns,

On 18.05.14 13:44, Jeff Mincy wrote:
>The matching whitelist_from_spf entry *@*buy.com is a file glob pattern
>which matched. I'm not sure why you are quoting the manual here. The
>whitelist entry *@*buy.com is turned into a regexp by add_to_addrlist
>in SpamAssassin/Conf/Parser.pm which among other things does s/\*+/\.\*/g

I wanted to point out that you (and many other people) could be surprised what you see in the regexp, because the glob-style pattern you enter into blacklist/whitelist directive. Maybe if not the RE, but the directive content was shown in the debug output...

Sure, printing out the original glob would be better. The original glob isn't currently saved - it would be a little more work. I could come up with other ideas - such as returning the information in a tag that could be added to a header.

> I assume the contents of *_networks is modified before RE matching, so you'd
> wonder what is the content...

>Ok, you lost me. What does the contents of *_networks have to do with
>the suggestion to print the matching whitelist regexp entry? Nothing
>matching *buy.com has been added to *_networks if that is what you are
>wondering.

sorry, that had to be (black|white)list_*, not *_networks.

Ah. Yes, the glob style whitelist was modified into a regexp before matching.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Remember half the people you know are below average.

-jeff
Re: whitelist_from_spf dbg
From: Matus UHLAR - fantomas
Date: Sun, 18 May 2014 18:22:49 +0200

On 17.05.14 14:11, Jeff Mincy wrote:
>I just got some spam that was erroneously spf whitelisted hitting WHITELIST_FROM_SPF
>It took me a while to figure out why it was getting WHITELIST_FROM_SPF
>but I eventually tracked it down to this whitelist entry:
>   whitelist_from_spf *@*buy.com
>The *@*buy.com (obviously) matches *@odysseyshop.ribsbuy.com.
>
>It would have been easier to figure out why it was matching if the
>matching spf entry was printed out, for example something like this:
>
>May 8 18:21:27.859 [22058] dbg: spf: whitelist_from_spf: amandarodriq...@odysseyshop.ribsbuy.com matches ^.*\@.*buy\.com$ entry
>May 8 18:21:27.859 [22058] dbg: spf: whitelist_from_spf: amandarodriq...@odysseyshop.ribsbuy.com is in user's WHITELIST_FROM_SPF and passed SPF check

According to the documentation, they are not regexp's (as one could/should expect):

  Whitelist and blacklist addresses are now file-glob-style patterns,

The matching whitelist_from_spf entry *@*buy.com is a file glob pattern which matched. I'm not sure why you are quoting the manual here. The whitelist entry *@*buy.com is turned into a regexp by add_to_addrlist in SpamAssassin/Conf/Parser.pm which among other things does s/\*+/\.\*/g

>sub _wlcheck {
>  my ($self, $scanner, $param) = @_;
>  if (defined ($scanner->{conf}->{$param}->{$scanner->{sender}})) {
>    return 1;
>  } else {
>    study $scanner->{sender};
>    foreach my $regexp (values %{$scanner->{conf}->{$param}}) {
>      if ($scanner->{sender} =~ qr/$regexp/i) {
>        ##New dbg output here:
>        dbg("spf: $param: $scanner->{sender} matches $regexp entry");
>        return 1;

I assume the contents of *_networks is modified before RE matching, so you'd wonder what is the content...

Ok, you lost me. What does the contents of *_networks have to do with the suggestion to print the matching whitelist regexp entry? Nothing matching *buy.com has been added to *_networks if that is what you are wondering.

-jeff
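For anyone surprised by the regexp in the debug line, here is a rough Python approximation of the glob-to-regexp step described above: runs of "*" become ".*", everything else is escaped, and the result is anchored at both ends. It is a sketch of the idea, not a port of add_to_addrlist; the function name glob_to_regex is made up, and mapping "?" to "." is an assumption based on usual glob semantics.

```python
import re

def glob_to_regex(glob):
    """Approximate conversion of a whitelist glob into an anchored regexp."""
    out = []
    i = 0
    while i < len(glob):
        ch = glob[i]
        if ch == "*":
            # Collapse a run of '*' into a single '.*' (the s/\*+/\.\*/g step).
            while i + 1 < len(glob) and glob[i + 1] == "*":
                i += 1
            out.append(".*")
        elif ch == "?":
            out.append(".")  # assumption: '?' matches one character
        else:
            out.append(re.escape(ch))
        i += 1
    return "^" + "".join(out) + "$"

def glob_matches(glob, address):
    """Case-insensitive match, like the qr/$regexp/i test in _wlcheck."""
    return re.search(glob_to_regex(glob), address, re.IGNORECASE) is not None

if __name__ == "__main__":
    print(glob_to_regex("*@*buy.com"))
    for addr in ("someone@bestbuy.com", "someone@odysseyshop.ribsbuy.com"):
        print(addr, glob_matches("*@*buy.com", addr))
```

The second "*" is what lets any host name ending in "buy.com" through, which is why an entry like *@*buy.com is much broader than it looks.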
whitelist_from_spf dbg
I just got some spam that was erroneously spf whitelisted, hitting WHITELIST_FROM_SPF. It took me a while to figure out why it was getting WHITELIST_FROM_SPF but I eventually tracked it down to this whitelist entry:

  whitelist_from_spf *@*buy.com

The *@*buy.com (obviously) matches *@odysseyshop.ribsbuy.com.

It would have been easier to figure out why it was matching if the matching spf entry was printed out, for example something like this:

  May 8 18:21:27.859 [22058] dbg: spf: whitelist_from_spf: amandarodriq...@odysseyshop.ribsbuy.com matches ^.*\@.*buy\.com$ entry
  May 8 18:21:27.859 [22058] dbg: spf: whitelist_from_spf: amandarodriq...@odysseyshop.ribsbuy.com is in user's WHITELIST_FROM_SPF and passed SPF check

  sub _wlcheck {
    my ($self, $scanner, $param) = @_;
    if (defined ($scanner->{conf}->{$param}->{$scanner->{sender}})) {
      return 1;
    } else {
      study $scanner->{sender};
      foreach my $regexp (values %{$scanner->{conf}->{$param}}) {
        if ($scanner->{sender} =~ qr/$regexp/i) {
          ##New dbg output here:
          dbg("spf: $param: $scanner->{sender} matches $regexp entry");
          return 1;
        }
      }
    }
    return 0;
  }

-jeff
Re: help with regex
From: "Kevin A. McGrail"
Date: Wed, 26 Feb 2014 19:06:34 -0500

On 2/26/2014 6:53 PM, Webmaster wrote:
> I need a regex to match an alphanumeric string with letters and numbers.
>
> example: 48HQZBF404TY2298D1414BB8050022YQ3872444
>
> The pattern is defined as:
>
> A sequence of alphanumeric characters, letters are upper or lower
> case, at least 30 chars long, containing at least 10 numbers.
>
> This part is easy enough: [a-zA-Z0-9]{30,}
>
> But I can't figure out how to match only if the string contains at
> least 10 numbers.

Hmm, I think you might need a plugin for that one.

Can't you do something like this using a look ahead regexp?

  (?=[A-Z0-9]{30,})(?:[A-Z]*[0-9]){10,}

The look ahead gets the 30 chars. Then the next part gets the 10 or more numbers. You probably don't need unbounded {10,} but you do need the {30,} part to be unbounded.

Is the 10 number part really important?

-jeff
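A quick way to try the look-ahead idea outside SpamAssassin is a few lines of Python. This is only a sketch: the name looks_like_token is made up, and re.IGNORECASE is added here to cover the "upper or lower case" part of the request, since the expression as written only lists A-Z.

```python
import re

# Look-ahead requires a run of 30+ alphanumerics; the second part then
# needs at least 10 digits within that same run.
TOKEN_RE = re.compile(r"(?=[A-Z0-9]{30,})(?:[A-Z]*[0-9]){10,}", re.IGNORECASE)

def looks_like_token(text):
    """True if text contains a 30+ char alphanumeric run with 10+ digits."""
    return TOKEN_RE.search(text) is not None

if __name__ == "__main__":
    samples = [
        "48HQZBF404TY2298D1414BB8050022YQ3872444",   # example from the thread
        "A" * 40,                                    # long enough, no digits
        "a" * 30 + "123456789",                      # long enough, only 9 digits
        "abc1234567890",                             # 10 digits, too short
    ]
    for s in samples:
        print(s, looks_like_token(s))
```

Both conditions have to hold at the same starting position, so a long all-letter run or a short digit-heavy string does not match.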
Re: re-learning ? was - bayes - large message
From: "Joe Acquisto-j4"
Date: Sat, 20 Apr 2013 09:10:26 -0400

>>> On 4/19/2013 at 8:33 PM, "Joe Acquisto-j4" wrote:
> On 4/19/2013 at 8:26 PM, "Joe Acquisto-j4" wrote:
>> I thought I had corrected this issue, with someone's assistance, a while ago:
>>
>> Apr 19 20:21:02.477 [23670] dbg: bayes: expiry completed
>> Apr 19 20:21:02.477 [23670] info: archive-iterator: skipping large message
>> Learned tokens from 0 message(s) (0 message(s) examined)
>
> Please ignore. As much as possible. I was testing manually and forgot
> --mbox on the command line.
>
> However, I can see something is amiss as it is happily accepting spam I
> thought had been previously submitted.
>
> joe a.

Ok, I am officially puzzled.

I setup email addresses on my SA box, to which I and others (they say) send ham/spam. Then I have cron tasks that feed those emails twice daily to bayes. And emails the output to my admin mailbox.

I can review those admin messages and see "Learned tokens from n message(s) (n message(s) examined)".

Yet, if I resend the bayes food from those dates, it appears to re-learn them. I would expect "Learned tokens from 0 message(s) (n message(s) . . . " if it already had seen them.

I have tried this for several dates and get the same result. What could it be? Not Operator Trouble, surely . . .

joe a

Bayes uses the message id from the email message to remember which messages it has seen. If you are really emailing the messages then you are getting a new message-id which is then learned.

You need to train on the unadulterated original email message. You can do this by attaching the complete email message. Otherwise you are training bayes to recognize tokens added by your users during the forwarding process as a spam indicator.

-jeff
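A toy model of that bookkeeping, in Python, shows why a forwarded copy is learned again. It assumes, as described above, that the key is simply the Message-ID header; the SeenTracker class and the fallback hash for messages without a Message-ID are made up for the sketch and are not how sa-learn is implemented.

```python
import hashlib
from email import message_from_string

class SeenTracker:
    """Toy 'already learned' bookkeeping keyed on the Message-ID header."""

    def __init__(self):
        self._seen = set()

    def learn(self, raw_message):
        """Return True if the message is new, False if it was seen before."""
        msg = message_from_string(raw_message)
        msgid = (msg.get("Message-ID") or "").strip()
        if not msgid:
            # Assumption for this sketch only: fall back to a content hash.
            msgid = hashlib.sha1(raw_message.encode("utf-8")).hexdigest()
        if msgid in self._seen:
            return False
        self._seen.add(msgid)
        return True

if __name__ == "__main__":
    original = "Message-ID: <spam-1@example.net>\nSubject: cheap pills\n\nbuy now\n"
    # Forwarding creates a brand new message with its own Message-ID.
    forwarded = "Message-ID: <fwd-77@mydomain.example>\nSubject: Fwd: cheap pills\n\nbuy now\n"
    tracker = SeenTracker()
    print(tracker.learn(original))   # new
    print(tracker.learn(original))   # same id, skipped
    print(tracker.learn(forwarded))  # new id, learned again
```

Feeding the untouched original (for example as an attachment that is unpacked before learning) keeps the original Message-ID, so a second run would be skipped.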
Re: rdns in received header
From: Matus UHLAR - fantomas
Date: Thu, 21 Feb 2013 16:36:18 +0100

>On 2/21/2013 9:03 AM, Jeff Mincy wrote:
>>Well, I trust the network not to lie. This is more of an omission

On 21.02.13 10:26, Kevin A. McGrail wrote:
>Your Clinton-esque logic likely doesn't apply here ;-). The land of
>RFC's works to avoid this type of logic in a language I call
>RFC-eeze.

as long as I understand Jeff's original mail, the issue is that his ISP stopped providing DNS information in the Received: headers. SA does not do lookups on the IPs in Received: (there's iirc one exemption related to a buggy software) and if it's not there, it assumes the rDNS does not exist, while it does.

Actually the ISP added a completely new hop, and that hop is not adding rDNS to the received header. I had to add the new hop to trusted_networks and internal_networks. The new hop looks like it is scanning the messages using Cloudmark:

  X_CMAE_Category: ...
  X-CNFS-Analysis: ...
  X-CM-Score: ...
  X-Scanned-by: Cloudmark Authority Engine

>>I could always whine to Rcn about it, maybe they'll fix it.
>I think that's a good move to at least try! It truly sounds more
>like a DNS error that they might not know is occurring.

if the error repeats, I assume Jeff's guess is correct and the ISP just turned rDNS lookups off.

Or neglected to turn on the lookups in the first place...

-jeff
Re: rdns in received header
From: "Kevin A. McGrail"
Date: Thu, 21 Feb 2013 11:07:20 -0500

On 2/21/2013 10:36 AM, Matus UHLAR - fantomas wrote:
> And how is this ISP's issue related to RFCs? The RFC does not mention
> word "trusted"

A fair point that I didn't explain clearly enough. The RFCs cover received headers for SMTP and RFCs strive to be black and white. Discussing things as gray area is an argument that Bill Clinton was famous for but doesn't really hold a place in discussing technology covered by

Which RFC talks about Received headers having rDNS or what information is supposed to be in the received header?

The point of SA's trusted configuration is that you "trust" the headers. In this case, he's saying he doesn't trust the headers because they are omitting important information but that they aren't lying, just lying by omissions.

To me, this says "I can't trust those headers" and you need to pull back your trust circle which in this case will ruin much of the rules SA uses for pathway analysis (RBLs, rDNS, etc.)

Fixing those headers outside SA or fixing the ISP creating those headers are the real solutions.

There is of course a third option for me - I could turn off the spam filtering on Rcn email. Most of the spam is blocked by Rcn, there's almost no point in trying to filter what little spam is left.

-jeff
Re: rdns in received header
From: "Kevin A. McGrail"
Date: Thu, 21 Feb 2013 08:46:40 -0500

On 2/20/2013 8:51 PM, Jeff Mincy wrote:
> ...
>
> This leads to various bad things (RDNS_NONE & broken WHITELIST_FROM_RCVD)
>
> Is there anything in SpamAssassin that can deal more elegantly with
> this particular problem? Perhaps Some sort of please_fill_in_rcvd_rdns
> type option?

Off the cuff, the point of trusted networks is to say you trust that network's headers. However, in this case, you don't... I don't really know a fix for this because we have enough issues parsing received headers, let alone re-writing them.

Well, I trust the network not to lie. This is more of an omission.

How good is your perl and maybe you can solve it in MIMEDefang before it's sent to SA?

Yea, I expected this was going to be the answer. It would have to be a procmail filter that calls out to a script. Yuck. Thanks for confirming my suspicion.

I could always whine to Rcn about it, maybe they'll fix it.

-jeff
rdns in received header
My local ISP (rcn.com) reconfigured their email servers. The 69.168.97.77 hop does not seem to be doing rdns lookups on the previous hop. For example, I get these two received headers at the trust boundary:

  ...
  Received: from mx.rcn.com ([69.168.97.77])
    by mx06.atw.mail.rcn.net with ESMTP; 20 Feb 2013 17:07:22 -0500
  ...trust/internal boundary...
  Received: from [216.33.63.216] ([216.33.63.216:56326] helo=bigfootinteractive.com)
    by mx.rcn.com (envelope-from <1709130a2layfovcia3kqqzqabnxydzhs2jc2h4yaa...@mail.ameriprise.com>)
    (ecelerity 2.2.3.49 r(42060/42061)) with ESMTP
    id 29/DB-26250-A1945215; Wed, 20 Feb 2013 17:07:22 -0500
  ...

and the relays are parsed as

  X-Spam-Relay: Trusted= ...[ ip=69.168.97.77 rdns=mx.rcn.com helo=mx.rcn.com by=mx06.atw.mail.rcn.net ident= envfrom= intl=1 id= auth= msa=0 ]
  Untrusted=[ ip=216.33.63.216 rdns= helo=bigfootinteractive.com by=mx.rcn.com ident= envfrom=1709130a2layfovcia3kqqzqabnxydzhs2jc2h4yaa...@mail.ameriprise.com intl=0 id=29/DB-26250-A1945215 auth= msa=0 ] ...

This leads to various bad things (RDNS_NONE & broken WHITELIST_FROM_RCVD).

Is there anything in SpamAssassin that can deal more elegantly with this particular problem? Perhaps some sort of please_fill_in_rcvd_rdns type option?

I'm still on 3.2.5 (yes I know it is old).

-jeff
Re: X-Relay-Countries
From: Mike Grau
Date: Tue, 12 Feb 2013 14:18:33 -0600

> Hmm I would do something like this (untested):
>
> header RELAY_NOT_US X-Relay-Countries =~ /\b(?!US)[A-Z]{2}\b/

I've had to use, IIRC.

  X-Relay-Countries =~ /\b(?!US|XX)([A-Z]{2})\b/

XX means unknown, mostly due to stale database.

You can update the IP::Country database. See:

  http://wiki.apache.org/spamassassin/RelayCountryPlugin

-jeff
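The negative look-ahead in that rule is easy to check outside SpamAssassin. The Python sketch below runs the same expression against a few hypothetical X-Relay-Countries values, assuming the header is a space-separated list of two-letter codes; the function name first_foreign_relay is made up for the example.

```python
import re

# Same expression as the rule above: any two-letter code that is not US or XX.
RELAY_NOT_US = re.compile(r"\b(?!US|XX)([A-Z]{2})\b")

def first_foreign_relay(relay_countries):
    """Return the first country code other than US/XX, or None."""
    m = RELAY_NOT_US.search(relay_countries)
    return m.group(1) if m else None

if __name__ == "__main__":
    for value in ("US US", "US CN US", "XX US", "US XX DE"):
        print(repr(value), first_foreign_relay(value))
```

Without the XX alternative, a relay the database cannot place would count as "not US", which is the false hit the second form of the rule avoids.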
Re: Spamassassin not parsing email messages
From: Sean Tout
Date: Fri, 28 Dec 2012 01:10:02 -0800 (PST)

Hi Henrik,

Thank you much for the prompt response and points. I ran the Perl script with the code you pasted below, but still got the same report scores for all emails! by the way, when I also tried to print contents of the emails using $status->get_content_preview(), I got [...]

I'm unable to print any portions of the email messages using $status = $spamtest->check($mail), however I can print any portions using $folder_reader->read_next_email().

Regards,
Sean.

Based on the tests that are hit --

  -0.0 NO_RELAYS           Informational: message was not relayed via SMTP
   1.2 MISSING_HEADERS     Missing To: header
   0.1 MISSING_MID         Missing Message-Id: header
   1.8 MISSING_SUBJECT     Missing Subject: header
   2.3 EMPTY_MESSAGE       Message appears to have no textual parts and no Subject: text
  -0.0 NO_RECEIVED         Informational: message has no Received headers
   1.4 MISSING_DATE        Missing Date: header
   0.0 NO_HEADERS_MESSAGE  Message appears to be missing most RFC-822 headers

you are passing malformed email messages into SpamAssassin. SpamAssassin cannot find any of the headers. I'd guess that you have extraneous junk at the beginning of each message.

-jeff
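One way to check a message file before handing it to SpamAssassin is to parse it with any RFC 822 parser and see which headers are found. The Python sketch below uses the standard email module; the header list in EXPECTED and the function name missing_headers are choices made for this example, and the sample messages are made up. It illustrates how junk in front of the headers (here, a stray blank line) makes every header disappear.

```python
from email import message_from_string

# Headers whose absence corresponds to the tests listed above (illustrative list).
EXPECTED = ("From", "To", "Date", "Subject", "Message-ID", "Received")

def missing_headers(raw_message):
    """Return the expected header names that the parser could not find."""
    msg = message_from_string(raw_message)
    return [name for name in EXPECTED if msg.get(name) is None]

GOOD = (
    "Received: from mail.example.net by mx.example.org; Fri, 28 Dec 2012 01:10:02 -0800\n"
    "From: sender@example.net\n"
    "To: rcpt@example.org\n"
    "Date: Fri, 28 Dec 2012 01:10:02 -0800\n"
    "Subject: hello\n"
    "Message-ID: <1@example.net>\n"
    "\n"
    "body text\n"
)

if __name__ == "__main__":
    print(missing_headers(GOOD))         # nothing missing
    print(missing_headers("\n" + GOOD))  # leading blank line: headers become body
```

If a check like this reports everything missing for messages that look fine in a mail client, the problem is in how the message text is being read or passed along, not in the rules.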
Re: BAYES_00
From: Arthur Dent
Date: Sat, 06 Oct 2012 11:03:18 +0100

Hello all,

Following a hard drive crash I am rebuilding my small home server on a Fedora17 platform.

One of the casualties of the HD crash was my spam corpus. I had a (very old) backup which happened to include a previous spam corpus so I used that to sa-learn.

All my messages hit BAYES_00.

I don't have many "fresh" spams. I do not run a SMTP server, I simply collect mail for my family and myself from my ISP and other sources using fetchmail. My ISP seem to filter most of the really bad stuff so I get just a trickle of spams (about 1 per day - if that) but even those hit BAYES_00 despite sometimes being identical to a previous FN that had already been learned with sa-learn.

Here is my --dump magic:
...

What - if anything - can I do to improve bayes performance?

Get more spam? Bayes really isn't going to do well with limited amount of spam. It does great when correctly trained using lots of spam. But with limited data, not so much.

You could try starting over. It will take 6 months or so to get to 200 spam messages if you are really getting about 1 per day.

Or you could just turn Bayes off. I'm almost at the same point with my home email, for the same reason.

-jeff
Re: Very spammy messages yield BAYES_00 (-1.9)
From: Ben Johnson
Date: Wed, 15 Aug 2012 13:36:08 -0400

Some 99% of the spam that I receive, which is grossly spammy (we're talking auto loans, cash advances, dink pills, the whole lot) contains "BAYES_00=-1.9" in the tests portion of the X-Spam-Status header. Might anyone know why? This is a stock installation (Ubuntu package on 10.04).

Most likely you've let autolearn learn a large number of spam messages as ham. Any autolearn mistakes need to be corrected. One or two spam messages with BAYES_00 is not a problem, but a large number of them indicates a serious problem with learning.

If you have the old spam messages then you can retrain correctly. Otherwise it would probably be best to start over by deleting the bayes database.

local.cf contains

  # Bayesian classifier auto-learning (default: 1)
  #
  # bayes_auto_learn 1

and I have not overridden the default elsewhere. So, presumably, auto-learning is enabled (if that's even relevant).

While I have not trained the Bayesian filter manually to date, how is it that the spammiest of the spam is being classified with BAYES_00 (thereby receiving the score -1.9)? Doesn't BAYES_00 imply that the message is almost certainly not spam?

Yes, BAYES_00 says the spam probability is between 0 and 1%.

http://forums.eukhost.com/f38/problems-spamassassin-bayes-filter-16948/

Outside of the above forum post, search query results for this issue are scant.

There have been numerous posts on BAYES.

-jeff
Re: USER_IN_WHITELIST and SPF_FAIL
From: RW Date: Tue, 19 Jun 2012 23:43:57 +0100 On Tue, 19 Jun 2012 18:02:28 -0400 Jeff Mincy wrote: >From: John Hardin >Date: Tue, 19 Jun 2012 14:44:29 -0700 (PDT) > >On Tue, 19 Jun 2012, Benny Pedersen wrote: > >> Den 2012-06-19 22:39, Kevin A. McGrail skrev: >> >>> I think that's the concept behind the whitelist_from_spf >> >> but some use whitelist_from, its nothing new there :=) >> >> can user_in_whitelist be changed to not have -100 as default >> score, or is whitelist_from planned for removements ? > >It's needed for whan none of the other more-strict whitelist > options will work, so we can't get just rid of it. > > True. > >I'd suggest instead a lint warning if it is used, alerting the > admin that it's discouraged and that it has problems like this and is > very easy to spoof. > > How about creating a different score for whitelist_from that is > separate from whitelist_from_rcvd? For example, whitelist_from could > trigger USER_IN_SIMPLE_WHITELIST (or some other variation). The > description of the test could include warnings about how easy > it is to spoof whitelist_from. If used sensibly USER_IN_WHITELIST is probably the most reliable rule we have, for the overwhelming majority of addresses it's far more accurate than spf based whitelisting. It's not always right to treat users as idiots. Huh? What you mean by used sensibly? whitelist_from_rcvd is very reliable. whitelist_from is trivial to spoof. whitelist_from_rcvd and whitelist_from both trigger USER_IN_WHITELIST. It is easy to get into trouble using whitelist_from - having a separate score just for whitelist_from would make identifying the problem easier for the user. -jeff
Re: USER_IN_WHITELIST and SPF_FAIL
From: John Hardin Date: Tue, 19 Jun 2012 14:44:29 -0700 (PDT) On Tue, 19 Jun 2012, Benny Pedersen wrote: > Den 2012-06-19 22:39, Kevin A. McGrail skrev: > >> I think that's the concept behind the whitelist_from_spf > > but some use whitelist_from, its nothing new there :=) > > can user_in_whitelist be changed to not have -100 as default score, or is > whitelist_from planned for removements ? It's needed for whan none of the other more-strict whitelist options will work, so we can't get just rid of it. True. I'd suggest instead a lint warning if it is used, alerting the admin that it's discouraged and that it has problems like this and is very easy to spoof. How about creating a different score for whitelist_from that is separate from whitelist_from_rcvd? For example, whitelist_from could trigger USER_IN_SIMPLE_WHITELIST (or some other variation). The description of the test could include warnings about how easy it is to spoof whitelist_from. -jeff
Re: Whitelisting with DKIM
From: Alex
Date: Mon, 31 Oct 2011 12:18:33 -0400

I have a fedora15 system with sa-3.3.2 and amavisd-2.6.6 and would like to whitelist messages like these:

Oct 31 11:19:42 mail02 amavis[3518]: (03518-01-20) SPAM-TAG, -> <50...@example.com>, No, score=-4.555 tagged_above=-100 required=5 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, HTML_IMAGE_RATIO_04=0.61, HTML_MESSAGE=0.001, KHOP_RCVD_TRUST=-1.75, LOC_SHORT=0.6,

I've enabled dkim in amavisd.conf:

$enable_dkim_verification = 1; # enable DKIM signatures verification
$enable_dkim_signing = 1; # load DKIM signing code, keys defined by dkim_key

...

Oct 31 11:29:04.733 [7571] info: rules: meta test L_UNVERIFIED_GMAIL has dependency 'DKIM_VERIFIED' with a zero score
Oct 31 11:29:04.837 [7571] dbg: check: tests=DKIM_SIGNED,DKIM_VALID,HTML_IMAGE_RATIO_04,HTML_MESSAGE,KHOP_RCVD_TRUST,LOC_SHORT,RCVD_IN_DNSWL_NONE,RCVD_IN_HOSTKARMA_W,RCVD_IN_HOSTKARMA_WL,RCVD_IN_IADB_DK,RCVD_IN_IADB_LISTED,RCVD_IN_IADB_OPTIN,RCVD_IN_IADB_RDNS,RCVD_IN_IADB_SPF,RCVD_IN_UCEPROTECT2,RELAYCOUNTRY_US,RP_MATCHES_RCVD,T_REMOTE_IMAGE,URIBL_GREY

Why does DKIM_VERIFIED have a zero score in 50_scores.cf?

Anybody, including spammers, can do DKIM. You could give it a small negative score like -0.5 or so.

I've added the following entries to local.cf, but I suspect this is what I'm doing wrong. I don't mean to whitelist all of constant contact.

whitelist_from_dkim *@in.constantcontact.com
whitelist_from_dkim *@bertolini-sales.com

There is a copy of the full message here: http://pastebin.com/raw.php?i=pmyFn9f9

Thanks so much for any ideas.
Alex

I think you want

whitelist_from_dkim *@bertolini-sales.com auth.ccsend.com

The auth.ccsend.com comes from the signature line

DKIM-Signature: ... d=auth.ccsend.com

-jeff
Disposition deleted
Can somebody clue me in on how to match 'Disposition: automatic-action/MDN-sent-automatically; deleted' in a disposition-notification mime attachment? --_=_NextPart_001_01CC55E0.440F392C Content-Type: message/disposition-notification Content-Transfer-Encoding: 7bit Final-Recipient: RFC822; kathy.du...@ca.com Disposition: automatic-action/MDN-sent-automatically; deleted X-MSExch-Correlation-Key: 1CORJJTUYkSeBj5kXwFqLQ== --_=_NextPart_001_01CC55E0.440F392C-- I've tried body, rawbody and mimeheader without success: mimeheader LOCAL_AUTOMATIC_ACTION Disposition =~ /automatic-action\/MDN-sent-automatically; deleted/ This appears to be some new MS Exchange bounce message. I'm running 3.2.5 if it matters. thanks. -jeff
RE: SA and Spear Phishing
From: Hamad Ali
Date: Sat, 19 Mar 2011 00:46:08 +0400

## back on topic ##

Anyway, I would highly appreciate any help on spear phishing. A solution, a guess, or just if you know whether you get spear phish at all is good information for me (I started to think that 99% of mail admins never know that they get spear phish because of the extremely high success rate of spear phish).

PS: Spear Phishing is a problem that I noticed many commercial appliances struggle at. This thread is not meant to promote or demote SA, but to address a cutting-edge problem that many software classifiers fail to address.

--H

Either I haven't gotten any spear phishing spam, or the spear phishing spam is being blocked by SpamAssassin. I'll assume the latter.

If there's some particular type of email that you're having trouble with, the easiest way to get help is to post a complete sample, including all the headers, on a pastebin and send the link and the X-Spam-Status line that you get from your SpamAssassin to the group. Otherwise all you're going to get is vague platitudes like "train bayes".

-jeff
Re: new rules - where do i activate them?
From: John Hardin
Date: Wed, 2 Mar 2011 07:50:38 -0800 (PST)

On Wed, 2 Mar 2011, tr_ust wrote:

> This is what my rules look like now:
>
> uri LOCAL_URI_EXAMPLE /zynetsw.com\/forms\/use\/index\/form1.html/
> score LOCAL_URI_EXAMPLE 200
> uri LOCAL_URI_EXAMPLE /zynetsw.com\/forms\/use\/nana\/form1.html/
> score LOCAL_URI_EXAMPLE 100
> uri LOCAL_URI_EXAMPLE /zynetsw.com\/forms\/use\/ontokoros\/form1.html/
> score LOCAL_URI_EXAMPLE 100
> uri LOCAL_URI_EXAMPLE /zynetsw.com\/forms\/use\/tbt\/form1.html/
> score LOCAL_URI_EXAMPLE 200
> uri LOCAL_URI_EXAMPLE /zynetsw.com\/forms\/use\/webadmin\/form1.html/
> score LOCAL_URI_EXAMPLE 200
>
> I took out the last "/" as you suggested...thanks.

You may also want to escape the periods so they are literal matches rather than "match any single character":

uri LOCAL_URI_EXAMPLE /zynetsw\.com\/forms\/use\/webadmin\/form1\.html/

Also, you only have one rule there. Every time you put in another "uri LOCAL_URI_EXAMPLE" you overwrite the previous definition. Change the name of each rule, for example by appending _00 _01 _02, etc.

Also, the rules could be combined into a single rule (untested) using the regexp alternation (?:index|nana|ontokoros|tbt|webadmin):

uri LOCAL_URI_EXAMPLE /zynetsw\.com\/forms\/use\/(?:index|nana|ontokoros|tbt|webadmin)\/form1\.html/

-jeff
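To see why the escaped periods matter, here is a small check with grep -E. This is only an approximation of what SpamAssassin does: the patterns are rewritten for POSIX ERE (no \/ escaping, a plain group instead of (?:...)), and both test URIs are made up for the illustration.

```shell
# Hypothetical sample URIs; only the first is one the rule is meant to catch.
good='http://zynetsw.com/forms/use/tbt/form1.html'
bad='http://zynetswXcom/forms/use/tbt/form1Xhtml'

# The combined rule body with unescaped and with escaped periods.
loose='zynetsw.com/forms/use/(index|nana|ontokoros|tbt|webadmin)/form1.html'
strict='zynetsw\.com/forms/use/(index|nana|ontokoros|tbt|webadmin)/form1\.html'

# Print "yes" if pattern $1 matches string $2, otherwise "no".
matches() { printf '%s\n' "$2" | grep -Eq "$1" && echo yes || echo no; }

matches "$loose" "$good"    # yes
matches "$loose" "$bad"     # yes: each unescaped dot matches the X
matches "$strict" "$good"   # yes
matches "$strict" "$bad"    # no
```

In practice the loose form rarely causes false hits on a URI this specific, but escaping costs nothing and makes the intent clear.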
Re: Trouble whitelisting domain users with whitelist_from_rcvd
From: keithcommins
Date: Wed, 28 Jul 2010 07:57:43 -0700 (PDT)

Hi there,

Having some trouble getting this to work correctly, it would seem. Firstly, here is my whitelist_from_rcvd config from my local.cf file.

You can't use whitelist_from_rcvd on internal email. You don't have an external relay to match against. It doesn't matter if your machine ends in .local or not.

Note the FH_DATE_PAST_20XX. You probably need to run sa-update sometime this year.

The ALL_TRUSTED should be enough by itself. If you need to have a separate whitelisting you could try something like the following:

meta __TRUSTED_NETWORKS (NO_RELAYS || ALL_TRUSTED)
header __LOCAL_SENDER From =~ /\...@mydomain\.com/i
meta FORGED_LOCAL_SENDER (__LOCAL_SENDER && !__TRUSTED_NETWORKS)
score FORGED_LOCAL_SENDER 0.1
meta VALID_LOCAL_SENDER (__LOCAL_SENDER && __TRUSTED_NETWORKS)
score VALID_LOCAL_SENDER -0.1

-jeff

whitelist_from_rcvd *...@mydomain.com mydomain.local
trusted_networks 172.16.1/24 172.16.2/24 172.16.3/24 172.16.5/24 xx.xx.xx.xx
internal_networks 172.16.1/24 172.16.2/24 172.16.3/24 172.16.5/24 xx.xx.xx.xx

( xx.xx.xx.xx represents the outward facing IP of my mail server )

Secondly, below is a header from a test email I sent to myself..

Return-Path: Received: by mydomain.com (CommuniGate Pro PIPE 5.2.12) with PIPE id 18275900; Wed, 28 Jul 2010 11:31:13 +0100 X-TFF-CGPSA-Version: 1.5 X-TFF-CGPSA-Filter: Scanned X-Spam-DCC: wuwien: mail.mydomain.com 1290; Body=1 Fuz1=2 Fuz2=6 X-Spam-Checker-Version: SpamAssassin 3.2.5 ( 2008-06-10 ) on mail.mydomain.com X-Spam-Level: *** X-Spam-Status: No, score=3.8 required=8.0 tests=ALL_TRUSTED,FH_DATE_PAST_20XX, HTML_IMAGE_ONLY_20,HTML_MESSAGE autolearn=no version=3.2.5 X-Spam-Pyzor: Received: from [172.16.3.150] (account some.user [172.16.3.150] verified) by mydomain.com (CommuniGate Pro SMTP 5.2.12) with ESMTPA id 18275888 for some.u...@mydomain.com; Wed, 28 Jul 2010 11:31:04 +0100 Message-ID: <4c500626.7010...@mydomain.com> Date: Wed, 28 Jul 2010 11:27:50 +0100 From: Some User User-Agent: Thunderbird 2.0.0.24 (Windows/20100228) MIME-Version: 1.0 To: Some User Subject: (no subject) Content-Type: multipart/alternative; boundary="020906000403080006070205" X-EsetId: 90695D289D6435708F6F5D7C933375 This is a multi-part message in MIME format. --020906000403080006070205 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Couple of things to note , we use Active Directory which means the FQDN name of all our machines end in *.local rather than *.com. Should the whitelist_rcvd reflect this in any way?? Its my understanding that all mails should get a Spam Assassin score of -100 or thereabouts , thus permanently whitelisting all our domain users. However , as you can see this isn't happening?? Is there anything else I should be doing to whitelist my domain users?? Thanks in advance for all your help.. Keith -- View this message in context: http://old.nabble.com/Trouble-whitelisting-domain-users-with-whitelist_from_rcvd-tp29287372p29287372.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: flat file bayes locking issue and difference errors depending on file locking method
From: "R-Elists" Date: Wed, 14 Apr 2010 08:43:21 -0700 having spent the better part of a two days searching as well as trying different configs and SA restarts we do not have a "hardware horsepower" resource starvation issue in reference to the error spamd[30339]: bayes: cannot open bayes databases /home/spamd/.spamassassin/bayes_* R/W: lock failed: Interrupted system call I'd guess that you have a bayes expire running that is either taking too long or not finishing and leaving lock files around. Turn off bayes_auto_expire and use bayes_learn_to_journal. Add a cron job to periodically sa-learn --sync (say hourly) and another cron job to do sa-learn --force-expire (daily/weekly) -jeff
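A minimal sketch of the cron-driven setup suggested above. The two local.cf options and the sa-learn flags are the ones named in the reply; the wrapper name, install path and schedule are assumptions for illustration, and the jobs should run as whichever user owns the bayes files.

```shell
# local.cf (SpamAssassin config, shown here as comments):
#   bayes_auto_expire      0
#   bayes_learn_to_journal 1

# Hypothetical wrapper to call from cron. SA_LEARN is an assumption of this
# sketch so the command can be swapped for a dry run.
bayes_maint() {
    case "$1" in
        sync)   ${SA_LEARN:-sa-learn} --sync ;;
        expire) ${SA_LEARN:-sa-learn} --force-expire ;;
        *)      echo "usage: bayes_maint sync|expire" >&2; return 2 ;;
    esac
}

# Example crontab entries (hourly sync, daily expire; times are arbitrary):
#   5 * * * *    /usr/local/bin/bayes-maint sync
#   30 3 * * *   /usr/local/bin/bayes-maint expire
```

Keeping expiry out of the scanning path means spamd no longer has to wait on an exclusive lock taken by an expire run it started itself.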
Re: Limit SA to scan messages 100k and below
From: Keith De Souza Date: Wed, 31 Mar 2010 14:10:50 +0100 Hi *>> You need to change whatever glue you are using to pass messages to SA, >>and skip the scanning for messages larger than your desired threshold. *Sorry as I'm new to SA can you elaborated what you mean by glue? * >>That said, IMHO 100k is rather low. Why do you want that particular >>threshold?* Judging from your response, I may be wrong in what I need to do: Basically I'm having a few errors in my Exim logs from legitamate senders not coming through: 300 seconds looks like an timeout. Something is giving up after waiting 300 seconds. Note the autolearn=unavailable. I'd guess that you are getting locked out from the Bayes database. You probably had a Bayes expire running at the same time. There should be messages about this in a log file. If this is the case you can turn off bayes_auto_expire and run expire from cron. You could also try learning to the journal and doing sa-learn --sync periodically from cron. -jeff === 2010-03-31 01:22:25 1Nwlbc-0001QS-Ua H= host81-136-197-86.in-addr.btopenworld.com (mail.duke.tv) [81.136.197.86] F=< l...@dukeandearl.com> temporarily rejected after DATA === And after checking my SA logs: === Mar 31 01:25:51 mailserver spamd[5379]: spamd: result: . -4 - GENESIS_PHONENUMBER07 *scantime=300.0,size=24337*, user=nobody,uid=8,required_score=3.2,rhost=localhost,raddr=127.0.0.1,rport=42308,mid=< c7d27527.8a78%l...@dukeandearl.com >,autolearn=unavailable == I'm trying to understand why is it taking 300.0 seconds to scan a message only 24Kb in size?? I'm begeining to think that because SA is taking so long to scan the message, it is timing out and hence Exim returning a "temporarily reject after DATA". My thoughs so far is to perhaps reducing the file size that SA takes to scan and see if the scan time reduces. I may be wrong in my troublshooting methods but I'm not sure why this is happeninig at present. 
Many Thanks 2010/3/31 Karsten Bräckelmann > On Wed, 2010-03-31 at 13:24 +0100, Keith De Souza wrote: > > My current sysadmin has now left the company and I'm new to SA and > > Exim. [...] > > > I've read somewhere that the default setting for SA to scan a message > > is 500k. > > That's actually the default for spamc. Messages exceeding the threshold > just won't be passed to spamd. SA (and spamd) will check everything it > gets passed. > > > Can I reduce this, so that SA scans messages 100k and below? > > You need to change whatever glue you are using to pass messages to SA, > and skip the scanning for messages larger than your desired threshold. > > That said, IMHO 100k is rather low. Why do you want that particular > threshold? > > guenther > > > -- > char *t="\10pse\0r\0dtu...@ghno > \x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; > main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i c<<=1: > (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; > }}} > >
Re: Off Topic - SPF - What a Disaster
From: Martin Gregorie
Date: Tue, 23 Feb 2010 22:04:07 +

On Tue, 2010-02-23 at 16:17 -0500, Bowie Bailey wrote:
> The only exception is if you have a strict SPF policy for your own
> domain, you can use it to reject spam pretending to be from your users.

Agreed. That's all I use it for.

The SPF checks in SpamAssassin will score SPF_FAIL without adding enough points to block the email by itself. I'm not ready to outright block email that fails SPF.

I installed SPF during a backscatter storm, which immediately decreased in volume. Since then the periodic backscatter showers have got steadily smaller, so it looks as though mailservers configured to check SPF before bouncing undeliverable mail have been getting steadily more common.

Either that or spammers tend to avoid forging domains that have SPF.

-jeff
Re: X-Relay-Countries can stick?
From: Robert Nicholson
Date: Fri, 12 Feb 2010 19:32:00 -0600

Perhaps my confusion lies in the fact that it looks like headers != metadata? Is there a way or setting that allows metadata to result in headers in the message?

Did you try add_header?

ifplugin Mail::SpamAssassin::Plugin::RelayCountry
add_header all Relay-Country _RELAYCOUNTRY_
endif
Re: MTX plugin created (Re: Spam filtering similar to SPF, less breakage)
From: Charles Gregory Date: Thu, 11 Feb 2010 11:55:10 -0500 (EST) On Wed, 10 Feb 2010, dar...@chaosreigns.com wrote: > http://www.chaosreigns.com/mtx/ You know, just for a moment I thought I would take a look, just for curiosity sake, and instead got this moronic jack-ass ATTITUDE page. Heh. Using IE 7.0 I get: Your browser cannot handle the 9 year old standard required by the web page you attempted to access. ... IE 7.0 displays the page fine, but you have to save the file out as a plain html file. -jeff
Re: Rules for not passing SPF
From: dar...@chaosreigns.com Date: Tue, 2 Feb 2010 18:38:20 -0500 On 02/02, Marc Perkel wrote: > Why would you want to catch domains without SPF as SPF has no > relationship to detecting spam? SPF is entirely about spam. Actually, SPF is about forgery and forgery is part of the spam problem. You can still have genuine spam that passes SPF. Messages that get SPF_FAIL are forged spam and can be scored or blocked. http://www.openspf.org/Introduction If everyone uses SPF, all we need to block all spam is these rules (SPF_NOT_PASS alone should do it), and a blacklist of domains that have SPF records including IPs that send spam. Good luck. All you need is to get everybody to use SPF and then have a very large blacklist of spam sending domains. http://www.rhyolite.com/anti-spam/you-might-be.html SPF is easy, there's a wizard http://www.openspf.org/, then you paste the results into the DNS TXT record for your domain). SPF is great for what it does. -jeff
Re: How should this tricky spam be filtered?
From: Kārlis Repsons
Date: Sat, 30 Jan 2010 17:20:23 +

On Saturday 30 January 2010 15:48:36 Jeff Mincy wrote:
> BAYES_99,DCC_CHECK,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_FIVETEN_SPAM,RCVD_IN_NIXSPAM,RCVD_IN_UCEPROTECT1,RCVD_IN_UCEPROTECT2,RCVD_IN_UCEPROTECT3,BOTNET,BOTNET_BADDNS
>
> Botnet/FIVETEN/NIXSPAM/UCEPROTECT are additional rules added.
> -jeff

Thanks, just about DCC: why is it said to be "not opensource" and commented out in a spamassassin default config? Are there any closed-source binaries on a client machine from it? Do any such binaries related to SA exist?

DCC is a separately managed project with its own license. DCC has to be installed and configured (dccproc and dccifd) outside of SpamAssassin. After DCC is installed, SpamAssassin has to be configured to use DCC by loading the plugin. You can install DCC from source or from various repositories. The same is true for razor and pyzor.

-jeff
Re: How should this tricky spam be filtered?
From: Ralph Bornefeld-Ettmann
Date: Sat, 30 Jan 2010 18:14:10 +0100

Am 30.01.2010 16:48, schrieb Jeff Mincy:
>From: Kārlis Repsons
>Date: Sat, 30 Jan 2010 14:07:16 +
>
>On Saturday 30 January 2010 13:54:14 Jeff Mincy wrote:
>> Retrain the message correctly in Bayes. Bayes will catch on to this
>> after a few times. The subject alone should be a strong enough clue
>> for bayes (I get BAYES_80 on this partial sample), so it looks like
>> you are doing only autolearn and not correcting messages that were
>> learned incorrectly.
>> -jeff
>
> I couldn't figure out how to get an unadulterated version of the
> message from the spamalyser.com link you posted in a previous message.
> I tried this
> wget -O - -q http://spamalyser.com/v/5cbffujq/original.txt
> pastebin has a simple way to download the original.
> Anyway, I eventually got something.

in the "Raw Message" tab you can get the plain message (http://spamalyser.com/v/5cbffujq/raw)

Sorry. Looks more like html here.

% wget -O - -q http://spamalyser.com/v/5cbffujq/raw | head
http://www.w3.org/TR/html4/strict.dtd";>

To get the raw email message, I'd have to write something like

wget -O - -q http://spamalyser.com/v/5cbffujq/raw | w3m -dump -T text/html

followed by sed scripts to keep the lines with line numbers and discard the line numbers.

I guess http://spamalyser.com is looking at the User-Agent: Wget/1.10.2 header. Maybe there could be a really-raw-without-line-numbers-and-no-html target.

-jeff
Re: How should this tricky spam be filtered?
From: Kārlis Repsons
Date: Sat, 30 Jan 2010 14:07:16 +

On Saturday 30 January 2010 13:54:14 Jeff Mincy wrote:
> Retrain the message correctly in Bayes. Bayes will catch on to this
> after a few times. The subject alone should be a strong enough clue
> for bayes (I get BAYES_80 on this partial sample), so it looks like
> you are doing only autolearn and not correcting messages that were
> learned incorrectly.
> -jeff

I couldn't figure out how to get an unadulterated version of the message from the spamalyser.com link you posted in a previous message. I tried this

wget -O - -q http://spamalyser.com/v/5cbffujq/original.txt

pastebin has a simple way to download the original. Anyway, I eventually got something.

Hmm, well, I just started with SA, so my filters aren't much trained yet. The thing is, I didn't believe it's the Bayes filter to be used for that case!

Bayes is an incredible tool, but only if you let it. The worst thing you can do to bayes is mistrain it by learning spam messages as ham. The other bad thing is to limit the number of messages that it learns from.

Because I still think that it's not correct to train the SA filter on that letter as spam! It can contain words which simply should not contribute to being more "spam", no? That's not a problem?

No, that is not a problem. Yes, spam contains words, and some of those words will also occur in ham. Bayes will figure out which words are spammy, which are hammy, and which occur in both.

First start with training Bayes and then check if DCC and network tests are enabled. Anyway, I get the following.

BAYES_99,DCC_CHECK,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_FIVETEN_SPAM,RCVD_IN_NIXSPAM,RCVD_IN_UCEPROTECT1,RCVD_IN_UCEPROTECT2,RCVD_IN_UCEPROTECT3,BOTNET,BOTNET_BADDNS

Botnet/FIVETEN/NIXSPAM/UCEPROTECT are additional rules added.

-jeff
Re: How should this tricky spam be filtered?
From: Kārlis Repsons
Date: Sat, 30 Jan 2010 13:35:26 +

People, perhaps it's simple to be done, but I personally would like to know the ways to get rid of something like this:

Use pastebin and save the entire message including the headers instead of forwarding messages like this.

-- Forwarded Message --
...
---

Obviously, the only useful part of all that was the From: name field. SA gives just "X-Spam-Status: No, score=-0.7 required=4.0 tests=BAYES_20 autolearn=ham version=3.2.5-gr2". Hopefully a valid question here...

Retrain the message correctly in Bayes. Bayes will catch on to this after a few times. The subject alone should be a strong enough clue for bayes (I get BAYES_80 on this partial sample), so it looks like you are doing only autolearn and not correcting messages that were learned incorrectly.

-jeff
Re: About upgrading
From: Alex
Date: Sat, 9 Jan 2010 21:13:24 -0500

> sa-learn --dump magic gives:
> 0.000 0 3 0 non-token data: bayes db version
> 0.000 0 57538 0 non-token data: nspam
> 0.000 0 74876 0 non-token data: nham
> 0.000 0 166338 0 non-token data: ntokens
> 0.000 0 1257478501 0 non-token data: oldest atime
> 0.000 0 1263049426 0 non-token data: newest atime
> 0.000 0 1263049538 0 non-token data: last journal sync atime
> 0.000 0 1263044805 0 non-token data: last expiry atime
> 0.000 0 5529600 0 non-token data: last expire atime delta
> 0.000 0 1868 0 non-token data: last expire reduction count
>
> Your database has 166338 tokens which is larger than the default
> bayes_expiry_max_db_size 150000. The last expiration ran this morning
> at 8:46. You could try letting the bayes database get larger and turn
> off bayes_auto_expire. If you turn off bayes_auto_expire you'll have
> to add something to cron to periodically expire tokens.
> bayes_auto_expire is fine for lower volumes of email, but can get in
> the way with higher volumes.

Also, what is the drawback with using auto_expire on larger volumes? Is it the locking delay and preventing learning new messages during that time? If you were to put it in cron to manually do an expiry, how often should it be run?

You have an exclusive lock when doing expiration. Expiration presumably takes longer on larger volumes, but it is still pretty fast. Running expiration daily or weekly should be more than sufficient.

Is there anything that should be tested prior to making this change, or is it pretty benign?

Yes - turning off bayes_auto_expire is pretty benign. You may not need to make this type of change. The default options for bayes work fine for lower email volumes.

I suppose you could take the ntokens value before, and subtract it from the after value to see how many tokens were expired, right? It would be interesting to see how many tokens are expired on a regular basis, but not sure that's very useful, just interesting.

sa-learn tells you how many tokens were deleted when you do --force-expire, for example:

expired old bayes database entries in 152 seconds
1516428 entries kept, 115692 deleted
token frequency: 1-occurrence tokens: 73.76%
token frequency: less than 8 occurrences: 16.19%

-jeff
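If you do want the before/after numbers, something along these lines should work. It is only a sketch: the function names and the SA_LEARN override are invented for the example, and the awk field position assumes the --dump magic layout shown earlier in this thread (the count is the third column).

```shell
# Print the ntokens count from `sa-learn --dump magic` output read on stdin.
ntokens() {
    awk '/non-token data: ntokens/ { print $3 }'
}

# Hypothetical helper: force an expiry run and report how far ntokens dropped.
# SA_LEARN is an assumption of this sketch so the command can be swapped out.
report_expiry() {
    before=$(${SA_LEARN:-sa-learn} --dump magic | ntokens)
    ${SA_LEARN:-sa-learn} --force-expire
    after=$(${SA_LEARN:-sa-learn} --dump magic | ntokens)
    echo "ntokens before=$before after=$after removed=$((before - after))"
}
```

The difference is before minus after, and it will be slightly off if new mail is learned while the expire runs, so the "entries kept / deleted" line printed by --force-expire is the more reliable figure.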
Re: About upgrading
From: Cecil Westerhof Date: Sat, 09 Jan 2010 16:24:56 +0100 Jeff Mincy writes: >I upgraded from 3.0.4 to 3.2.5. I have the feeling that sa-learn takes >more time with 3.2.5 as it took with 3.0.4. Can this be true? > >It is not a problem, because it is done by cron-tab, but I am just >curious. > > You can use spamc -L spam/ham to learn messages. Spamc -L is faster > than sa-learn. The spamd daemon needs to be started with > --allow-tell. That is not really an answer on my question. ;-) I doubt that bayes learning has slowed down significantly. I would expect that choice of bayes_store_module, learning to journal, whether auto expiration runs, and lock contention matters more than the version. But it does not seem to be interesting in my situation. First my code has to grow from: sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/ to: for i in ${HOME}/Maildir/.SpamDir.${dirStr}/cur/*; do spamc -L ${typeStr} <${i} done Which is not even enough, because I need to take care of the situation that the directory is empty and I need to implement code to show the messages delivered by sa-learn. Oh. You're learning all of the messages in a directory. spamc -L is faster than sa-learn for learning single messages because sa-learn is a perl script that has to load Mail::SpamAssassin each time. For a large directory the slower startup of sa-learn is less of an issue. sa-learn is fine for doing directories. Which a low level of spam it work, but if it becomes bigger, it does not work: date echo ${echoStr} sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/ date for i in ${HOME}/Maildir/.SpamDir.${dirStr}/cur/*; do spamc -L ${typeStr} <${i} done echo learned in the new way date gives: za jan 9 16:09:25 CET 2010 Increase Learned tokens from 0 message(s) (45 message(s) examined) za jan 9 16:09:40 CET 2010 learned in the new way za jan 9 16:10:00 CET 2010 So sa-learn takes 15 seconds and spamc -L 20 seconds. (And I need more code. 
Beside taking care of an empty directory, I also need to implement the feedback given by sa-learn.)

You learned tokens from 0 messages and looked at 45 messages. You had already learned from those 45 messages, so this is just timing how fast it can do nothing.

> You can try using bayes_learn_to_journal - and do a separate sa-learn
> --sync job in cron. Learning to the journal is faster.

I'll look into that.

> Also, What is the size of your database? Maybe you are spending lots
> of time doing expires or something.

sa-learn --dump magic gives:
0.000 0 3 0 non-token data: bayes db version
0.000 0 57538 0 non-token data: nspam
0.000 0 74876 0 non-token data: nham
0.000 0 166338 0 non-token data: ntokens
0.000 0 1257478501 0 non-token data: oldest atime
0.000 0 1263049426 0 non-token data: newest atime
0.000 0 1263049538 0 non-token data: last journal sync atime
0.000 0 1263044805 0 non-token data: last expiry atime
0.000 0 5529600 0 non-token data: last expire atime delta
0.000 0 1868 0 non-token data: last expire reduction count

Your database has 166338 tokens which is larger than the default bayes_expiry_max_db_size 150000. The last expiration ran this morning at 8:46. You could try letting the bayes database get larger and turn off bayes_auto_expire. If you turn off bayes_auto_expire you'll have to add something to cron to periodically expire tokens. bayes_auto_expire is fine for lower volumes of email, but can get in the way with higher volumes.

-jeff
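For what it's worth, the empty-directory case in the spamc -L loop can be handled with a one-line guard. A minimal sketch, assuming spamd was started with --allow-tell; the function name and the SPAMC override are made up for this example, and the final line is only a count, not a replacement for sa-learn's own summary.

```shell
# Hypothetical helper: learn every message file in a directory via spamc -L.
# $1 = spam|ham, $2 = directory (for example a Maildir "cur" folder).
# SPAMC is an assumption of this sketch so the command can be swapped out.
learn_dir() {
    count=0
    for f in "$2"/*; do
        # With no files, the glob stays a literal "*"; skip it.
        [ -f "$f" ] || continue
        ${SPAMC:-spamc} -L "$1" < "$f" && count=$((count + 1))
    done
    echo "processed $count message(s) as $1"
}
```

For whole directories sa-learn is still the simpler tool, as noted above; the loop mainly pays off when messages arrive one at a time.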
Re: About upgrading
From: Cecil Westerhof Date: Sat, 09 Jan 2010 14:39:59 +0100 Cecil Westerhof writes: > I did the upgrade. It took some time and there was a slight problem with > permissions, but it looks like a successful upgrade. I only changed > /dev/null to a real mailbox, because of the 2010 problem. When something > like this happens again I now can recover those e-mails. I upgraded from 3.0.4 to 3.2.5. I have the feeling that sa-learn takes more time with 3.2.5 as it took with 3.0.4. Can this be true? It is not a problem, because it is done by cron-tab, but I am just curious. You can use spamc -L spam/ham to learn messages. Spamc -L is faster than sa-learn. The spamd daemon needs to be started with --allow-tell. You can try using bayes_learn_to_journal - and do a separate sa-learn --sync job in cron. Learning to the journal is faster. Also, What is the size of your database? Maybe you are spending lots of time doing expires or something. -jeff
RE: [sa] Re: FH_DATE_PAST_20XX
From: "R-Elists" Date: Sat, 2 Jan 2010 08:33:42 -0800 > > > /20[1-9][0-9]/ --> /20[2-9][0-9]/ > we changed it to this before the update and still had the issue. so we changed back to the older version and then zero'd the score. waitied for the update after the update, changed the score to a small positive value to re-enable yet the rule is still *hitting* for some reason... since it is a header rule, what should i start looking at to see where the issue is coming from? somewhere in SA? should i enable special logging? or, should i check the MTA and it's assigns that deal with the header? The rule is probably also defined in some other file. Are you using 00_FVGT_File001.cf? If so check there. -jeff
RE: [sa] Re: FH_DATE_PAST_20XX
From: "R-Elists"
Date: Fri, 1 Jan 2010 15:48:13 -0800

> Cc: Spamassassin users list
> Subject: Re: [sa] Re: FH_DATE_PAST_20XX
>
> Damn -- mea culpa. When we fixed the bug in SVN trunk in bug
> 5852, I should have immediately backported it to the 3.2.x
> sa-update channel when I commited that patch, but I didn't.
>
> It's now fixed in updates, but that won't help the admins
> who've been paged to deal with high FP rates on a holiday.
> :( Sorry folks...
>
> --j.

what should the new rule look like?

i mean, i get it, and i think i know, and i even tested it and it was still failing even after a restarts...

s... seriously, i disabled the rule early AM yet when the update came through 4 or so hours later, i believe it looks exactly the same as when i first viewed it early on...

The easiest way to see what is being changed since your last sa-update is to first sa-update into /tmp and diff. The change is trivial but significant...

root% sa-update -D --updatedir /tmp/updates
root% diff -r -U 0 /var/lib/spamassassin/3.002005/updates_spamassassin_org /tmp/updates/updates_spamassassin_org
diff -u -w --minimal -r -U 0 /var/lib/spamassassin/3.002005/updates_spamassassin_org/72_active.cf /tmp/updates/updates_spamassassin_org/72_active.cf
--- /var/lib/spamassassin/3.002005/updates_spamassassin_org/72_active.cf 2009-07-20 17:01:55.0 -0400
+++ /tmp/updates/updates_spamassassin_org/72_active.cf 2010-01-01 18:51:10.0 -0500
@@ -527,7 +527,7 @@
 ##{ FH_DATE_PAST_20XX
-header FH_DATE_PAST_20XX Date =~ /20[1-9][0-9]/ [if-unset: 2006]
+header FH_DATE_PAST_20XX Date =~ /20[2-9][0-9]/ [if-unset: 2006]
 describe FH_DATE_PAST_20XX The date is grossly in the future.
 ##} FH_DATE_PAST_20XX

-jeff
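The effect of that one-character change is easy to check outside SpamAssassin. A small illustration with grep -E, using the Date of the message above as the sample; the helper name is made up for the example.

```shell
old='20[1-9][0-9]'   # rule before the update: any year 2010-2099 counts as "future"
new='20[2-9][0-9]'   # rule after the update: only 2020-2099

date_hdr='Fri, 1 Jan 2010 15:48:13 -0800'

# Print "hit" if pattern $1 matches string $2, otherwise "miss".
hits() { printf '%s\n' "$2" | grep -Eq "$1" && echo hit || echo miss; }

hits "$old" "$date_hdr"   # hit: every mail dated 2010 triggers the rule
hits "$new" "$date_hdr"   # miss
```

If the rule still hits after the update, the old pattern is most likely still defined in some other .cf file that is read later.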
Re: dkim whitelisting
From: LuKreme
Date: Wed, 16 Dec 2009 08:23:23 -0700

I'm adding address book users into the user_prefs files, but without the signing domain this is useless and emails for my users are still getting tagged up as spam (these in particular score 7-10 points without the whitelist).

Is there a better way, or do I just have to go in and find a DKIM-Signature for each address book entry and then parse out the d= field?

Yes, you need the d= part. Note that you should only do this for messages from domains that are signed and pass DKIM with DKIM_VERIFIED. Adding whitelist_from_dkim won't do any good if you don't have DKIM_SIGNED and DKIM_VERIFIED.

grep -r "^DKIM-Signature:" $HOME/Maildir | awk '{print $4}' | sed 's/d=//' | sed 's/;//' | sort -u

I dunno, doesn't seem that efficient (oh, and it doesn't work since the d= doesn't appear in the same location in all the headers).

If you are going to use sed, you need the entire DKIM-Signature header as one line. Use formail to extract the header, for example

formail -c -x DKIM-Signature:

NAME
formail - mail (re)formatter
...
-c Concatenate continued fields in the header. Might be convenient when postprocessing mail with standard (line oriented) text utilities.

-jeff
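If formail isn't installed, the unfolding can also be done in awk. This is a sketch of that alternative, not the formail approach from the reply: the function name is made up, and it assumes a well-formed message on stdin with the headers ending at the first blank line.

```shell
# Hypothetical helper: print the d= value of each DKIM-Signature header
# found in the message read from stdin.
dkim_domains() {
    awk '
        /^$/     { exit }                              # end of headers
        /^[ \t]/ { if (keep) hdr = hdr $0; next }      # folded continuation line
                 { if (keep) print hdr; keep = 0 }     # a new header starts here
        tolower($0) ~ /^dkim-signature:/ { keep = 1; hdr = $0 }
        END      { if (keep) print hdr }
    ' |
    tr -d ' \t' |           # drop whitespace inside the unfolded header
    sed 's/^[^:]*://' |     # drop the "DKIM-Signature:" field name
    tr ';' '\n' |           # one tag=value per line
    sed -n 's/^d=//p'       # keep only the d= tag
}
```

Usage would be something like `dkim_domains < message | sort -u`, run per message file; the resulting domain is what goes in the second field of whitelist_from_dkim.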
Re: HABEAS_ACCREDITED SPAMMER
> From: LuKreme
> Date: Mon, 23 Nov 2009 17:08:11 -0700
>
> On Nov 23, 2009, at 7:39, Matus UHLAR - fantomas wrote:
> > Yes, why to differ between non-abusing and abusing marketers...
>
> We've been through this before. On my mail, habeas is a very strong
> indicator of spam. It does not appear in legitimate mail.

I find it a little hard to believe that your spam is so much different
from my spam. On my mail, not one single spam message (out of 228k
total) hit HABEAS for all of 2009. The few messages (480 out of 11k)
that hit HABEAS were all ham, either professional
organizations/newsletters, transactions from places like Vanguard or
retail stores that I have a relationship with.

> I don't know who these legitimate marketers are, but I don't feel I'm
> missing anything.

You WILL 'block' legitimate mail. However, it's your email, so you can
do anything you want. If you think HABEAS is so bad just set the HABEAS
scores to zero and save the network bandwidth.

-jeff
Re: Timeouts: pyzor and razor2
> From: Art Greenberg
> Date: Mon, 9 Nov 2009 17:58:48 -0500 (EST)
>
> Lately I'm seeing a fairly consistent timeout for checks sent to pyzor
> and razor2 by SA. Up until a couple of days ago this was a very rare
> occurrence.
>
> Seems odd that both of these would have this trouble at the same time.
>
> Has anyone else noticed this? Perhaps I changed something here that is
> causing it.

Pyzor is currently timing out:

    % /usr/bin/pyzor ping
    public.pyzor.org:24441 TimeoutError:

Razor is fine. You can increase the timeout if razor is running slow:

    ifplugin Mail::SpamAssassin::Plugin::Razor2
    # How many seconds you wait for razor to complete before you go on without the results
    razor_timeout 15
    endif

-jeff
Re: Another dcc question
> From: Rick Knight
> Date: Tue, 13 Oct 2009 09:42:18 -0700
>
> Jeff Mincy wrote:
> > From: Rick Knight
> > Date: Tue, 13 Oct 2009 08:53:21 -0700
> >
> > Just following this thread because I recently got dcc working also. In
> > my case I didn't have dcc installed. After installing dcc everything
> > seems to be working but now I'm wondering about dccifd. On my system
> > dccproc is in /usr/local/bin but dccifd is in /var/dcc/libexec/. I also
> > have start-dccifd in /var/dcc/libexec. I assume I need to add
> > dcc_dccifd_path to my local.cf and then run start-dccifd before starting
> > spamassassin. Is that correct?
> >
> > Run spamassassin --test-mode. If spamassassin finds dccifd it will
> > say 'dccifd is available':
> >
> > % spamassassin --test-mode --debug dcc < MESSAGE 2>&1 | fgrep dccifd
> > 134:[14145] dbg: dcc: dccifd is available: /var/lib/dcc/dccifd
> > 135:[14145] dbg: dcc: dccifd got response: X-DCC-sonic.net-Metrics: pinky 1156; bulk Body=1 Fuz1=many Fuz2=many
> >
> > If you get 'dccifd is not available':
> > ... dbg: dcc: dccifd is not available: no r/w dccifd socket found
> >
> > then you need to use dcc_dccifd_path or dcc_home
> > -jeff
>
> Thanks Jeff,
>
> When I run test-mode I just get this
>
>     bash: MESSAGE: No such file or directory
>
> I'm sure I'm just using the command wrong.

Create a file called MESSAGE that contains a complete spam message with
full headers.
Re: Another dcc question
> From: Rick Knight
> Date: Tue, 13 Oct 2009 08:53:21 -0700
>
> Just following this thread because I recently got dcc working also. In
> my case I didn't have dcc installed. After installing dcc everything
> seems to be working but now I'm wondering about dccifd. On my system
> dccproc is in /usr/local/bin but dccifd is in /var/dcc/libexec/. I also
> have start-dccifd in /var/dcc/libexec. I assume I need to add
> dcc_dccifd_path to my local.cf and then run start-dccifd before
> starting spamassassin. Is that correct?

Run spamassassin --test-mode. If spamassassin finds dccifd it will
say 'dccifd is available':

    % spamassassin --test-mode --debug dcc < MESSAGE 2>&1 | fgrep dccifd
    134:[14145] dbg: dcc: dccifd is available: /var/lib/dcc/dccifd
    135:[14145] dbg: dcc: dccifd got response: X-DCC-sonic.net-Metrics: pinky 1156; bulk Body=1 Fuz1=many Fuz2=many

If you get 'dccifd is not available':

    ... dbg: dcc: dccifd is not available: no r/w dccifd socket found

then you need to use dcc_dccifd_path or dcc_home.

-jeff
Re: just enabled DCC
From: Dan Schaefer Date: Tue, 13 Oct 2009 10:17:43 -0400 Jeff Mincy wrote: >From: Dan Schaefer >Date: Tue, 13 Oct 2009 09:18:44 -0400 > > Jeff Mincy wrote: >>From: Dan Schaefer >>Date: Tue, 13 Oct 2009 08:54:29 -0400 >> >>Jason Bertoch wrote: >>> Dan Schaefer wrote: >>>> I just enabled DCC yesterday and everything appears to be working >>>> (DCC is registered). Just to make sure, can someone post an email to >>>> pastebin that has a DCC hit? Thanks. >>>> >>> IIRC, a message with "test" in the subject and body will match, >>> although your logs should tell you what rules are hitting anyway. >> >>Is DCC_CHECK the only DCC rule? Because I didn't find that in my logs >>yesterday. "test" in the subject and "test" in the body only triggered >>TVD_SPACE_RATIO and BAYES_00 from my personal email address to my work >>address. Any other suggestions? >> >> Use >>spamassassin --test-mode --debug dcc < somespammsg >> >> Should print out stuff like: >> >>08:58:51.617 0.375 0.375 [28903] dbg: dcc: network tests on, registering DCC >>08:58:54.405 3.164 0.943 [28903] dbg: dcc: dccifd is available: /var/lib/dcc/dccifd >>08:58:54.585 3.343 0.179 [28903] dbg: dcc: dccifd got response: X-DCC--Metrics: pinky 1356; bulk Body=3 Fuz1=4384 Fuz2=many >>08:58:54.585 3.343 0.000 [28903] dbg: dcc: listed: BODY=3/20 FUZ1=4384/20 FUZ2=99/20 >> >> >> -jeff >> >I followed your instructions and received the following: > >[1486] dbg: dcc: network tests on, registering DCC >[1486] dbg: dcc: dccifd is not available: no r/w dccifd socket found >[1486] dbg: dcc: dccproc is not available: no dccproc executable found >[1486] dbg: dcc: dccifd and dccproc are not available, disabling DCC > >After seeing that, I NAT-ed 1023 local to 6277 remote and 6277 remote to >1023 to my mail server in my firewall. I ran the test again and received >the same message. > > Your firewall is not the problem shown here. SpamAssassin can't find > the dcc socket and executable. Do you have DCC installed? 
If so,
> where is the dccproc executable? Did you start dccifd? Where is the
> dccifd socket? SpamAssassin needs to know where they are. You can
> use various configuration options to tell SpamAssassin where to look,
> for example:
> ## DCC options (Admin only)
> dcc_home /var/lib/dcc
> dcc_dccifd_path /var/lib/dcc/dccifd
> dcc_path /usr/bin/dccproc
>
> -jeff
>

I did just install DCC, but I don't know if it is installed correctly.
And of course, DCC's website is down
(http://www.rhyolite.com/anti-spam/dcc/). I used the instructions here
instead: http://www.freespamfilter.org/FC4.html#_Toc110999211

Now when I run:

    spamassassin -t -D dcc < spam_message

I get:

    [2955] dbg: dcc: network tests on, registering DCC
    [2955] dbg: dcc: dccifd is not available: no r/w dccifd socket found
    [2955] dbg: dcc: dccproc is available: /usr/bin/dccproc
    [2955] dbg: dcc: opening pipe: /usr/bin/dccproc -H -x 0 -a 74.86.146.6 < /tmp/.spamassassin2955q6p1Yatmp
    [2955] dbg: dcc: got response: X-DCC-SIHOPE-DCC-3-Metrics: pony.performanceadmin.com 1085; Body=2 Fuz1=2 Fuz2=many

and

    2.2 DCC_CHECK Listed in DCC (http://rhyolite.com/anti-spam/dcc/)

in the report.

Even though the dccifd socket cannot be found, does this appear to be
working correctly?

Yes, dccproc is working. You got a hit on DCC_CHECK. You should use
dccifd if possible. It is faster.

-jeff
Re: just enabled DCC
From: Dan Schaefer Date: Tue, 13 Oct 2009 09:18:44 -0400 Jeff Mincy wrote: >From: Dan Schaefer >Date: Tue, 13 Oct 2009 08:54:29 -0400 > >Jason Bertoch wrote: >> Dan Schaefer wrote: >>> I just enabled DCC yesterday and everything appears to be working >>> (DCC is registered). Just to make sure, can someone post an email to >>> pastebin that has a DCC hit? Thanks. >>> >> IIRC, a message with "test" in the subject and body will match, >> although your logs should tell you what rules are hitting anyway. > >Is DCC_CHECK the only DCC rule? Because I didn't find that in my logs >yesterday. "test" in the subject and "test" in the body only triggered >TVD_SPACE_RATIO and BAYES_00 from my personal email address to my work >address. Any other suggestions? > > Use >spamassassin --test-mode --debug dcc < somespammsg > > Should print out stuff like: > >08:58:51.617 0.375 0.375 [28903] dbg: dcc: network tests on, registering DCC >08:58:54.405 3.164 0.943 [28903] dbg: dcc: dccifd is available: /var/lib/dcc/dccifd >08:58:54.585 3.343 0.179 [28903] dbg: dcc: dccifd got response: X-DCC--Metrics: pinky 1356; bulk Body=3 Fuz1=4384 Fuz2=many >08:58:54.585 3.343 0.000 [28903] dbg: dcc: listed: BODY=3/20 FUZ1=4384/20 FUZ2=99/20 > > > -jeff > I followed your instructions and received the following: [1486] dbg: dcc: network tests on, registering DCC [1486] dbg: dcc: dccifd is not available: no r/w dccifd socket found [1486] dbg: dcc: dccproc is not available: no dccproc executable found [1486] dbg: dcc: dccifd and dccproc are not available, disabling DCC After seeing that, I NAT-ed 1023 local to 6277 remote and 6277 remote to 1023 to my mail server in my firewall. I ran the test again and received the same message. Your firewall is not the problem shown here. SpamAssassin can't find the dcc socket and executable. Do you have DCC installed? If so, where is the dccproc executable? Did you start dccifd? Where is the dccifd socket? SpamAssassin needs to know where they are. 
You can use various configuration options to tell SpamAssassin where to
look, for example:

    ## DCC options (Admin only)
    dcc_home /var/lib/dcc
    dcc_dccifd_path /var/lib/dcc/dccifd
    dcc_path /usr/bin/dccproc

-jeff
Re: just enabled DCC
> From: Dan Schaefer
> Date: Tue, 13 Oct 2009 08:54:29 -0400
>
> Jason Bertoch wrote:
> > Dan Schaefer wrote:
> >> I just enabled DCC yesterday and everything appears to be working
> >> (DCC is registered). Just to make sure, can someone post an email to
> >> pastebin that has a DCC hit? Thanks.
> >>
> > IIRC, a message with "test" in the subject and body will match,
> > although your logs should tell you what rules are hitting anyway.
>
> Is DCC_CHECK the only DCC rule? Because I didn't find that in my logs
> yesterday. "test" in the subject and "test" in the body only triggered
> TVD_SPACE_RATIO and BAYES_00 from my personal email address to my work
> address. Any other suggestions?

Use

    spamassassin --test-mode --debug dcc < somespammsg

Should print out stuff like:

    08:58:51.617 0.375 0.375 [28903] dbg: dcc: network tests on, registering DCC
    08:58:54.405 3.164 0.943 [28903] dbg: dcc: dccifd is available: /var/lib/dcc/dccifd
    08:58:54.585 3.343 0.179 [28903] dbg: dcc: dccifd got response: X-DCC--Metrics: pinky 1356; bulk Body=3 Fuz1=4384 Fuz2=many
    08:58:54.585 3.343 0.000 [28903] dbg: dcc: listed: BODY=3/20 FUZ1=4384/20 FUZ2=99/20

-jeff
Re: Incresing numbers of DCC_CHECK in ham
> From: "Jari Fredriksson"
> Date: Fri, 9 Oct 2009 20:44:09 +0300
>
> > DCC identifies mail that has been sent often. That's what
> > the rule checks for, if other recipients have seen it,
> > too.
> >
> > You voluntarily installed DCC, knowing SA will use it.
> > This was on your discretion, and it's your duty to
> > evaluate if it actually is, what you want.
> >
> > [1] Once, mind you. Which is what DCC does, counting. The
> > "report spam" option in SA reports it differently as
> > many.
>
> 1. So what is DCC good for?

DCC is extremely good at detecting bulk messages. All or nearly all
spam messages are bulk.

> 2. Why does SpamAssassin use it?

DCC is a separately configured plugin that does not run unless
configured to do so at each SpamAssassin site.

> 3. Should I uninstall DCC if I want to get bulk but not Spam?

You should whitelist legitimate bulk email in the DCC whiteclnt file.
Or you could bypass SpamAssassin for mailing lists. You could lower the
DCC_CHECK score. Or you could disable or uninstall DCC.

> 4. Question 2. again. SpamAssassin is about Spam, but I really need to
> receive bulk, as in mailing lists and newspaper posts. Are there
> people who do not want any mail but what their friends send them, and
> that is the purpose of DCC?

If you use DCC you have to whitelist legitimate sources of bulk email.

> 5. What special does the "Report to DCC" SpamAssassin function do for
> our good?

Using "Report to DCC" reports the message to DCC with a count of many.
After that everybody else querying the same message will get a count of
many.

-jeff
Re: Incresing numbers of DCC_CHECK in ham
> From: "Jari Fredriksson"
> Date: Fri, 9 Oct 2009 19:25:15 +0300
>
> > Is someone trying to poison DCC?
> >
> > Yes, you are(:-) If you haven't whitelisted the
> > mailing list then
> > you are reporting the email from the mailing list to DCC,
> > which will
> > increase the DCC count.
>
> Me? But I do report to DCC/Razor2/SpamCop only spam. I do not report
> ALL my email.

Using spamassassin --report reports the spam message to dcc with a -t
target count of many.

> How does DCC actually work? Is any query a report somehow for DCC?

If you ask the DCC network you are reporting it. From the dccproc man
page:

    -Q   only queries the DCC server about the checksums of messages
         instead of reporting and then querying. This is useful when
         dccproc is used to filter mail that has already been reported
         to a DCC server by another DCC client such as dccm(8). This
         can also be useful when applying a private white or black list
         to mail that has already been reported to a DCC server. No
         single mail message should be reported to a DCC server more
         than once per recipient, such as would happen if dccproc is
         not given -Q when processing a stream of mail that has already
         been seen by a DCC client. Additional reports of a message
         increase its apparent "bulkness."

-jeff
Re: Incresing numbers of DCC_CHECK in ham
> From: "Jari Fredriksson"
> Date: Fri, 9 Oct 2009 17:58:06 +0300
>
> This looks worrying. I have it at 2.2 pts, and not caused any false
> positives, but still, odd. Or is it?
>
> I know it is a SPAM indicator but a bulk indicator.

Auto correct: That should be 'I know it is *not* a spam indicator but a
bulk indicator.'

Yes - it indicates bulk. Lots of people have seen the email message.
DCC will hit spam, mailing lists, and retail email such as amazon, and
various extremely short email messages.

> But it is triggered for example by some mailing list posts which are
> genuine and not bulk.

What is a genuine mailing list post that is not bulk? If lots of people
are on the mailing list then the message is, by definition, bulk.

> Is someone trying to poison DCC?

Yes, you are(:-) If you haven't whitelisted the mailing list then you
are reporting the email from the mailing list to DCC, which will
increase the DCC count. Eventually somebody will report the mailing
list as spam to DCC and you will get a DCC match on the default
many=99. You have to whitelist the mailing list in the dcc whiteclnt
file.

-jeff
Re: Problems with whitelist_from_rcvd
> From: Igor Bogomazov
> Date: Fri, 2 Oct 2009 12:34:55 +0400
>
> When I add the string like:
>
>     whitelist_from s...@domain.mail
>
> it works OK. But:
>
>     whitelist_from_rcvd s...@domain.mail prefix.domain.mail
>
> doesn't work. I've checked rDNS of the prefix.domain.mail with 'host'
> utility - it's all right. And the appropriate mail header seems to be
> correct:
>
>     Received: from prefix.domain.mail (unknown [12.12.12.12])
>
> What's the matter?

It is hard to say for sure without seeing actual received headers. You
need to use the last external relay used by the email. From man
Mail::SpamAssassin::Conf:

    whitelist_from_rcvd
    ...
    This string is matched against the reverse DNS lookup used during
    the handover from the internet to your internal network's mail
    exchangers. It can either be the full hostname, or the domain
    component of that hostname.
    ...

The easiest way to figure out which one to use is to add a Relay header
using:

    add_header all Relay trusted=_RELAYSTRUSTED_, untrusted=_RELAYSUNTRUSTED_

Then get the RDNS from the first untrusted=[ip=... rdns=RDNS ...]
relay. If the RDNS is blank then the whitelist_from_rcvd won't work.
Your internal_networks and trusted_networks need to be set up
correctly.

-jeff
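Pulling the RDNS out of that header by eye gets tedious with many messages. The following is a minimal sketch in Python, assuming the header value looks like "trusted=..., untrusted=[ ip=... rdns=... helo=... ]" with space-separated key=value fields inside each bracket; that layout, the function name and the sample value are assumptions for illustration, so check them against a real header before relying on this.

```python
import re

def first_untrusted_rdns(relay_header):
    """Return the rdns= value of the first untrusted relay, or None if blank."""
    # Keep only the part after "untrusted=".
    _, sep, untrusted = relay_header.partition("untrusted=")
    if not sep:
        return None
    # The first [...] block is the first untrusted relay.
    match = re.search(r"\[([^\]]*)\]", untrusted)
    if not match:
        return None
    # Split "key=value" fields; a field with no value maps to "".
    fields = dict(f.partition("=")[::2] for f in match.group(1).split())
    return fields.get("rdns") or None

# Hypothetical header value, for illustration only.
SAMPLE = (
    "trusted=, untrusted=[ ip=12.12.12.12 rdns=prefix.domain.mail "
    "helo=prefix.domain.mail by=mx.example.net ident= envfrom= ]"
)

print(first_untrusted_rdns(SAMPLE))
```

A None result corresponds to the blank RDNS case, where whitelist_from_rcvd has nothing to match against.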
Re: Re-running SA on an mbox
> From: MySQL Student
> Date: Tue, 22 Sep 2009 15:38:47 -0400
>
> > Try using a local SA setup for stripping the headers. By local, I mean
> > don't use your main production SA - run a separate copy with its own
> > (cut down) configuration and all data base accesses and UBL calls etc
> > turned off.
>
> Much better idea, thanks. Thanks for the script, too.
>
> Alex

formail can be used to remove headers. For example, to remove all
Received: fields from the header:

    formail -I Received:

The following should do what you wanted to remove the X-Spam headers:

    formail -I X-Spam < msg

-jeff
Re: Problem with whitelist_from_rcvd and forged reverse lookup
> From: Sebastian Wiesinger
> Date: Thu, 30 Jul 2009 17:48:09 +0200
>
> * John Hardin [2009-07-30 17:39]:
> >> Sendmail -> Procmail -> SA (spamc)
> >
> > Cool, that should be simple.
> >
> > Can you send:
> >
> > (1) the Received: headers from an email generated on that box, and
> >
> > (2) the procmail stanza where you call SA?
>
> I could create a procmail rule that excludes local mail from SA, but I
> would much rather like to whitelist this in spamassassin. Nevertheless
> thanks for your offer to help with procmail.

Processing locally generated email that contains spam URLs through
SpamAssassin is not a particularly good idea. If you have Bayes enabled
then you are training your Bayes that spam URLs and whatever else is in
the log files are hammy tokens. You really do want to skip SpamAssassin
processing on messages like this in your procmail.

-jeff
Re: Pyzor or DCC
> From: Jonas Eckerman
> Date: Thu, 23 Jul 2009 15:37:11 +0200
>
> Michael Hutchinson wrote:
> >> I saw a test
> >> message with just the word test in the subject hit DCC once.
> > That's really strange, I don't see how DCC would fire on the subject..
> > the checksum of the message must have somehow matched some Spam..
>
> That's perfectly normal. DCC doesn't just match spam, it matches things
> that have been seen before. That means it matches bulk, but also
> anything that happens to be very common for other reasons.

Yep.

> I imagine that an empty message with the subject "test" is pretty
> common, so it's perfectly reasonable for DCC to have seen such
> messages many times before.
>
> I don't know if DCC cares about the subject at all. If it doesn't,
> it's even more likely that it would hit on an empty test message.
>
> /Jonas

DCC does hit on empty messages. The empty messages can be whitelisted.
The DCC distribution includes a fetch-testmsg-whitelist script:

    % head /usr/src/dcc-1.3.111/misc/fetch-testmsg-whitelist
    #!/bin/sh
    # Fetch a list of "empty" mail messages for whitelisting. Many free mail
    # service providers add HTML or other text to mail. That causes empty
    # and nearly empty mail messages to have valid DCC checksums and not be
    # ignored by DCC clients.
    # The fetched file can be included in whiteclnt files. For example, the
    # following line in /var/dccwhiteclnt would whitelist many common
    # empty messages
Re: Pyzor or DCC
> From: RW
> Date: Wed, 22 Jul 2009 03:45:50 +0100
>
> On Wed, 22 Jul 2009 13:42:52 +1200 "Michael Hutchinson" wrote:
> > If you get an E-Mail scoring in both Pyzor and DCC, the chances are
> > very high that the message is Spam. We only deal with around 90,000
> > incoming delivery attempts per day - but have not had a false
> > positive from Pyzor or DCC yet, and have been using both for some
> > years.
>
> That's odd, I get quite a lot of DCC FPs and a few Pyzor FPs on a
> relatively small amount of email. They tend to hit on bulk mail, like
> newsletters, automated mail and very generic mails. I saw a test
> message with just the word test in the subject hit DCC once.

DCC identifies 'bulk' email. You have to whitelist desired bulk email
senders in the DCC whiteclnt (etc) file. The DCC distribution includes
sample scripts like edit-whiteclnt. Pyzor and Razor are easier to use
because they do not need this whitelisting. Razor and DCC are both
highly effective (>80%), and Pyzor is good (>40%).

-jeff
Re: Underscores
> From: Matt Kettler
> Date: Thu, 16 Jul 2009 08:52:50 -0400
>
> twofers wrote:
> > How can I pattern match when every word has an underscore after it.
> > Example:
> > This_sentenance_has_an_underscore_after_every_word
> >
> > I'm not really good at Perl pattern matching, but \w and \W see an
> > underscore as a word character, so I'm just not sure what might work.
> >
> > body =~ /^([a-z]+_+)+/i
> >
> > Is that something that will work effectively?
>
> Is this for a spam rule?
>
> I'd do something like this:
>
>     body MY_UNDERSCORES /\S+_+\S+_+\S+/
>
> Unless you really want to restrict it to A-Z.
>
> Regardless, ending any regex in + in a SA rule is redundant. Since +
> allows a one-instance match, it will devolve to that. You don't need
> to match the entire line with your rule, so the extra matches are
> redundant. It will match the first instance, and that's all it needs
> to be a match.
>
> Also any regex ending in * should just have its last element removed,
> as that will devolve to a zero-count match.

The /\S+_+\S+_+\S+/ rule will match lots of technical email, for
example discussions on shell environment variables like
LD_LIBRARY_PATH.

-jeff
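Both points in this message, the redundant trailing + and the false hits on technical text, are easy to see with a few strings. This is only a minimal sketch in Python, assuming the pattern behaves the same under Python's re module as under Perl for these simple constructs; the sample strings other than the original example are made up.

```python
import re

# The suggested rule, and the same rule with the trailing + dropped.
RULE = re.compile(r"\S+_+\S+_+\S+")
NO_TRAILING_PLUS = re.compile(r"\S+_+\S+_+\S")

samples = [
    "This_sentenance_has_an_underscore_after_every_word",
    "export LD_LIBRARY_PATH=/usr/local/lib",   # technical text, not spam
    "only one_underscore in this line",
    "no underscores here",
]

for text in samples:
    # The two columns always agree: an unanchored match test only needs
    # the first instance, so the trailing + changes nothing.
    print(bool(RULE.search(text)), bool(NO_TRAILING_PLUS.search(text)), text)
```

The second sample shows why the rule would need a low score or extra conditions on a list where people discuss environment variables.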
Re: rbl/dnsbl seems to use wrong ip sometimes
> From: dmy
> Date: Sat, 11 Jul 2009 14:27:34 -0700 (PDT)
>
> So is there a way to configure that ALL DNS tests just use the last
> external ip address (or at least NOT the first one?). Because to me it
> doesn't make any sense to test the ip people use to deliver messages
> to their smarthost and it produces quite a few false positives on my
> system...

The smarthost presumably requires authenticated senders. The smarthost
should then add a Received: header that shows that the sender was
authenticated (eg ESMTPSA). If the smarthost is trusted then the sender
will be trusted. Various tests are not run on trusted hosts.

-jeff

> RW-15 wrote:
> > On Sat, 11 Jul 2009 12:52:56 -0700 (PDT)
> > dmy wrote:
> >
> >> As far as I understand SpamAssassin is supposed to just check the ip
> >> that directly delivered the email to my server but not the IP the
> >> email is originally from (as that wouldn't make any sense as almost
> >> everyone is using dyn ips...).
> >
> > It depends on the test. Most of them run on all addresses outside the
> > trusted network, except for DUL tests and Spamhaus PBL + XBL which run
> > on the last external.
Re: USER_IN_WHITELIST Not Scoring
> From: Karsten Bräckelmann
> Date: Fri, 10 Jul 2009 23:43:03 +0200
>
> On Fri, 2009-07-10 at 06:53 -0700, an anonymous Nabble user wrote:
> > My local root user sends me nightly emails with mail/spam statistics and
> > information. Because of the spam information contained in the email, it
> > is sometimes flagged as spam itself.
> >
> > In my local.cf, I have put the root user's email address in the
> > whitelist_from line, however whenever I send an email as the root user to my
> > legitimate email account, it is not getting scored.
> >
> > whitelist_from r...@myphonydomain.com
>
> Don't use the un-constrained whitelist_from, unless as a last resort,
> if there's no other way and you cannot use the proper constrained
> ones, like whitelist_from_rcvd.

A local root sender should be getting ALL_TRUSTED. whitelist_from_rcvd
won't work on local email - you need at least one external hop to get
the 'rcvd' part. You could write SpamAssassin rules to look for the
messages, but you probably don't want to AUTOLEARN the messages since
any tokens in the email are probably spam hosts. As pointed out
earlier, this type of email should bypass SpamAssassin in procmail
(etc).

> Anyway, no sample -- no way to point out your issue. Do paste at least
> the headers of such a mail.

Yep.

-jeff
Re: Controlling spamd logging from spamc
From: Martin Gregorie Date: Tue, 02 Jun 2009 16:54:11 +0100 How difficult would it be to let spamc control spamd's logging output on a per-message basis? My reason for asking is this: I maintain a body of spam that I use to develop and regression test local rules and, during rule development, use spamc to pass the test messages through my only copy of spamd. This is useful because I can keep the test messages in a normal user on a different host from the one running spamd and avoid local configuration ambiguities. However, as part of my logwatch environment I run a perl program to collect the day's spam stats. I find that the stats are meaningless any day I develop and/or regression test rules because, of course, spamd is logging these as well as actual mail. I should add that, since my ISP introduced greylisting, the 'spam' logged during regression testing is at least 12 times the volume of genuine spam received that day, so the day's stats are meaningless and so are any stats generated by scanning the whole of /var/log/maillog* It would be useful for me to be able to disable spamd logging during rule testing. Wouldn't it be easier to run another spamd on a different machine for rule development and testing? Or perhaps just running as a different 'test' user, and then ignore log messages for that user in the statistics. Would anybody else find this a useful feature too? I've sometimes wanted the other way - eg get more debugging output for a particular message. -jeff
Re: AWL functionality messed up?
From: Linda Walsh Date: Wed, 27 May 2009 17:28:35 -0700 Jeff Mincy wrote: >From: Linda Walsh >Date: Wed, 27 May 2009 12:48:43 -0700 > >Bowie Bailey wrote: > >At face value, this seems very counter productive. > > You still aren't understanding the wiki or the AWL scoring or what AWL > is trying to do. Ah, but it only seems I'm daft, today...:-) >If I get spam from 1000 senders, they all end up in my >AWL??? > > yes. every email+ip address pair that sends you email winds up in > your AWL with an average score for that pair. This is ok. GRRRnot so ok in my mindset, but ... and ... errr.. well that only makes it more confusing, in a way...since I was only 99% certain that I'd never gotten any HAM from hostname '518501.com' (thinking for a short period that AWL might be classify things by hosts as reliable or not, instead of, or in addition to by email-addr), but I'm 99.97% certain I've never gotten any HAM from user 'paypal.notify' (at) hostname '5185 It is using the relay IP address, not the hostname... You've most likely received some other spam from this email+ip pair that was scored as ham. Hard to tell without seeing the original scores. >AWL should only be added to by emails judged to be 'ham' via >the feed back mechanisms --, spammers shouldn't get bonuses for >being repeat senders... > > You are getting too attached to the 'whitelist' part of the name. > Pretend AWL stands for average weighting list. = Aw...come on. Isn't the world difficult enough without changing white to black or white to weighing? I mean, we humans have enough trouble agreeing on what our symbols, "words" mean in relation to concepts and all without ya goin' and redefining perfectly good acceptable symbols to mean something else completely and still claim it to be some semblance of English. No wonder most of the non-techno-literate humans on this world regard us techies with a hint of suspicion regarding the difficulty of problems. 
We go around redefining words to suit reality and catch the heat when the rest of the world doesn't understand our meaning: I don't think AWL is the best possible name for the functionality, simply because it is easy to misinterpret. > AWL isn't whitelisting spammers. It is pushing the score to the > average for that sender. The sender can have a high average or a low > average. --- An average? So it keeps the scores of all the past emails of every email we ever got sent? Must just store a weighted average -- otherwise the space (hmm...someone said something about 80MB+ auto-whitelist DB files?) AWL tracks the total score and the number of messages. Why not call it the Historically Based Score Normalizer or HBSN module? Db file could be "historical-norms" or something. Call it BOB if that will help ... > If the previous email from a particular sender was FP or FN then AWL > will have an incorrect average and will wind up doing or trying to do > the wrong thing with subsequent email for that sender. Maybe it shouldn't add in the 'average' unless it exceeds the 'auto-learning threshold'?? I.e. something like the 'bayes_auto_learn_threshold_nonspam' for HAM and the 'bayes_auto_learn_threshold_spam' for SPAM. Assuming it doesn't already do such a thing, it would make a little sense...so as not to train it on 'bad data'... Perhaps. I don't have a particularly strong opinion. When I run "sa-learn --spam " over a message, can I assume (or is it the case) that telling SA, a message was 'spam' would assign a sufficiently large value to the 'HBSN' value for that sender to reduce any effect of having falsely (if it is likely to happen) incorrect value? Nope. Or might I at least assume that each "sa-learn" over a message will modify it's AWL score appropriately? no. You shouldn't assume. sa-learn doesn't modify the AWL entry. You can use spamassassin --add-to-blacklist. 
> You can remove addresses using spamassassin --remove-from-whitelist Yes...saw that after visiting the wiki. Is there a --show-whitelist-with-current-scores-and-their-weight switch as well (as opposed to one that only showed the addr's in the white list, or only showed the non-weighted scores)? If I understand what you are asking for here, you can add an X-Spam-AWL header that giv
Re: AWL functionality messed up?
> From: Linda Walsh
> Date: Wed, 27 May 2009 12:48:43 -0700
>
> Bowie Bailey wrote:
> > Linda Walsh wrote:
> >>
> >> I got a really poorly scored piece of spam -- one thing that stood out
> >> as weird was report claimed the sender was in my AWL.
> >
> > Any sender who has sent mail to you previously will be in your AWL.
> > This is probably the most misunderstood component of SA. Read the wiki.
> >
> > http://wiki.apache.org/spamassassin/AutoWhitelist
>
> At face value, this seems very counter productive.

You still aren't understanding the wiki or the AWL scoring or what AWL
is trying to do.

> If I get spam from 1000 senders, they all end up in my AWL???

Yes. Every email+ip address pair that sends you email winds up in your
AWL with an average score for that pair. This is ok.

> WTF?
>
> AWL should only be added to by emails judged to be 'ham' via the feed
> back mechanisms --, spammers shouldn't get bonuses for being repeat
> senders...

You are getting too attached to the 'whitelist' part of the name.
Pretend AWL stands for average weighting list.

> How do I delete spammer addresses from my 'auto-white-list'? (That's
> just insane..whitelisting spammers?!?!)

AWL isn't whitelisting spammers. It is pushing the score to the average
for that sender. The sender can have a high average or a low average.
If the previous email from a particular sender was FP or FN then AWL
will have an incorrect average and will wind up doing or trying to do
the wrong thing with subsequent email for that sender.

You can remove addresses using spamassassin --remove-from-whitelist

-jeff
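The "pushing the score to the average" behaviour can be worked through with numbers. This is a minimal sketch in Python, not the plugin's code: it assumes AWL keeps a running total and a message count per email+ip pair and that the adjustment uses the documented default auto_whitelist_factor of 0.5. The function name and the sample scores are made up for illustration.

```python
def awl_adjusted_score(score, total, count, factor=0.5):
    """Return the score after it is pulled toward the sender's past average.

    score  -- score of the current message before AWL
    total  -- sum of the scores of earlier messages from this email+ip pair
    count  -- number of earlier messages from this pair
    factor -- how far to move toward the average (assumed default 0.5)
    """
    if count == 0:
        # No history for this sender, so nothing to average against.
        return score
    mean = total / count
    return score + (mean - score) * factor

# A sender with a low average (four messages totalling 4.0 points)
# sends something that scores 15.0: the score is pulled down.
print(awl_adjusted_score(15.0, total=4.0, count=4))

# A sender with a high average (two messages totalling 40.0 points)
# sends something that scores 2.0: the score is pulled up.
print(awl_adjusted_score(2.0, total=40.0, count=2))
```

The first case is the one that looks like "whitelisting a spammer": earlier spam from the same pair scored low, so the bad average drags the new message down until the entry is removed or corrected.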
Re: spamassassin runs razor spamc not
> From: Mester
> Date: Fri, 22 May 2009 14:52:08 +0200
>
> >>> Check in the ~/.spamassassin/user_prefs file for the user that runs
> >>> amavisd-new. I know the Mandriva package has that set to 'use_razor2
> >>> 0', so I always have to hunt it down and fix it.
> >> I had no use_razor2 line in the ~amavis/.spamassassin/user_prefs file
> >> but after appending these lines to the file:
> >> use_razor2
> >> razor_config /var/lib/amavis/.razor/razor-agent.conf
> >> and restarting both amavis and spamassassin nothing has changed.
> >
> > Then, you need to run some of the amavisd-new debugs
> >
> > I believe the syntax is
> >
> > [amav...@foo]$ /usr/sbin/amavisd debug-sa plugin
>
> It worked. And now I found the error: amavis user couldn't read the
> /var/log/razor-agent.log file. I modified the owner of that file to
> amavis and now I see the check lines in that file.
>
> Is there a way to instruct spamassassin to write the razor, pyzor and
> dcc check's result to every e-mail's header and not only for spams?

SpamAssassin has add_header that can be used for Pyzor and DCC.

    add_header all Pyzor _PYZOR_
    add_header all DCC _DCCB_; _DCCR_

I don't know how headers are added in amavis.

-jeff
Re: learning from IMAP spam collection
    From: Michael Monnerie
    Date: Tue, 19 May 2009 09:34:53 +0200

    On Sonntag 17 Mai 2009 Michael Monnerie wrote:
    > Why is it so extremely
    > slow and CPU consuming just to remove any existing markups?

    There really seems to be no other way than calling "spamassassin -d"
    to remove existing markups. I guess I will create an account where a
    script takes all messages from folder X, removes markup, and stores
    to Y. Like this, I don't mind too much how long it takes. It's still
    a PITA that there's no quick "spamc" like way to remove markups.

You can use formail to remove headers. It is way faster than
spamassassin -d. The only trick is listing all of the headers that
can be added by SpamAssassin.

formail -b -t -I X-Spam-Status: -I X-Spam-Flag: \
    -I X-Spam-Checker-Version: -I X-Spam-Rbl: -I X-Spam-Pyzor: \
    -I X-Spam-DCC: -I X-Spam-Level: -I X-Spam-Bayes: -I X-Spam-Relay: \
    -I X-Spam-Report: -I X-Spam-AWL: -I X-Spam-Karma: -I X-Spam-ASN: \
    -I X-Spam-CRM114: -I X-Spam-Relay-Country: < msg

-jeff
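For anyone scripting the folder X to folder Y cleanup, the same header stripping can be sketched with Python's standard email module. This is an illustration rather than a replacement for formail: the function name is made up, and it removes every header whose name starts with X-Spam- instead of the explicit list above. Like formail, it only touches headers, so it will not unwrap a message that was encapsulated as an attachment or undo a rewritten Subject.

```python
import email
from email import policy


def strip_spam_headers(raw_bytes):
    """Return the message bytes with all X-Spam-* headers removed.

    Hypothetical helper for illustration; it only removes headers and
    leaves the body untouched.
    """
    # compat32 keeps the remaining headers and the body as they were.
    msg = email.message_from_bytes(raw_bytes, policy=policy.compat32)
    for name in list(msg.keys()):
        if name.lower().startswith("x-spam-"):
            # Deleting by name removes every occurrence of that header.
            del msg[name]
    return msg.as_bytes()


if __name__ == "__main__":
    sample = (
        b"From: a@example.com\n"
        b"X-Spam-Status: No, score=1.0\n"
        b"X-Spam-Level: *\n"
        b"Subject: hi\n"
        b"\n"
        b"body\n"
    )
    print(strip_spam_headers(sample).decode("ascii"))
```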
Re: whitelist_from_spf
    From: Alvaro Marín
    Date: Thu, 14 May 2009 13:30:49 +0200

    It seems that there is a problem resolving DNS records of that
    domain so I want to whitelist it. If I add:

    whitelist_from_spf *...@orange.es

    It's ignored by SA, as the log says. Reviewing code of SPF.pm from
    SpamAssassin, I see:

    # if the message doesn't pass SPF validation, it can't pass an SPF ...

    So, which is the purpose of this whitelist feature? If the SPF
    check fails, it can't do whitelist?

Yes. The whitelist check is done after the SPF check. Anybody can
have an SPF record. An SPF pass just means that the message is
genuine, i.e. not forged. You can get genuine spam.

If you aren't getting SPF_PASS on the message then whitelist_from_spf
won't do anything. If you are getting SPF_PASS on email from other
domains then the domain you are trying to whitelist probably does not
have SPF set up.

-jeff
Re: Properly integrating clamAV into SpamAssassin
From: Adam Katz Date: Sun, 03 May 2009 18:47:21 -0400 I am under the impression that virus checking is *not* that much easier than a fully-loaded SA implementation, so therefore spam detection should run first. Counter-point: online lookups cost bandwidth and latency, virus detection doesn't (yet) require any. Have you timed ClamAV? It is essentially free. On my machine I get >100 ClamAV virus scans per second, which is *way* faster than SpamAssassin. Pause. Constructive comments and criticisms? I disagree with your premise... Time ClamAV and your fully-loaded SA implementation on a set of messages. You can time SpamAssassin with and without network tests for a more complete picture. Don't get too caught up in the above part, it is all illustrative in getting to my question below. Mail that passes SpamAssassin but gets caught by ClamAV would add value to SA's Bayesian and AWL databases and thus the message stands a chance at getting caught in the future regardless of its viral content. Feeding virus email into SpamAssassin Bayes seems like a bad idea to me. The bayes tokens aren't going to be all that useful for catching non virus spam. Adding the virus email into AWL seems somewhat reasonable since any further email from the same IP address is likely to be another virus or botnet spam. However, in practice any botnet spam will use different random email addresses so you probably won't get any awl hits on the AWL addresses learned from virus email. -jeff
Re: Almost no score
    From: Charles Gregory
    Date: Fri, 1 May 2009 10:48:00 -0400 (EDT)

    Uh, what do these 'ratware' rules trigger on?

The rules trigger on spam with a particular Message-Id and boundary
pattern.

    How effective are they, and what are the chances of false positives?

For last month the KB_RATWARE_OUTLOOK_08 rule hits 21% of spam (4665
hits out of 21748 spam). It works great here. I haven't seen any FP.
Your mileage may vary.

I got the rules from Karsten's sandbox:
http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/kb/70_misc.cf

I would imagine that these rules will eventually show up in sa-update.

-jeff

On Thu, 30 Apr 2009, LuKreme wrote:
> (single lines)
> header KB_RATWARE_OUTLOOK_16 ALL =~ /^Message-Id:
> <([0-9a-f]{8})\$([0-9a-f]{8})\$.{100,400}boundary="=_NextPart_000__\1\.\2/msi
> # "
>
> header KB_RATWARE_OUTLOOK_12 ALL =~ /^Message-Id:
> <([0-9a-f]{8})\$([0-9a-f]{4})[0-9a-f]{4}\$.{100,400}boundary="=_NextPart_000__\1\.\2/msi
> # "
>
> header KB_RATWARE_BOUNDARY ALL =~ /^Message-Id:
> <([0-9a-f]{8})\$[0-9a-f]{8}\$.{100,400}boundary="=_NextPart_000__\1\./msi
> # "
>
> score KB_RATWARE_BOUNDARY 2.0
> score KB_RATWARE_OUTLOOK_16 0.1
>
>
> --
> Exit, pursued by a bear.
>
Re: 'anti' AWL
    From: Charles Gregory
    Date: Wed, 29 Apr 2009 14:31:22 -0400 (EDT)

    I just turned off my AWL today, because of FP issues but

    > f...@example.com sends me lots of mail. Say it's over 100. It's all ham and
    > it all comes from mail.example.com. The AWL for this email couplet is , say
    > -2.1. An email comes in from f...@example.com but sent from spam.spammer.tld
    > and score 7.0. It gets an additional, say, .42 (20% of the AWL) to score
    > 7.42 instead. Now, another mail from f...@example.com comes in from
    > mail.spam2.tld, this one scores 4.3. It gets a +.42 for missing the match on
    > mail.example.com, and gets a +.288 for missing the match on spam.spammer.tld

    This sounds like an attempt to mimic the effects of SPF records by
    noting which servers send "most" of the mail for a given address.
    Sadly, this logic breaks down when the spammers 'get there first'
    and/or send a greater volume of mail than the genuine sender.
    Admittedly the latter situation is a low probability for any single
    sender, but in the big picture, *someone* is getting their AWL
    reputation trashed every time a spammer forges their e-mail.

AWL stores the IP/16 address with the email address. So your AWL
reputation is not being trashed by forged e-mail that comes from a
different IP address.

    Just this Monday I had a phishing attack against my clients, with
    *dozens* of e-mails, all purporting to come from ME that came from
    the *same* server! In this case, as I only send a half dozen
    messages per month from that account, the spammer would get the
    favored rating?

Only if the spammer uses the same server that you do.

-jeff
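To make the "IP/16 plus email address" point concrete, here is a small Python sketch of how such a lookup key could be built. The key layout and the function name are assumptions made for the example, not a description of the on-disk AWL format, and it only handles dotted IPv4 addresses.

```python
def awl_key(address, ip):
    """Build an illustrative AWL-style key from an address and an IPv4.

    Only the first two octets (the /16) of the IP are kept, so hosts in
    the same /16 share one reputation entry for a given address.
    The "addr|ip=a.b" layout is an assumption for this sketch.
    """
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted IPv4 address: %r" % ip)
    return "%s|ip=%s.%s" % (address.lower(), octets[0], octets[1])


if __name__ == "__main__":
    # Same sender address, two hosts in the same /16: one shared entry.
    print(awl_key("foo@example.com", "192.0.2.10"))
    print(awl_key("foo@example.com", "192.0.77.1"))
    # Same (forged) address from an unrelated network: a separate entry.
    print(awl_key("foo@example.com", "198.51.100.7"))
```

A forged message from an unrelated network lands on a different key, so it neither benefits from nor damages the history built up by the genuine sender.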
Re: 'anti' AWL
    From: LuKreme
    Date: Tue, 28 Apr 2009 08:43:46 -0600

    OK, working on my first cup of coffee this morning, so maybe this
    has potential.

    The way the AWL works is by keeping track of the origin of emails,
    both the address and the server (the top line Received header?)
    that send the email. So, let's say that I have a lot of email from
    f...@example.com and that foo's email is sent to me via
    mail.example.com. Now, I get an email claiming to be from
    f...@example.com but sent to me from suspiciousserver.tld, so the
    AWL is not applied.

Your idea will FP anytime anybody adds a new email device or the ISP
changes (etc). You could use the sagrey plugin to add a point to
email from new email address+IP pairs.

-jeff
Re: AWL and FP's....
    From: Charles Gregory
    Date: Wed, 22 Apr 2009 15:56:53 -0400 (EDT)

    Just curious if anyone has ever found a 'clean' way to handle the
    'damage' done to the AWL when someone's mail is blocked by a false
    positive, and the sender is stupid enough to keep retrying the
    offending mail?

Meaning that the first message from the sender was incorrectly marked
as spam and AWL then made sure that all subsequent messages from the
same sender were also marked as spam? The easiest way to fix it is to
smash the AWL entry with spamassassin --add-to-whitelist or remove
the AWL entry using --remove-from-whitelist.

    I would rather not turn off AWL. I like the way it gives a negative
    score bias to frequent correspondents. But is there a (sub)setting
    to allow me to permit the negative bias, but *not* allow it to add
    a positive one?

Nope - the only thing you can do is set the factor which acts on both
positive and negative scores.

    And while I'm at it, can anyone verify whether 'constantcontact' is
    really a legit mail service or a spam haven? That's the FP that
    caused this issue

They do email for various organizations.

-jeff
Re: use_auto_whitelist error in lint
    From: realshock
    Date: Thu, 9 Apr 2009 06:56:05 -0700 (PDT)

    Matt Kettler-3 wrote:
    > Find out where else you've got "use_auto_whitelist 0" in your config,
    > and remove it.
    > On the plus side, it does confirm you've correctly disabled the plugin.

    I searched all over the place, and following your directions, do
    you think this command will find where it is?

    # grep -iR use_auto_whitelist /*

spamassassin -D --lint prints out the config files, eg:

spamassassin -D --lint 2>&1 | fgrep 'config: read file'

The use_auto_whitelist is in one of those config files.

-jeff
Re: need help - procmail & spamassassin
    From: "sebast...@debianfan.de"
    Date: Sun, 05 Apr 2009 01:56:38 +0200

    Hello, i am filtering mails with spamassassin & procmail.

This is more of a procmail question, so it doesn't actually belong
here.

    The header of message

    X-Spam-Level: **

    I want to sort mails into some different directories.

    10 or more --> directory 10
    9 --> directory 9
    and so on

Do you really want that many different mail folders? Wouldn't
low>=5, mid>=10 and high>=15 be sufficient?

    But - nothing happens - the mails are all in the /Maildir/new
    directory

    why ?

The .*\( part.

    :0:
    * ^X-Spam-Level: .*\(\*\*\*\*\*\*\*\*\*\*
    Maildir/10/new

You don't need the .* and you don't want the \(

* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*

Also, you can use the numeric score directly. For example, you can
set X_SPAM_SCORE in a procmail recipe to be the number following
score= on the X-Spam-Status line.

X_IS_SPAM="Unknown"
X_SPAM_SCORE=""

:0
* ^X-Spam-Status: \/.*
{
  :0
  * ^X-Spam-Status: \/(Yes|No|YES|NO|Skipped)
  { X_IS_SPAM="$MATCH" }

  :0
  * ^X-Spam-Status: (Yes|No|YES|NO)[, ]+(hits|score)=\/([-0-9.]+)
  { X_SPAM_SCORE="$MATCH" }
}

Then you can write recipes like this one, which matches spam scoring
12.5 or higher.

SPAM_CUTOFF=12.499

:0
* X_IS_SPAM ?? (Yes|YES)
*$ -$SPAM_CUTOFF ^0
*$ $X_SPAM_SCORE ^0
somefolder

    :0:
    * ^X-Spam-Level: .*\(\*\*\*\*\*\*\*\*\*\*
    Maildir/10/new

    :0:
    * ^X-Spam-Level: .*\(\*\*\*\*\*\*\*\*\*
    X-Spam-Level: ***
    Maildir/9/new

You don't want the extra 'X-Spam-Level: ***' line here.

-jeff
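The same "pull the number out of X-Spam-Status and compare it to a cutoff" logic can be sketched in Python, which may help when checking what the procmail recipes are expected to do. The function names are made up for the example, and the regular expression only covers the Yes/No plus hits=/score= forms matched by the recipes above.

```python
import re

# Mirrors the recipe: Yes/No, then "hits=" or "score=", then the number.
STATUS_RE = re.compile(
    r"^(Yes|No|YES|NO)[, ]+(?:hits|score)=(-?[0-9]+(?:\.[0-9]+)?)"
)


def parse_status(value):
    """Return (is_spam, score) from an X-Spam-Status value, or None."""
    match = STATUS_RE.match(value)
    if match is None:
        return None
    return match.group(1).lower() == "yes", float(match.group(2))


def is_high_scoring_spam(value, cutoff=12.5):
    """True when the message is flagged as spam and scores >= cutoff."""
    parsed = parse_status(value)
    return parsed is not None and parsed[0] and parsed[1] >= cutoff


if __name__ == "__main__":
    print(parse_status("Yes, score=14.2 required=5.0 tests=BAYES_99"))
    print(is_high_scoring_spam("Yes, score=14.2 required=5.0 tests=BAYES_99"))
    print(is_high_scoring_spam("No, score=-1.496 required=6.6"))
```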
Re: New kind of spam
From: Arvid Ephraim Picciani Date: Tue, 31 Mar 2009 12:33:49 +0200 > What do you mean "its impossible to train bayes"? i was assuming the random text at the end is what couses my bayes db to behave randomly. Random text that occurs only in spam rapidly becomes a spam sign. Random spam text that also occurs in ham requires a period of adjustment for Bayes, but eventually Bayes figures it out. > Bayes really can be trained to deal with this message. > For example, I get BAYES_95: well i get 00 An occasional spam getting a low bayes score is ok, but lots of spam getting BAYES_00 is a problem. Train Bayes with more spam messages and correct any incorrectly learned messages. > After I learn this message the probability increases to BAYES_99 yes, for that specific message. what exactly is the point of learning specific messages when the next one will be different anyway. Perhaps you are missing the point of bayes. I got bayes_95 on the message before training on the message. My SpamAssassin hadn't seen the message before, but it had trained on similar spams. Bayes breaks the message up into various tokens, some of tokens from this or any spam message will be repeated in other spam messages. > % wget -O - -q http://codepad.org/W53onqK9/raw.txt | spamc | /bin/fgrep --text X-Spam-Bayes > X-Spam-Bayes: bayes=1., N=50(47-2+29), ham=(sort, doing), spam=(UD:spaces.live.com, UD:live.com, UD:entry, dawn, HX-Mozilla-Status2:) interestingly i dont have that header. i'll check docs. The X-Spam-Bayes header was added with add_header all Bayes bayes=_BAYES_, N=_BAYESTC_(_BAYESTCLEARNED_-_BAYESTCHAMMY_+_BAYESTCSPAMMY_), ham=(_HAMMYTOKENS(5,short)_), spam=(_SPAMMYTOKENS(5,short)_) -jeff
Re: New kind of spam
From: Arvid Ephraim Picciani Date: Wed, 25 Mar 2009 16:59:58 +0100 http://codepad.org/W53onqK9 i gave on this kind of spam. its impossible to train bayes and changing to fast to make custom rules. ... What do you mean "its impossible to train bayes"? Bayes really can be trained to deal with this message. For example, I get BAYES_95: wget -O - -q http://codepad.org/W53onqK9/raw.txt | spamc | /bin/fgrep --text X-Spam-Bayes X-Spam-Bayes: bayes=0.9679, N=50(29-2+11), ham=(sort, doing), spam=(UD:spaces.live.com, UD:live.com, UD:entry, dawn, HX-Mozilla-Status2:) After I learn this message the probability increases to BAYES_99 % wget -O - -q http://codepad.org/W53onqK9/raw.txt | sa-learn --spam Learned tokens from 1 message(s) (1 message(s) examined) % sa-learn --sync % wget -O - -q http://codepad.org/W53onqK9/raw.txt | spamc | /bin/fgrep --text X-Spam-Bayes X-Spam-Bayes: bayes=1., N=50(47-2+29), ham=(sort, doing), spam=(UD:spaces.live.com, UD:live.com, UD:entry, dawn, HX-Mozilla-Status2:) Note that Bayes has determined that UD:spaces.live.com is a spam sign. The X-Spam-Bayes header is added with add_header all Bayes bayes=_BAYES_, N=_BAYESTC_(_BAYESTCLEARNED_-_BAYESTCHAMMY_+_BAYESTCSPAMMY_), ham=(_HAMMYTOKENS(5,short)_), spam=(_SPAMMYTOKENS(5,short)_) -jeff
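For reading these X-Spam-Bayes headers in bulk, a small Python parser along the following lines may help. It assumes the exact add_header template shown above (bayes=, then N=total(learned-hammy+spammy), then the ham and spam token lists); the function and field names are made up for the example.

```python
import re

# Matches the layout produced by the add_header template quoted above.
BAYES_RE = re.compile(
    r"bayes=(?P<prob>[0-9.]+), "
    r"N=(?P<total>\d+)\((?P<learned>\d+)-(?P<hammy>\d+)\+(?P<spammy>\d+)\), "
    r"ham=\((?P<ham>[^)]*)\), "
    r"spam=\((?P<spam>[^)]*)\)"
)


def split_tokens(text):
    """Split a comma separated token list, dropping empty entries."""
    return [tok.strip() for tok in text.split(",") if tok.strip()]


def parse_bayes_header(value):
    """Return a dict of the fields in an X-Spam-Bayes value, or None."""
    match = BAYES_RE.search(value)
    if match is None:
        return None
    return {
        "probability": float(match.group("prob")),
        "total": int(match.group("total")),
        "learned": int(match.group("learned")),
        "hammy": int(match.group("hammy")),
        "spammy": int(match.group("spammy")),
        "ham_tokens": split_tokens(match.group("ham")),
        "spam_tokens": split_tokens(match.group("spam")),
    }


if __name__ == "__main__":
    header = (
        "bayes=0.9679, N=50(29-2+11), ham=(sort, doing), "
        "spam=(UD:spaces.live.com, UD:live.com, UD:entry, dawn, "
        "HX-Mozilla-Status2:)"
    )
    print(parse_bayes_header(header))
```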
Re: Blacklisting Cyrillic
    From: Kenneth Porter
    Date: Thu, 26 Mar 2009 17:22:21 -0700

    I'd like to score anything in Windows-1251 fairly high, as I don't
    expect to get anything legitimate in that charset. How can I read
    the charset declared in a Subject header, or in a MIME part, for
    matching in a rule?

    The only tools I see are ok_locales and CHARSET_FARAWAY, but those
    seem like heavy hammers as they blacklist everything and then
    require me to whitelist what I want. I'd rather the reverse: let me
    list which codepages to reject.

    I tried this rule but it's not firing and I'm not sure why:

    describe KP_CYRILLIC Cyrillic code page
    header KP_CYRILLIC Subject =~ /Windows-1251/
    score KP_CYRILLIC 0.1

Try Subject:raw to inhibit decoding?

-jeff
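The reason decoding matters can be shown with Python's standard email.header module. This is only an illustration of MIME encoded-word decoding in general, not of SpamAssassin's internals: the charset name is visible in the raw header but is gone once the header has been decoded to text. The helper name and the sample subject are made up for the example. The sketch also matches case-insensitively, since mailers may write the charset as windows-1251 in lower case.

```python
import base64
import re
from email.header import decode_header, make_header


def encoded_word(text, charset):
    """Build a base64 MIME encoded-word (hypothetical helper)."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return "=?%s?B?%s?=" % (charset, payload)


# A Russian greeting, written with escapes to keep this source ASCII.
SAMPLE_TEXT = "\u041f\u0440\u0438\u0432\u0435\u0442"

raw_subject = encoded_word(SAMPLE_TEXT, "windows-1251")
decoded_subject = str(make_header(decode_header(raw_subject)))

charset_re = re.compile(r"windows-1251", re.IGNORECASE)

if __name__ == "__main__":
    print("raw:    ", raw_subject)
    print("decoded:", decoded_subject)
    print("charset visible in raw:    ", bool(charset_re.search(raw_subject)))
    print("charset visible in decoded:", bool(charset_re.search(decoded_subject)))
```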
RE: Server overload, queuing for SA possible?
From: Bowie Bailey Date: Thu, 26 Mar 2009 12:07:23 -0500 Jeff Mincy wrote: > >If I'm reading the spamc man page correctly, it will wait 5 >minutes for spamd to process the message, but it will only wait >about 3 seconds for a connection to spamd (3 tries with 1 second >sleep between them). That's not much of a queue. Or am I missing >something? > > The --connect-retries=retries and --retry-sleep=sleep options control > connection attempts. The connection attempt was successful, you are > just waiting for spamd to get around to the message. If spamd > refuses the connection then spamc will retry a few times. Ok, so spamd will accept the connection and hold onto it until a child process is available. How many connections can spamd queue? I dunno. As I recall, on linux the maximum number of connections is controlled by some kernel limit, probably 4000. You'll run out of something else before you get anywhere near this number. Of course, messages will start timing out in spamc if they are not processed fast enough. -jeff
RE: Server overload, queuing for SA possible?
From: Bowie Bailey Date: Thu, 26 Mar 2009 09:55:45 -0500 Jeff Mincy wrote: >From: Bowie Bailey >Date: Thu, 26 Mar 2009 08:48:30 -0500 > >Brian J. Murrell wrote: >> On Wed, 2009-03-25 at 15:01 -0400, Michael Scheidell wrote: >> > >> > Match your MTA processes to the spamd children. Your MTA will >send > > 4xx 'busy now, come back to play later' message. Let the >sending > > MTA queue it back up (or zombies will just go away) >> >> I don't really see that as a socially responsible action. If my >> mailserver was completely loaded to the point of not even being >able > to queue a message, I'd buy pushing back on the sender with >a 4xx, > but the reality is that while I may have maxed out my >spamd children, > I can likely still receive and queue mail >locally. > >> The queueing up of mail to spamd really belongs on the local >server, > and should not become a burden on sending MTAs. > >This really depends on where you are running SA in the delivery >process. > I'm kinda gathering that this is not possible within >spamassassin > itself. Probably in fact it is for at least some >MTAs but how to > achieve it becomes MTA specific and OT here. > >SA is not capable of any sort of queuing. If you need that, you >will have to make your MTA do it one way or another. > > The spamassassin executable doesn't queue - it just starts up a new > process each time it scans a message. > > However, spamd queues connections when all of the children are busy > processing messages. > > From the spamd man page: > >-m number , --max-children=number >This option specifies the maximum number of children to >spawn. Spamd will spawn that number of children, then >sleep in the background until a child dies, wherein it >will go and spawn a new child. > >Incoming connections can still occur if all of the >children are busy, however those connections will be >queued waiting for a free child. The minimum value is 1, > the default value is 5. 
> > As long as messages are processed reasonably quickly everything will > be fine. If spamd takes too long to process messages then the MTA > will start timing out (like 2-10 minutes). What happens then is up to > the MTA. > -jeff Ok, it does queue connections, but that is very limited. This thread is specifically talking about what happens when spamd is taking too long. Yes. We were getting away from that issue. The machine may not have enough resources to run the number of spamd children. A caching name server helps with throughput. Some more details about the machine could be useful as well as details on what else is happening on the machine when the spamd queue backs up. If I'm reading the spamc man page correctly, it will wait 5 minutes for spamd to process the message, but it will only wait about 3 seconds for a connection to spamd (3 tries with 1 second sleep between them). That's not much of a queue. Or am I missing something? The --connect-retries=retries and --retry-sleep=sleep options control connection attempts. The connection attempt was successful, you are just waiting for spamd to get around to the message. If spamd refuses the connection then spamc will retry a few times. -jeff
RE: Server overload, queuing for SA possible?
From: Bowie Bailey Date: Thu, 26 Mar 2009 08:48:30 -0500 Brian J. Murrell wrote: > On Wed, 2009-03-25 at 15:01 -0400, Michael Scheidell wrote: > > > > Match your MTA processes to the spamd children. Your MTA will send > > 4xx 'busy now, come back to play later' message. Let the sending > > MTA queue it back up (or zombies will just go away) > > I don't really see that as a socially responsible action. If my > mailserver was completely loaded to the point of not even being able > to queue a message, I'd buy pushing back on the sender with a 4xx, > but the reality is that while I may have maxed out my spamd children, > I can likely still receive and queue mail locally. > > The queueing up of mail to spamd really belongs on the local server, > and should not become a burden on sending MTAs. This really depends on where you are running SA in the delivery process. > I'm kinda gathering that this is not possible within spamassassin > itself. Probably in fact it is for at least some MTAs but how to > achieve it becomes MTA specific and OT here. SA is not capable of any sort of queuing. If you need that, you will have to make your MTA do it one way or another. The spamassassin executable doesn't queue - it just starts up a new process each time it scans a message. However, spamd queues connections when all of the children are busy processing messages. >From the spamd man page: -m number , --max-children=number This option specifies the maximum number of children to spawn. Spamd will spawn that number of children, then sleep in the background until a child dies, wherein it will go and spawn a new child. Incoming connections can still occur if all of the children are busy, however those connections will be queued waiting for a free child. The minimum value is 1, the default value is 5. As long as messages are processed reasonably quickly everything will be fine. If spamd takes too long to process messages then the MTA will start timing out (like 2-10 minutes). 
What happens then is up to the MTA. -jeff
Re: Spam Assassin White List
    From: Matus UHLAR - fantomas
    Date: Tue, 24 Mar 2009 15:30:23 +0100

    On 23.03.09 21:58, dsh979 wrote:
    > I did not realise that items listed on the white list or the black list
    > would still be subject to the operation/analysis of the SpamAssassin Rules.

    all rules are processed unless you play with ShortCircuit plugin.
    Beware of that: It may render the SA useless if you don't know what
    you are doing.

    > You have asked why I have set the required score the 100. Lengthy
    > explanation (sorry). I have done this to prevent SpamAssassin from
    > inserting SpamWarnings into the header/body of the relevant email.

    There's report_safe option to configure that. Also rewrite_header

    > Q:How can I list items/users on a "white list" or a "black list" without the
    > lists (and items) being the subject of further analysis by the SpamAssassin
    > Rules (and therefore obtaining the same score for each item on the relevant
    > list, irrespective of the operation of the SpamAssassin Rules, that is
    > -100=white list items & +100 = black list items)?

    I somehow do not understand this question.

He wants the white/black lists to run first and then short circuit.
So anybody in the whitelist gets a score of -100 and anybody in the
blacklist gets a score of +100. This can probably be done with the
ShortCircuit plugin and setting the priority of the rules so that
they run first.

Black lists aren't all that useful for stopping spam. The email
addresses are forged in spam.

-jeff
Re: negative scores for spam
From: Chris Barnes Date: Mon, 23 Mar 2009 11:14:37 -0500 Jeff Mincy wrote: > Yow. The negative scoring bayes rules are extremely reliable when well > trained. Ham messages are not trying to evade the filter. Defeating > bayes with poison is mostly a myth. The random garbage might work the > first time but not the second time as long as you are training these > messages as spam. If you are getting lots of BAYES_00 hits on spam > then the problem is almost certainly incorrect training where spam > messages were incorrectly learned as ham. Fair enough. But the problem remains. A simple glance at this list shows that this happens often enough to be a fairly common problem. The question is: How does one fix the problem after it occurs? The way to fix the problem is to relearn any incorrectly learned messages. So any spam message that was incorrectly learned as ham, either automatically or manually, needs to be correctly relearned as spam using sa-learn. You should also learn as spam any spam messages that hits BAYES_00, or anything less than BAYES_50. You should also do the same thing for HAM messages hitting BAYES_50 - BAYES_99. The more messages that you correctly train the more accurate and definitive bayes will be. If you don't have the incorrectly learned messages to retrain then you can always start over by removing the bayes database files in your .spamassassin directory. -jeff
Re: negative scores for spam
From: Jesse Stroik Date: Fri, 20 Mar 2009 16:14:39 -0500 Hoover Chan wrote: > The threshold was set to 6.6 (cf. required=6.6). The message this was attached to was very definitely junk. This kind of situation got me curious about the whole thing where any positive spam score is set as the threshold but seeing junk mail coming in with negative scores. You are getting negative scores for auto white list and for bayes_00. It's a matter of taste and what you believe makes sense, but I don't consider bayes to be all that accurate (since there are methods for defeating bayes, poisoning bayes, etc). As such, I don't allow Bayes to assign negative scores or positive scores within a couple of points of the threshold. You can do so by assigning scores like this: score BAYES_00 0 score BAYES_05 0 score BAYES_20 0 score BAYES_40 0 Yow. The negative scoring bayes rules are extremely reliable when well trained. Ham messages are not trying to evade the filter. Defeating bayes with poison is mostly a myth. The random garbage might work the first time but not the second time as long as you are training these messages as spam. If you are getting lots of BAYES_00 hits on spam then the problem is almost certainly incorrect training where spam messages were incorrectly learned as ham. I also disable AWL since a lot of spam, especially the stuff most likely to be tested against spamassassin, will like use known good email addresses from your domain as the "from" address. This is fairly likely to hit on the AWL. Yow again. AWL uses email address and the IP address. So forged email addresses used in spam is not going to use the same EMAIL+IP pair as legitimate email using the same email address. Again, it's just a matter of taste and it all depends on how you've set up your scoring. I'm pretty cautious to ensure there aren't false positives as that would decrease the value of spamassassin greatly for us, but I otherwise avoid AWL and Bayes negative scores. 
If you sent us a copy of the spam, we could test it and show you what should be hitting. Use pastebin instead. -jeff
Re: negative scores for spam
From: Hoover Chan Date: Fri, 20 Mar 2009 13:55:08 -0700 (PDT) The threshold was set to 6.6 (cf. required=6.6). The message this was attached to was very definitely junk. This kind of situation got me curious about the whole thing where any positive spam score is set as the threshold but seeing junk mail coming in with negative scores. Train BAYES. The message hit BAYES_00. You want BAYES_99. So either you have incorrectly learned similar messages or you haven't trained enough. -jeff -- Hoover Chan c...@sacredsf.org Technology Director Schools of the Sacred Heart Broadway St. San Francisco, CA 94115 - "Rick Macdougall" wrote: > Hoover Chan wrote: > > Can someone point me to what I can do to my Spam Assassin config for > a situation like the following? > > > > X-Spam-Status: No, score=-1.496 tagged_above=-10 required=6.6 > > tests=[AWL=-1.103, BAYES_00=-2.599, HTML_MESSAGE=0.001, > > URIBL_BLACK=1.955, URIBL_GREY=0.25] > > > > That is, a positive score criterion with a spam message that comes > out with a negative number. > > > > Errr > > -1.103 - 2.599 + 0.001 + 1.955 + 0.25 = -1.49600 > > Where do you see that it should be positive ? > > Regards, > > Rick
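Rick's arithmetic can be checked mechanically. The Python sketch below parses the tests=[...] list from the X-Spam-Status line quoted above and adds up the per-rule scores; the function name is made up for the example, and it assumes the bracketed NAME=score layout shown in that header.

```python
import re


def rule_scores(status_value):
    """Return {rule_name: score} from a tests=[NAME=score, ...] list."""
    match = re.search(r"tests=\[([^\]]*)\]", status_value)
    if match is None:
        return {}
    scores = {}
    for item in match.group(1).split(","):
        name, _, value = item.strip().partition("=")
        scores[name] = float(value)
    return scores


STATUS = (
    "No, score=-1.496 tagged_above=-10 required=6.6 "
    "tests=[AWL=-1.103, BAYES_00=-2.599, HTML_MESSAGE=0.001, "
    "URIBL_BLACK=1.955, URIBL_GREY=0.25]"
)

if __name__ == "__main__":
    scores = rule_scores(STATUS)
    total = round(sum(scores.values()), 3)
    print("total:", total)
    # What the message would have scored without the two negative rules.
    without_negatives = round(
        sum(v for k, v in scores.items() if k not in ("AWL", "BAYES_00")), 3
    )
    print("without AWL and BAYES_00:", without_negatives)
```

Even without AWL and BAYES_00 the remaining rules add up to well under the 6.6 threshold, which is why the advice is to train Bayes until this kind of message hits BAYES_99 rather than to adjust the threshold.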
Re: SpamAssassins bayes mechanism and message headers
    From: Matt Kettler
    Date: Wed, 18 Mar 2009 19:49:53 -0400

    Jeff Mincy wrote:
    >From: Matt Kettler
    >Date: Tue, 17 Mar 2009 21:30:02 -0400
    >
    >fl...@pbartels.info wrote:
    >> Hello,
    >>
    >> instead of disabling a lot possibly set message headers using
    >> "bayes_ignore_header" and ending up in strange configs like:
    >>
    >> bayes_ignore_header Return-Path
    >...
    >> (found on the net)
    >Where?
    >>
    >> shouldn't SpamAssassins bayes mechanism just ignore the complete
    >> message header and just look at the body?
    >> This seems useful in my opinion.
    >It seems like a very misguided idea to me.
    >
    >Is there any reason to think headers make bad tokens?
    >Do you have any test data showing this improves your bayes accuracy?
    >
    > Yes - I think some headers make extremely bad tokens for bayes, for
    > example the X-Mailer/User-Agent headers. 40% of the spam I get
    > claims to have Microsoft Outlook as a x-Mailer. So bayes rapidly
    > determines that *UAMicrosoft (etc) is an extremely strong token.
    > These *UA tokens were enough to push a short ham message to BAYES_99.
    > When I added an bayes_ignore_header the score dropped to ~BAYES_40
    >

    That seems rather extraordinarily strange. Did the messages match
    no other tokens at all? (ie: did you run it through spamassassin -D
    bayes before and after?)

This was the X-Spam-Bayes header that was added at the time:

X-Spam-Bayes: bayes=1., N=27(19-0+13), ham=(), spam=(HTo:U*mincy, HTo:D*com, HTo:D*rcn.com, H*F:D*net, H*UA:Build)

This header was added using:

add_header all Bayes bayes=_BAYES_, N=_BAYESTC_(_BAYESTCLEARNED_-_BAYESTCHAMMY_+_BAYESTCSPAMMY_), ham=(_HAMMYTOKENS(5,short)_), spam=(_SPAMMYTOKENS(5,short)_)

So, there are 27 tokens, 0 hammy, 13 spammy.

    I'd be very interested in what's going on there, because it makes
    very little sense unless the message really matched very, very
    little other existing training.

3 of the top 5 spammy tokens, eg: HTo:U*mincy, HTo:D*com,
HTo:D*rcn.com, come from the To: mi...@rcn.com header. The H*UA:Build
came from a 'X-Mailer: Microsoft Outlook IMO, Build 9.0.2416
(9.0.2911.0)' header. As I recall, there were various H*UA:Outlook
etc tokens. Bayes was 100.000% sure that this message was spam based
on the To, X-Mailer, and From headers.

The envelope on all email messages that I read at home is addressed
to mi...@rcn.com (ignoring for the moment that mi...@starpower.net
also happens to get to me). The 'To:' header is either going to be
my mi...@rcn.com address or some made up email address that will
never be repeated. So Bayes will see my email address in both spam
and ham. At the time more than 80% of the email I was getting at
rcn.com was spam, so To: mi...@rcn.com was turned into three strong
spam tokens. My real mi...@rcn.com email address in the To header
says nothing about the spamminess of the message. (This is in
contrast to the mi...@starpower.net email address, where mail is
almost certainly spam and which has been added to blacklist_to.)

So my solution was to add 'bayes_ignore_header To From' and use
blacklist_to/blacklist_from for the suspect email addresses. I came
up with similar justification for adding 'bayes_ignore_header
X-Mailer'.

The body of the message was a single sentence asking me about my
primary music software.

If you want to see more detail let's take it off the public mailing
list.

-jeff
Re: SpamAssassins bayes mechanism and message headers
    From: Greg Troxel
    Date: Wed, 18 Mar 2009 15:33:31 -0400

    Jeff Mincy writes:
    >From: Matt Kettler
    >Date: Tue, 17 Mar 2009 21:30:02 -0400
    >
    >> shouldn't SpamAssassins bayes mechanism just ignore the complete
    >> message header and just look at the body?
    >> This seems useful in my opinion.
    >It seems like a very misguided idea to me.
    >
    >Is there any reason to think headers make bad tokens?
    >Do you have any test data showing this improves your bayes accuracy?
    >
    > Yes - I think some headers make extremely bad tokens for bayes, for
    > example the X-Mailer/User-Agent headers. 40% of the spam I get

    I think I'm having a similar problem, where I get spam via a
    mailinglist, and bayes gives the spam credit for having similar
    headers to the ham which arrives on the list. I'm not so concerned
    about including the headers as they arrive at the list server, but
    all the headers added from receipt by the list server seem
    inappropriate. I'll try bayes_ignore_header.

Scanning mailing list email is more trouble than it's worth. It can
be done, but you have to be very motivated and it is a lot of work to
maybe catch a few mailing list spam messages.

Bayes needs to ignore any headers and any special footer tokens added
by the mailing list postings. You need to extend trusted_networks to
the mailing list so that various tests are done on the submitter
instead of the mailing list. DCC should be whitelisted for most
mailing lists since the email messages are bulk. Any automatic
reporting needs to be turned off. I'm sure there are other things
that I'm forgetting.

If the mailing list has reasonably good spam filtering then just skip
running SpamAssassin.

-jeff
Re: SpamAssassins bayes mechanism and message headers
    From: Matt Kettler
    Date: Tue, 17 Mar 2009 21:30:02 -0400

    fl...@pbartels.info wrote:
    > Hello,
    >
    > instead of disabling a lot possibly set message headers using
    > "bayes_ignore_header" and ending up in strange configs like:
    >
    > bayes_ignore_header Return-Path
    ...
    > (found on the net)

    Where?

    >
    > shouldn't SpamAssassins bayes mechanism just ignore the complete
    > message header and just look at the body?
    > This seems useful in my opinion.

    It seems like a very misguided idea to me.

    Is there any reason to think headers make bad tokens?
    Do you have any test data showing this improves your bayes accuracy?

Yes - I think some headers make extremely bad tokens for bayes, for
example the X-Mailer/User-Agent headers. 40% of the spam I get claims
to have Microsoft Outlook as an X-Mailer. So bayes rapidly determines
that *UAMicrosoft (etc) is an extremely strong token. These *UA
tokens were enough to push a short ham message to BAYES_99. When I
added a bayes_ignore_header the score dropped to ~BAYES_40.

Obfuscated words like 'st0ck' are 100% indications of spam (or of
messages that discuss spam), so these words work great for bayes. A
'X-Mailer: Microsoft Office Outlook' header doesn't really tell you
anything about the message, at least not to the extent that bayes
treats these tokens.

The Message-ID tokens are also low quality tokens. Most of these
tokens are hapaxes that are never used by other messages. These just
fill up the bayes database. If the Message-ID tokens were processed
further then maybe they could be more useful for bayes - eg - replace
1234.56789 with a format %4d.%5d, or throw out all of the timestamp
numbers and keep just the stuff after the @.

-jeff
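The two Message-ID ideas floated above (turn digit runs into a width pattern, or keep only the part after the @) can be sketched in Python as follows. This is a thought experiment, not something SpamAssassin's tokenizer is known to do; the function names and the sample Message-ID are made up for the example.

```python
import re


def digit_format(message_id):
    """Replace each run of digits with a printf-style width marker.

    Two Message-IDs that differ only in their numbers then produce the
    same string, so they would share a token instead of being hapaxes.
    """
    return re.sub(r"\d+", lambda m: "%%%dd" % len(m.group()), message_id)


def domain_only(message_id):
    """Keep just the part after the last @, without angle brackets."""
    return message_id.strip("<>").rpartition("@")[2]


if __name__ == "__main__":
    sample = "<1234.56789@mail.example.com>"
    print(digit_format(sample))
    print(domain_only(sample))
```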
Re: Some emails pass spamassassin unprocessed
From: Monky
Date: Fri, 20 Feb 2009 03:31:14 -0800 (PST)

Hello, I am running the Spamd Daemon version 3.2.5 on my Linux web and
mail server and in general it works well. From time to time (somewhere
in between 1-10% of all emails) spam passes the filter - but not
because spamassassin decides that it is ham but because the email
never gets processed by spamassassin (the header shows no X-Spam at
all).

Look in the mail log files to see what was happening when messages
were passed through unprocessed. SpamAssassin could be waiting on lock
files. For example, the Bayes files are locked while an automatic
Bayes expiry runs.

-jeff
Re: vbounce and out of office messages
From: Kai Schaetzl
Date: Sun, 01 Feb 2009 17:40:00 +0100

Jeff Mincy wrote on Sun, 1 Feb 2009 10:01:49 -0500:
> I use vbounce rules to detect bounce messages that were missed by
> various procmail filtering rules. Any message identified as a bounce
> is processed and delivered differently in procmail rules. So, any
> vbounce FP is rather painful.

No, it is not, unless you score these rules too high or unless you use
the single rules for triggering other actions. That's what SA is all
about: scoring.
...

Huh? You don't want bounces to be processed as regular spam. If you
train bayes on bounces then you are training bayes to detect bounces,
and pretty soon SpamAssassin will detect all bounces, including valid
bounces, as spam.

This comment is taken from the 20_vbounce.cf file:

# If you use this, set up procmail or your mail app to spot the
# "ANY_BOUNCE_MESSAGE" rule hits in the X-Spam-Status line, and move
# messages that match that to a 'vbounce' folder.

...
If you try to (mis-)use it in other ways problems are to be expected.
That's not the fault of the vbounce rules.

The purpose of 20_vbounce is to detect and identify bounces so that
you may process bounce messages differently. So I disagree: any FP in
the vbounce rules is the fault of the vbounce rules and prevents these
rules from being used as designed.

AFAIK, the default score for all the BOUNCE rules is 0.1

Right. If you aren't going to use the vbounce rules for extra
processing then there really isn't any point in running the rules. The
low default score pretty much guarantees that message classification
will not change one way or the other.

-jeff
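For anyone whose "mail app" is a script rather than procmail, the check described in the 20_vbounce.cf comment could look roughly like the Python sketch below. The function names are made up, and it assumes the usual X-Spam-Status layout with a comma-separated tests= list that may be folded across continuation lines.

```python
import re


def spam_status_tests(header_value: str) -> set:
    """Return the rule names listed in an X-Spam-Status header value."""
    # Unfold continuation lines, then close up any space after commas
    # that folding left inside the tests= list.
    unfolded = re.sub(r"\s*\n\s+", " ", header_value)
    unfolded = re.sub(r",\s+", ",", unfolded)
    match = re.search(r"\btests=(\S*)", unfolded)
    if not match:
        return set()
    # "none" is treated as an empty list (assumed placeholder value).
    return {name for name in match.group(1).split(",") if name and name != "none"}


def is_bounce(header_value: str) -> bool:
    """True if the ANY_BOUNCE_MESSAGE rule hit."""
    return "ANY_BOUNCE_MESSAGE" in spam_status_tests(header_value)
```

A delivery script would call is_bounce() on the header value and file the message into the 'vbounce' folder instead of the inbox or the spam folder. Because delivery hinges on one rule name, a vbounce FP misfiles the message no matter how low the rule is scored.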
Re: vbounce and out of office messages
From: Kai Schaetzl
Date: Sun, 01 Feb 2009 14:31:17 +0100

Karsten Bräckelmann wrote on Fri, 30 Jan 2009 19:42:16 +0100:
> FWIW, and to make Michael happy, I just caught one today -- hit another
> rule, __BOUNCE_OOO_3. Sadly, it also hit __BOUNCE_AUTO_REPLY. So there's
> more to disable...

why? Why disable a rule because of a few FPs? If that rule isn't
scored in any way that makes it a threat that is perfectly acceptable.
It's the overall behavior of a rule that makes it worth or not worth
using it, not a few FPs. Nobody, at least not me, expects these rules
to be free of FPs.

I use vbounce rules to detect bounce messages that were missed by
various procmail filtering rules. Any message identified as a bounce
is processed and delivered differently in procmail rules. So, any
vbounce FP is rather painful. If you aren't doing anything special
when delivering bounce messages then an FP in this rule wouldn't
matter very much.

-jeff
Re: profile the various tests being done
From: "Brian J. Murrell"
Date: Wed, 21 Jan 2009 19:15:19 + (UTC)

I'm trying to figure out why in some cases, spamd is taking in excess
of 1200s to process messages. Is there any way to profile (i.e. time,
or timestamp) each of the tests that spamd is doing so I can see where
the longest ones are? Even enabling the kind of debug that
"spamassassin -D" produces, along with timestamps for each line of
debug would be useful.

Somebody else posted this a while back. Do

    spamassassin -D < email.txt 2>&1 | timestamp

where timestamp is a function defined in .bashrc:

    function timestamp() {
        perl -MPOSIX -MTime::HiRes -n -e '
            BEGIN {$|=1; $dp=0; $t0=Time::HiRes::time};
            $t=Time::HiRes::time;
            $dt=$t-$t0;
            printf("%s%06.3f %4.3f %4.3f %s",
                   POSIX::strftime("%H:%M:",localtime($t)),
                   $t-int($t/60)*60, $dt, $dt-$dp, $_);
            $dp=$dt' $*
    }

Each debug line is prefixed with the wall-clock time, the seconds
elapsed since the start, and the seconds since the previous line.

Or pipe it directly to the one liner:

    spamassassin -D < email.txt 2>&1 | perl -MPOSIX

-jeff
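If Perl isn't handy, the same idea can be sketched in Python. This is an illustration, not something posted to the list: annotate() and slowest() are made-up names, and the clock is passed in as a parameter so the logic can be exercised without real delays.

```python
import time
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[float, float, str]  # (elapsed seconds, gap since previous line, text)


def annotate(lines: Iterable[str],
             clock: Callable[[], float] = time.monotonic) -> Iterator[Record]:
    """Yield each line with the time since start and since the previous line."""
    start = clock()
    previous = 0.0
    for line in lines:
        elapsed = clock() - start
        yield (elapsed, elapsed - previous, line.rstrip("\n"))
        previous = elapsed


def slowest(records: Iterable[Record], top: int = 10) -> List[Record]:
    """Return the records with the largest gap before them, biggest first."""
    return sorted(records, key=lambda rec: rec[1], reverse=True)[:top]
```

Feeding sys.stdin to annotate() in a small script at the end of the "spamassassin -D < email.txt 2>&1 |" pipeline and printing slowest() of the result would list the debug lines that were preceded by the longest pauses, which is usually where a slow DNS lookup or lock wait shows up.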
Re: Spam with clean URI's which forward to DNSBListed URL (by HTML redirect header)
From: Theo Van Dinter
Date: Wed, 7 Jan 2009 11:36:18 -0500

On Wed, Jan 07, 2009 at 04:46:44PM +0100, Florian Lagg wrote:
> So - if possible - I want spamassassin to:
> 1. Request the links in the mail body and check them for http-error 302 or
> meta redirects
> 2. Check the links we got by doing this against some DNSBL's
>
> Is this possible? Is there a reason why we shouldn't do this?

You can look at the WebRedirect plugin on
http://wiki.apache.org/spamassassin/CustomPlugins

Possible? Sure. Should? Not unless you want to turn your (and anyone
else running that code's) machine into a DDoS client. In other words,
while it's possible to shoot yourself in the face, it's really not a
good idea to do so.

There are various WARNING: PRIVACY AND TECHNICAL ISSUES listed in the
plugin. I used the plugin for a while, but stopped using it when the
number of hits dropped off.

-jeff
Re: sa-update damages existing SA installation
From: Marcin Krol
Date: Thu, 18 Dec 2008 18:37:12 +0100

Hello everyone, When I run sa-update -D --gpgkey 6C6191E3 --channel
sought.rules.yerp.org, it damages my SA installation!

sa-update puts rules in /var/lib/spamassassin/. Once this directory
exists, the default rules are expected to come from this directory.
The previous installation directory (eg /usr/local/share/spamassassin)
is ignored. Try doing sa-update of the normal rules before you use
sa-update of additional rule sets.
...

And my SA doesn't score any mails anymore! I have to purge the
existing SA (dpkg -P spamassassin), reinstall it from scratch, restore
conf files from backups and then it works. WTF! Does anybody know what
goes wrong?

Use -D to see which config files are being read by spamassassin:

    % spamassassin --lint -D 2>&1 | fgrep 'config: using'
    [31869] dbg: config: using "/etc/mail/spamassassin" for site rules pre files
    [31869] dbg: config: using "/var/lib/spamassassin/3.001007" for sys rules pre files
    [31869] dbg: config: using "/var/lib/spamassassin/3.001007" for default rules dir
    [31869] dbg: config: using "/etc/mail/spamassassin" for site rules dir
    [31869] dbg: config: using "/home/jeff/.spamassassin/user_prefs" for user prefs file
    [31869] dbg: config: using "/var/lib/spamassassin/3.001007/updates_spamassassin_org/empty.pre" for included file
    [31869] dbg: config: using "/var/lib/spamassassin/3.001007/updates_spamassassin_org/10_misc.cf" for included file
    [31869] dbg: config: using "/var/lib/spamassassin/3.001007/updates_spamassassin_org/20_advance_fee.cf" for included file

-jeff
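To check this from a script, the 'config: using' lines can be picked apart with a few lines of Python. This is only a sketch that assumes the debug line format shown in the output above; the function names are made up.

```python
import re
from typing import List, Optional, Tuple

# Matches lines such as:
#   [31869] dbg: config: using "/etc/mail/spamassassin" for site rules dir
CONFIG_LINE = re.compile(r'dbg: config: using "([^"]+)" for (.+)$')


def config_sources(debug_output: str) -> List[Tuple[str, str]]:
    """Return (purpose, path) pairs from 'spamassassin --lint -D' output."""
    pairs = []
    for line in debug_output.splitlines():
        match = CONFIG_LINE.search(line.strip())
        if match:
            pairs.append((match.group(2), match.group(1)))
    return pairs


def default_rules_dir(debug_output: str) -> Optional[str]:
    """Return the directory reported as 'default rules dir', if any."""
    for purpose, path in config_sources(debug_output):
        if purpose == "default rules dir":
            return path
    return None
```

If default_rules_dir() reports a path under /var/lib/spamassassin but no stock rule files such as 10_misc.cf appear among the "included file" entries, then only the extra channel was downloaded and the stock rules are missing, which would explain mail no longer being scored.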
Re: White List From RCVD
From: mouss
Date: Thu, 11 Dec 2008 19:55:44 +0100

Asif Iqbal wrote:
> I have this in local.cf in qmail.here.net's /etc/mail/spamassassin dir
>
> whitelist_from_rcvd joe.sm...@here.com qtdenexmbm24.AD.HERE.COM
>
> But email from that address still tagged as spam. What am I doing wrong?
>

you should run the message through spamassassin -D to see which relays
are trusted. or you could get lucky with:

always_trust_envelope_sender 1

If you add a Relay header, eg:

add_header all Relay trusted=_RELAYSTRUSTED_, untrusted=_RELAYSUNTRUSTED_

then you want the rdns= from the first untrusted relay. In this case
it is probably:

whitelist_from_rcvd joe.sm...@here.com here.com

The whitelist probably won't work for here.com because of the lack of
reverse DNS:

Received: from NO?REVERSE?DNS (HELO sudnp799.here.com)

The debug output should confirm this.
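As a rough illustration of reading that added Relay header, the Python sketch below pulls the rdns= value out of the first untrusted relay. It assumes the header value looks like "trusted=[ key=value ... ], untrusted=[ key=value ... ]" with one bracketed block per relay; the function name and the sample hosts are made up.

```python
import re
from typing import Optional


def first_untrusted_rdns(relay_header: str) -> Optional[str]:
    """Return rdns= of the first untrusted relay, or None if absent or empty."""
    # Keep only the text that follows "untrusted=".
    _, sep, untrusted = relay_header.partition("untrusted=")
    if not sep:
        return None
    # The first [ ... ] block describes the first untrusted relay.
    block = re.search(r"\[([^\]]*)\]", untrusted)
    if not block:
        return None
    rdns = re.search(r"(?:^|\s)rdns=(\S*)", block.group(1))
    if rdns and rdns.group(1):
        return rdns.group(1)
    return None  # rdns= was missing or empty: no reverse DNS
```

When this returns None for the relay that delivered the message, there is no reverse DNS name for whitelist_from_rcvd to match against, which is the situation the NO?REVERSE?DNS Received line points to.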
RE: about fake mails
From: "Giampaolo Tomassoni" <[EMAIL PROTECTED]>
Date: Sun, 7 Dec 2008 15:52:10 +0100

> -Original Message-
> From: Yavuz Maslak [mailto:[EMAIL PROTECTED]
> Sent: Sunday, December 07, 2008 3:02 PM
>
> Ok
> I have started to use dkim verification. I defined whitelists in
> local.cf.
> it works.
> But I could not find how I give high score for a spammer who doesn't
> use
> gmail's mail servers.
>
> Although a domain has domain keys, how can I give positive score for a
> mail
> which comes from a fake smtp server ?

There is no direct way (to my knowledge) to do this. You have to apply
a positive score to all mail claiming to be "From:" a gmail address,
then apply a negative score voiding the first one to the DKim-verified
ones.

You can write a meta rule for email that claims to be from gmail but
does not have DKIM.

# add some penalty points to mail from yahoo and gmail.com which
# does not carry a valid signature; exempt mail from mailing lists
header __L_ML1 Precedence =~ m{\b(list|bulk)\b}i
header __L_ML2 exists:List-Id
header __L_ML3 exists:List-Post
header __L_ML4 exists:Mailing-List
header __L_HAS_SNDR exists:Sender
meta __L_VIA_ML (__L_ML1 || __L_ML2 || __L_ML3 || __L_ML4 || __L_HAS_SNDR)

header __L_FROM_Y1 From:addr =~ [EMAIL PROTECTED]
header __L_FROM_Y2 From:addr =~ [EMAIL PROTECTED](ar|br|cn|hk|my|sg)$}i
header __L_FROM_Y3 From:addr =~ [EMAIL PROTECTED](id|in|jp|nz|uk)$}i
header __L_FROM_Y4 From:addr =~ [EMAIL PROTECTED](ca|de|dk|es|fr|gr|ie|it|pl|se)$}i
meta __L_FROM_YAHOO (__L_FROM_Y1 || __L_FROM_Y2 || __L_FROM_Y3 || __L_FROM_Y4)

header __L_FROM_GMAIL From:addr =~ [EMAIL PROTECTED]

meta L_UNVERIFIED_YAHOO (!DKIM_VERIFIED && !DK_VERIFIED && __L_FROM_YAHOO && !__L_VIA_ML)
priority L_UNVERIFIED_YAHOO 500
score L_UNVERIFIED_YAHOO 2.5

meta L_UNVERIFIED_GMAIL (!DKIM_VERIFIED && __L_FROM_GMAIL && !__L_VIA_ML)
priority L_UNVERIFIED_GMAIL 500
score L_UNVERIFIED_GMAIL 2.5

I got these rules from this list. I added !DK_VERIFIED to
L_UNVERIFIED_YAHOO.

-jeff