Re: Spam with AWL and Bayes00
Karsten Bräckelmann wrote: On Tue, 2009-03-10 at 10:05 -0500, Chris Barnes wrote: Karsten Bräckelmann wrote:

The AWL score for this message is minimal (one can tell by calculating the stock rules' scores without it). Your problem here is BAYES_00 and RCVD_IN_DNSWL_MED. BAYES_00 means your Bayes DB is pretty skewed. You should train sa-learn on these messages.

I do. Daily.

Then it should be scoring like BAYES_50 at worst...

Note, I train on my personal account. But is there also a system-wide Bayes db that might be causing this score?

You tell us. We didn't set up your system.

Where do I look?

In either case, you must be training as the user running SA, doing the scanning and using Bayes. Check your Bayes DB values by running the command

$ sa-learn --dump magic

and keep an eye on the values (in particular nspam, nham and ntokens) before and after training. Also ensure it is the scanning user.

Sure appears to be doing it as the user:

cbar...@vmmail:~$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        144          0  non-token data: nspam
0.000          0        323          0  non-token data: nham
0.000          0      41368          0  non-token data: ntokens
0.000          0  926982545          0  non-token data: oldest atime
0.000          0 1236700269          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

cbar...@vmmail:~$ sa-learn --spam --progress Maildir/.Spam/cur
100% [=] 0.75 msgs/sec 00m29s DONE
Learned tokens from 22 message(s) (22 message(s) examined)

cbar...@vmmail:~$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        166          0  non-token data: nspam
0.000          0        323          0  non-token data: nham
0.000          0      42929          0  non-token data: ntokens
0.000          0  926982545          0  non-token data: oldest atime
0.000          0 1236962185          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

Received: from tr-2-int.cis.tamu.edu (tamu-relay.tamu.edu [165.91.22.121]) by mail.physics.tamu.edu (Postfix) with ESMTP id 2D8B8950C1 for cbar...@mail.physics.tamu.edu; Tue, 10 Mar 2009 01:22:52 -0500 (CDT)

Listed in DNSWL MED. Appears trustworthy and internal. Should not have been checked here, but instead be part of your trusted_networks.

It is internal (well, to our organization, but not to my dept).

Received: from localhost (localhost.tamu.edu [127.0.0.1]) by tr-2-int.cis.tamu.edu (Postfix) with ESMTP id DF2CA1FD92 for chris-bar...@tamu.edu; Tue, 10 Mar 2009 01:22:51 -0500 (CDT)

*boggle*

boggle? This host is the main host at our university. I suspect this is where the message is being passed to amavisd-new for virus scanning. This is not a server I have any access to whatsoever...

X-Virus-Scanned: amavisd-new at tamu.edu
X-Greylist: from auto-whitelisted by SQLgrey-1.7.6
Received: from Outbound-four.nuos.com (outbound-four.nuos.com [63.149.233.44]) by tr-2-int.cis.tamu.edu (Postfix) with SMTP id 37F521FD65 for chris-bar...@tamu.edu; Tue, 10 Mar 2009 01:22:50 -0500 (CDT)

NOT listed at dnswl.org. Looks like it is about option (a), and your trusted and internal networks setting is borked.

There was no setting for trusted_networks or internal networks. If I add the following to our local.cf, will this prevent the DNSWL_MED from being used?

- - - - proposed local.cf addition - - - -
# Set which networks or hosts are considered 'trusted' by your mail
# server (i.e. not spammers)
#
trusted_networks 165.91. 128.194.
- - - - proposed local.cf addition - - - -

Any chance you are getting a hit on RCVD_IN_DNSWL_MED for *any* mail? That's a whopping -4 offset, and renders most of the positive scoring RBL network tests useless.

I looked in a message that never went outside of our local network. It generated a RCVD_IN_DNSWL_MED value as well. Does the following NON-spam header help?

- - - header of a NON spam message that never left our domain - - -
Return-Path: eta...@physics.tamu.edu
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on vmmail.physics.tamu.edu
X-Spam-Level:
X-Spam-Status: No, score=-6.6
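To make the trusted_networks proposal in this thread concrete, here is a minimal Python sketch that checks which of the relay addresses from the quoted Received headers would fall inside the proposed ranges. It assumes the trailing-dot entries "165.91." and "128.194." stand for the /16 networks 165.91.0.0/16 and 128.194.0.0/16; the names TRUSTED and is_trusted are made up for illustration. It only models address membership, not how SpamAssassin walks the relay chain or decides which hop to look up in DNSWL.

```python
import ipaddress

# Assumption: the trailing-dot entries in the proposed local.cf line
# ("165.91." and "128.194.") are read as these two /16 networks.
TRUSTED = [
    ipaddress.ip_network("165.91.0.0/16"),
    ipaddress.ip_network("128.194.0.0/16"),
]


def is_trusted(ip: str) -> bool:
    """Return True if ip falls inside one of the assumed trusted networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TRUSTED)


# Relay addresses taken from the Received headers quoted in the thread.
relays = {
    "tamu-relay.tamu.edu": "165.91.22.121",
    "outbound-four.nuos.com": "63.149.233.44",
}

for name, ip in relays.items():
    print(f"{name} [{ip}] trusted: {is_trusted(ip)}")
```

With these assumed ranges, the university relay would count as trusted while the outside sender would not, which is the split the thread is asking for.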
does whitelist_from_spf match SPF_HELO_PASS?
I've got a false-positive against TVD_PH_REC. The text in part says: THIS REPORT MAY NOT REFLECT THE INFORMATION REGARDING YOUR ACCOUNT FOUND ON THE OFFICIAL RECORDS OF

Rather than fiddle with TVD_PH_REC, I'd like to whitelist this sender using SPF. However, it appears that the envelope from address does not have an SPF policy, although the helo record does match:

X-Spam-Status: Yes, score=6.46 tag=-99 tag2=4.5 kill=6.31 tests=[L_P0F_UNKN=0.8, RELAY_US=0.01, SPF_HELO_PASS=-0.001, SUBJ_ALL_CAPS=1, TVD_PH_REC=2.996, UPPERCASE_50_75=0.49, US_DOLLARS_3=1.165]

If I add a whitelist_from_spf record for this correspondent, will it work? The message is sent from someu...@subdomain.example.com while the helo address is differentdomain.example.com. In this case, example.com and differentdomain.example.com both have valid, matching SPF records, but subdomain.example.com does not have any SPF record.

-- Daniel J McDonald, CCIE #2495, CISSP #78281, CNX Austin Energy http://www.austinenergy.com
Re: SPF_NEUTRAL scoring?
LuKreme wrote: I don't remember what ?all means though, or how it differs from -all or ~all.

? means the record makes no claims about that source. ?all basically says, "Mail might come from other places, or it might not, we aren't sure." (In RFC terms, mail from us MAY be sent from other places not listed.)

- means mail should *never* come from that source, so -all means "Only the sources listed here will send you mail; anything from anywhere else is definitely forged." (In RFC terms, mail from us MUST NOT be sent from other places.)

~ is (IIRC) specific to all, and ~all means "Other places shouldn't be sending you mail, but we're not 100% certain we haven't missed something." (In RFC terms, mail from us SHOULD NOT be sent from other places.)

-- Kelson Vibber SpeedGate Communications www.speed.net
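As a small illustration of the qualifiers described above, here is a Python sketch that splits an SPF term into its mechanism and the result its qualifier asks for. The result names (pass, fail, softfail, neutral) are the ones RFC 4408 uses; the function name qualifier_result is made up for this example. One caveat to the IIRC above: as far as RFC 4408 defines them, any of the four qualifiers may prefix any mechanism, not only all, and a mechanism with no qualifier defaults to "+".

```python
from typing import Tuple

# SPF qualifier characters and the result each one requests (RFC 4408 names).
QUALIFIERS = {
    "+": "pass",
    "-": "fail",
    "~": "softfail",
    "?": "neutral",
}


def qualifier_result(term: str) -> Tuple[str, str]:
    """Split an SPF term such as '~all' into (mechanism, result).

    A term without an explicit qualifier is treated as '+', i.e. pass.
    """
    if term and term[0] in QUALIFIERS:
        return term[1:], QUALIFIERS[term[0]]
    return term, "pass"


for term in ["?all", "-all", "~all", "mx", "-ip4:192.0.2.0/24"]:
    mechanism, result = qualifier_result(term)
    print(f"{term:20} mechanism={mechanism:18} result={result}")
```

This only classifies single terms; it does not evaluate a full record against a sending address.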
spamassasin: sa-learn --dump magic intrepretation
Is there a document regarding the interpretation of sa-learn --dump magic

config: could not find site rules directory
0.000          0          3          0  non-token data: bayes db version
0.000          0     261451          0  non-token data: nspam
0.000          0      18530          0  non-token data: nham
0.000          0     143599          0  non-token data: ntokens
0.000          0 1231533845          0  non-token data: oldest atime
0.000          0 1237223892          0  non-token data: newest atime
0.000          0 1237214668          0  non-token data: last journal sync atime
0.000          0 1237059740          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
0.000          0       9311          0  non-token data: last expire reduction count
Re: does whitelist_from_spf match SPF_HELO_PASS?
On Mon, 16 Mar 2009 11:54:26 -0500 McDonald, Dan dan.mcdon...@austinenergy.com wrote: I've got a false-positive against TVD_PH_REC. The text in part says: THIS REPORT MAY NOT REFLECT THE INFORMATION REGARDING YOUR ACCOUNT FOUND ON THE OFFICIAL RECORDS OF Rather than fiddle with TVD_PH_REC, I'd like to whitelist this sender using SPF. However, it appears that the envelope from address does not have an SPF policy; however, the helo record does match: X-Spam-Status: Yes, score=6.46 tag=-99 tag2=4.5 kill=6.31 tests=[L_P0F_UNKN=0.8, RELAY_US=0.01, SPF_HELO_PASS=-0.001, SUBJ_ALL_CAPS=1, TVD_PH_REC=2.996, UPPERCASE_50_75=0.49, US_DOLLARS_3=1.165] If I add a whitelist_from_spf record for this correspondent, will it work? I don't believe so. You might try whitelist_from_rcvd if you have reverse dns on the last-hop. I'd also suggest turning on BAYES, if you want to avoid more FPs.
Re: spamassasin: sa-learn --dump magic intrepretation
Is there a document regarding the interpretation of sa-learn --dump magic

config: could not find site rules directory
0.000          0          3          0  non-token data: bayes db version
0.000          0     261451          0  non-token data: nspam
0.000          0      18530          0  non-token data: nham
0.000          0     143599          0  non-token data: ntokens
0.000          0 1231533845          0  non-token data: oldest atime
0.000          0 1237223892          0  non-token data: newest atime
0.000          0 1237214668          0  non-token data: last journal sync atime
0.000          0 1237059740          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
0.000          0       9311          0  non-token data: last expire reduction count

Let me take a stab at it.

The db version is 3
You have 261,451 tokens that appeared in ‘spam’.
You have 18,530 tokens that appeared in ‘ham’.
You have 143,599 tokens (remember, some tokens could appear in both spam and ham)
The oldest token is
date -j -f %s 1231533845
Fri Jan 9 15:44:05 EST 2009
The newest token is
date -j -f %s 1237223892
Mon Mar 16 13:18:12 EDT 2009

The rest should be easy to figure out.

-- Michael Scheidell, CTO | SECNAP Network Security Finalist 2009 Network Products Guide Hot Companies FreeBSD SpamAssassin Ports maintainer

This email has been scanned and certified safe by SpammerTrap(r). For Information please see http://www.secnap.com/products/spammertrap/
Re: spamassasin: sa-learn --dump magic intrepretation
On Mon, 16 Mar 2009 13:23:22 -0400 Dennis German dger...@real-world-systems.com wrote: Is there a document regarding the interpretation of sa-learn --dump magic

They are pretty self-explanatory, if you know roughly how Bayes works. The first three are the number of hams and spams learned and the total number of tokens in the database. One of them is the time the journal was last synched with the bayes database. The rest are concerned with the automatic expiry of tokens from the database, to prevent it growing indefinitely.

Each token has a timestamp which is set from the headers when learned and updated when it contributes to the Bayesian probability in a test. This timestamp is used to age-out the less useful tokens. There's a detailed description in the sa-learn manpage in the EXPIRATION section.
Re: does whitelist_from_spf match SPF_HELO_PASS?
On Mon, 2009-03-16 at 17:38 +, RW wrote: On Mon, 16 Mar 2009 11:54:26 -0500 McDonald, Dan dan.mcdon...@austinenergy.com wrote:

Rather than fiddle with TVD_PH_REC, I'd like to whitelist this sender using SPF. However, it appears that the envelope from address does not have an SPF policy; however, the helo record does match: If I add a whitelist_from_spf record for this correspondent, will it work?

I don't believe so. You might try whitelist_from_rcvd if you have reverse dns on the last-hop.

Unfortunately, it is sent from a large pool of servers, so whitelist_from_rcvd is not a very good choice. The company in question has two /16's in the SPF record for their bare domain name. I've sent an e-mail off to hostmas...@... to ask them to add the SPF record for this particular subdomain.

I'd also suggest turning on BAYES, if you want to avoid more FPs.

It would be very hard to get ham samples reflective of the whole company. It also tends to make mail delivery less deterministic, so I normally turn off BAYES and AWL.

-- Daniel J McDonald, CCIE #2495, CISSP #78281, CNX Austin Energy http://www.austinenergy.com
Re: spamassasin: sa-learn --dump magic intrepretation
On Mon, 16 Mar 2009 14:03:47 -0400 Michael Scheidell scheid...@secnap.net wrote: You have 261,451 tokens that appeared in ‘spam’. You have 18,530 tokens that appeared in ‘ham’. You have 143,599 tokens (remember, some tokens could appear in both spam and ham)

The first two are actually the total number of spam and ham emails learned. Most of the tokens have since expired.
Re: does whitelist_from_spf match SPF_HELO_PASS?
On Mon, 16 Mar 2009 11:54:26 -0500 McDonald, Dan dan.mcdon...@austinenergy.com wrote: Rather than fiddle with TVD_PH_REC, I'd like to whitelist this sender using SPF. However, it appears that the envelope from address does not have an SPF policy; however, the helo record does match: If I add a whitelist_from_spf record for this correspondent, will it work?

On Mon, 2009-03-16 at 17:38 +, RW wrote: I don't believe so. You might try whitelist_from_rcvd if you have reverse dns on the last-hop.

On 16.03.09 13:50, McDonald, Dan wrote: Unfortunately, it is sent from a large pool of servers, so whitelist_from_rcvd is not a very good choice. The company in question has two /16's in the SPF record for their bare domain name.

Do those hosts have too many different hostnames? whitelist_from_rcvd understands a domain component, e.g. yahoo.com.

I've sent an e-mail off to hostmas...@... to ask them to add the SPF record for this particular subdomain.

Good.

-- Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/ Warning: I wish NOT to receive e-mail advertising to this address. - Have you got anything without Spam in it? - Well, there's Spam egg sausage and Spam, that's not got much Spam in it.
Re: spamassasin: sa-learn --dump magic intrepretation
On 16.03.09 13:23, Dennis German wrote: Is there a document regarding the interpretation of sa-learn --dump magic

config: could not find site rules directory
0.000          0          3          0  non-token data: bayes db version
0.000          0     261451          0  non-token data: nspam
0.000          0      18530          0  non-token data: nham

Ohh, that's far too much spam compared to ham, I'd say. Aren't you getting a lot of FPs?

-- Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/ Warning: I wish NOT to receive e-mail advertising to this address. Linux - It's now safe to turn on your computer.
Re: spamassasin: sa-learn --dump magic intrepretation
Michael Scheidell wrote: Is there a document regarding the interpretation of sa-learn --dump magic

config: could not find site rules directory
0.000          0          3          0  non-token data: bayes db version
0.000          0     261451          0  non-token data: nspam
0.000          0      18530          0  non-token data: nham
0.000          0     143599          0  non-token data: ntokens
0.000          0 1231533845          0  non-token data: oldest atime
0.000          0 1237223892          0  non-token data: newest atime
0.000          0 1237214668          0  non-token data: last journal sync atime
0.000          0 1237059740          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
0.000          0       9311          0  non-token data: last expire reduction count

Let me take a stab at it. The db version is 3 You have 261,451 tokens that appeared in ‘spam’. You have 18,530 tokens that appeared in ‘ham’

Actually, nspam and nham count messages, not tokens. They're also a count of the total training, and don't go down as tokens expire out.

You have 143,599 tokens (remember, some tokens could appear in both spam and ham)

Yes, and also you need to account for SA expiring out tokens, and tokens that occur in multiple messages. (ie: it's not strange that your message count is higher than your token count).

The oldest token is date -j -f %s 1231533845 Fri Jan 9 15:44:05 EST 2009 The newest token is date -j -f %s 1237223892 Mon Mar 16 13:18:12 EDT 2009 The rest should be easy to figure out.

-- Michael Scheidell, CTO | SECNAP Network Security Finalist 2009 Network Products Guide Hot Companies FreeBSD SpamAssassin Ports maintainer
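For anyone who wants to check the corrected reading against their own database, here is a rough Python sketch that parses the dump quoted in this thread. It assumes each data line has four whitespace-separated fields before the "non-token data:" label and that the third field holds the value; the names SAMPLE, parse_dump_magic and to_utc are made up for illustration and are not part of SpamAssassin.

```python
from datetime import datetime, timezone

# The dump quoted in this thread, with the column spacing restored.
SAMPLE = """\
config: could not find site rules directory
0.000          0          3          0  non-token data: bayes db version
0.000          0     261451          0  non-token data: nspam
0.000          0      18530          0  non-token data: nham
0.000          0     143599          0  non-token data: ntokens
0.000          0 1231533845          0  non-token data: oldest atime
0.000          0 1237223892          0  non-token data: newest atime
0.000          0 1237214668          0  non-token data: last journal sync atime
0.000          0 1237059740          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
0.000          0       9311          0  non-token data: last expire reduction count
"""


def parse_dump_magic(text):
    """Map each 'non-token data' label to its integer value."""
    values = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue  # skip warnings such as the "config:" line
        fields, label = line.split("non-token data:", 1)
        # Assumption: the value is the third whitespace-separated field.
        values[label.strip()] = int(fields.split()[2])
    return values


def to_utc(epoch):
    """Convert an atime (seconds since the Unix epoch) to a UTC datetime."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


magic = parse_dump_magic(SAMPLE)
print("spam messages learned:", magic["nspam"])
print("ham messages learned: ", magic["nham"])
print("tokens in database:   ", magic["ntokens"])
print("oldest token atime:   ", to_utc(magic["oldest atime"]))
print("newest token atime:   ", to_utc(magic["newest atime"]))
print("expire atime delta:   ", magic["last expire atime delta"] / 86400, "days")
```

The atime values are printed in UTC, so they appear five (EST) or four (EDT) hours ahead of the local times given by the date commands above.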
Preview with guessed encoding
What is the recommended way to get the utf8 content of a spam message in cases when:
1) the spam message is missing a charset declaration (common for TW spam)
2) the TextCat plugin detects the language *and charset*

In the case of one specific spam:
* TextCat detects zh.big5
* $status->get_content_preview() returns gibberish and (us-ascii) http links
* Encode::decode('big5', $status->get_content_preview()) returns something auto-translators can turn into sensible English, but the http links are missing

-- [plen: Andrew] Andrzej Adam Filip : a...@onet.eu Adam and Eve had many advantages, but the principal one was, that they escaped teething. -- Mark Twain, Pudd'nhead Wilson's Calendar
Re: does whitelist_from_spf match SPF_HELO_PASS?
SPF_HELO_PASS is NOT considered by whitelist_from_spf. Daryl
HABEAS_ACCREDITED_COI
Received a mail in my inbox today that was definitely spam but scored as below. After running it through spamassassin -r and -t and removing the sender's address from the autowhitelist I got it to score

X-spam-status: No, score=-0.1 required=5.0 tests=ADVANCE_FEE_2=1.234, BAYES_50=1,DCC_CHECK_NEGATIVE=-0.0001,HABEAS_ACCREDITED_COI=-8, SARE_FRAUD_X3=1.667,SARE_FRAUD_X4=1.667,SARE_FRAUD_X5=1.667,US_DOLLARS_3=0.63

Content analysis details: (7.0 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-8.0 HABEAS_ACCREDITED_COI  RBL: Habeas Accredited Confirmed Opt-In or Better
                            [208.82.16.109 listed in sa-accredit.habeas.com]
 5.0 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                            [score: 1.]
 0.6 US_DOLLARS_3           BODY: Mentions millions of $ ($NN,NNN,NNN.NN)
 2.2 DCC_CHECK              listed in DCC (http://rhyolite.com/anti-spam/dcc/)
                            [localhost 1117; Body=1 Fuz1=many] [Fuz2=many]
 1.2 ADVANCE_FEE_2          Appears to be advance fee fraud (Nigerian 419)
 1.7 SARE_FRAUD_X5          Matches 5+ phrases commonly used in fraud spam
 1.7 SARE_FRAUD_X3          Matches 3+ phrases commonly used in fraud spam
 1.7 SARE_FRAUD_X4          Matches 4+ phrases commonly used in fraud spam
 1.0 SAGREY                 Adds 1.0 to spam from first-time senders

I read the HABEAS score as meaning ReturnPath thinks it's a good sender? Is there any action that should be taken, such as reporting this to them?

-- KeyID 0xE372A7DA98E6705C
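To see how much the single HABEAS hit matters, here is a short Python check that adds up the per-rule scores exactly as listed in the X-spam-status header above. The variable names are made up for illustration; the numbers are copied from the header, and 5.0 is the required threshold it reports.

```python
# Per-rule scores copied from the X-spam-status header quoted above.
header_scores = {
    "ADVANCE_FEE_2": 1.234,
    "BAYES_50": 1.0,
    "DCC_CHECK_NEGATIVE": -0.0001,
    "HABEAS_ACCREDITED_COI": -8.0,
    "SARE_FRAUD_X3": 1.667,
    "SARE_FRAUD_X4": 1.667,
    "SARE_FRAUD_X5": 1.667,
    "US_DOLLARS_3": 0.63,
}
REQUIRED = 5.0  # "required=5.0" in the same header

# Total as scored, and the total if the HABEAS rule had contributed nothing.
total = sum(header_scores.values())
without_habeas = total - header_scores["HABEAS_ACCREDITED_COI"]

print(f"total as scored:       {total:.1f}")
print(f"total without HABEAS:  {without_habeas:.1f}")
print(f"spam without HABEAS?   {without_habeas >= REQUIRED}")
```

The sum reproduces the -0.1 in the header, and without the -8 from HABEAS_ACCREDITED_COI the same hits would have landed well above the 5.0 threshold.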
Re: HABEAS_ACCREDITED_COI
The wiki now has an email address to report Habeas-accredited spam: http://wiki.apache.org/spamassassin/Rules/HABEAS_ACCREDITED_COI
Re: HABEAS_ACCREDITED_COI
On Mon, 2009-03-16 at 19:46 -0400, Greg Troxel wrote: The wiki now has an email address to report Habeas-accredited spam: http://wiki.apache.org/spamassassin/Rules/HABEAS_ACCREDITED_COI

Thanks Greg, I've reported it to them

-- KeyID 0xE372A7DA98E6705C
Re: spamassasin: sa-learn --dump magic interpretation good/bad/other?
0) Michael, thanks
1) What are the various zero columns? For example in
0.000          0          3          0  non-token data: bayes db version
2) Is this good? not too good? bad? trouble?

On Mar 16, 2009, at 14:03, Michael Scheidell wrote: Is there a document regarding the interpretation of sa-learn --dump magic

config: could not find site rules directory
0.000          0          3          0  non-token data: bayes db version
0.000          0     261451          0  non-token data: nspam
0.000          0      18530          0  non-token data: nham
0.000          0     143599          0  non-token data: ntokens
0.000          0 1231533845          0  non-token data: oldest atime
0.000          0 1237223892          0  non-token data: newest atime
0.000          0 1237214668          0  non-token data: last journal sync atime
0.000          0 1237059740          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
0.000          0       9311          0  non-token data: last expire reduction count

The db version is 3 You have 261,451 tokens that appeared in ‘spam’. You have 18,530 tokens that appeared in ‘ham’ You have 143,599 tokens (remember, some tokens could appear in both spam and ham) The oldest token is date -j -f %s 1231533845 Fri Jan 9 15:44:05 EST 2009 The newest token is date -j -f %s 1237223892 Mon Mar 16 13:18:12 EDT 2009
JoeJobbed - Vbounce plugin - SPF?.
Hello everyone,

I'm running Spamassassin 3.1.7, with netqmail 1.05, ClamAv etc. We've been joe-jobbed on one of our domains here at work. We were lucky, as we were able to switch off delivery to the affected domain and effectively blocked the blowback by refusing E-Mail from all the Postmasters around the world sending NDRs and so forth to the now non-existent mailboxes. However, this was a far-from-optimal solution, as I'm sure many people will be wanting to point out already: what if we needed that domain to still receive legitimate E-Mail?

We initially tried 'riding out the storm' as it were, but were unable to keep on top of the load put on the servers by excessive E-Mail messages requiring scanning by SA. This got so bad that the mailserver had become unresponsive to our clients. I removed a bunch of our own site rules (which were going to be whittled away anyhow) to decrease the average scantime of E-Mails by Spamassassin. This did work, for about 15 minutes. Then an average scantime of 4 seconds was not good enough, and clients were still denied SMTP (too busy).

I decided (wrongly) to implement the Vbounce plugin. Read the install doc, got it set up, tested SA with debug and lint, and everything appeared to test OK. Put it into practice by reloading SA and then Wang! Average scantimes hit the roof: 38 seconds. Needless to say I disabled the plugin. Although whilst it was running, it did appear to be doing the job correctly according to my mail logs, and there were no errors. So we blocked the domain.

I am interested to know the following:

Has anyone else had this kind of result when installing the Vbounce plugin? (largely increased scantimes)

How might I keep delivery flowing to valid recipients for the domain (smarthosted (smtproutes) to exchange) but reject the blowback at SMTP time?

I was considering convincing the powers that be to let me set up SPF, but their requirement would be to have both v1 and v2 SPF tags, and I'm not sure whether qmail is up to both yet. What I have in mind is some kind of SPF implementation where we check the tags (not necessarily publish them), but I guess that's an MTA question :)

Thanks in advance for any useful information :)

Cheers, Michael Hutchinson
Re: HABEAS_ACCREDITED_COI
On 16-Mar-2009, at 16:40, Chris wrote: -8.0 HABEAS_ACCREDITED_COI RBL: Habeas Accredited Confirmed Opt-In or Better [208.82.16.109 listed in

I changed my HABEAS scores ages ago:

score HABEAS_ACCREDITED_COI -1.0
score HABEAS_ACCREDITED_SOI -0.5
score HABEAS_CHECKED 0

I'm seriously considering changing them to 1.0, 0.01, and 0, respectively. I seem to ONLY see the headers in spam messages. It's a shame the defaults in SA are still set to absurd values.

-- Major Strasser has been shot. Round up the usual suspects.
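As a quick worked example of what those overrides would do to the message Chris reported, the Python sketch below takes the 7.0-point total from his content analysis (where HABEAS_ACCREDITED_COI contributed -8.0) and swaps in the two alternative scores mentioned here. The helper name rescore is made up for illustration, and the calculation assumes every other rule scores the same.

```python
def rescore(total, old_score, new_score):
    """Replace one rule's contribution in an already computed total."""
    return total - old_score + new_score


REPORTED_TOTAL = 7.0   # "7.0 points, 5.0 required" in Chris's report
DEFAULT_COI = -8.0     # HABEAS_ACCREDITED_COI as scored in that report

with_minus_one = rescore(REPORTED_TOTAL, DEFAULT_COI, -1.0)  # current override
with_plus_one = rescore(REPORTED_TOTAL, DEFAULT_COI, 1.0)    # the one being considered

print("HABEAS_ACCREDITED_COI at -1.0:", with_minus_one)
print("HABEAS_ACCREDITED_COI at  1.0:", with_plus_one)
```

Either way the message stays well above the 5.0 threshold, so the choice between -1.0 and 1.0 matters mostly for borderline mail.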