Negative score spamassassin
Hello, and sorry for my English.

I run MailScanner, Postfix 2.8.2 and SpamAssassin 3.3.1. I don't have Pyzor or Razor. MailScanner is only a gateway for my Exchange 2010 server.

SpamAssassin gives me very low or even negative scores. For example, the scores for the last few emails were:

  -1.24  -1.72  -2.90  -2.47  -1.22

Here is an example of a mail that was not considered spam, although it is spam:

  not cached, score=3.722, 5 required
   0.80  BAYES_50             Bayes spam probability is 40 to 60%
   0.00  FREEMAIL_FROM        Sender email is freemail
   1.93  FREEMAIL_REPLY       From and body contain different freemails
   0.55  FUZZY_AMBIEN         Attempt to obfuscate words in spam
   0.00  HTML_FONT_SIZE_HUGE  HTML font size is huge
   0.44  HTML_IMAGE_RATIO_02  HTML has a low ratio of text to image area
   0.00  HTML_MESSAGE         HTML included in message
   0.00  MIME_QP_LONG_LINE    Quoted-printable line longer than 76 chars

Maybe there is a configuration problem, because all of my email arrives from the same IP address. Mail sent from the Internet to my domain is first received by my provider, which then relays it to my MailScanner server. Could this prevent SpamAssassin from doing its job? How should I configure SpamAssassin for this setup?

-- 
View this message in context: http://old.nabble.com/Negative-score-spamassassin-tp32870220p32870220.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: Negative score spamassassin
We need to see the rule hits for the messages with negative scores.

Also, I don't see any RBL, URIBL, Pyzor or Razor scores in there. Have you disabled network tests? These are really valuable; just make sure you only choose a couple of the RBLs. See http://wiki.mailscanner.info/doku.php?id=maq:index#getting_the_best_out_of_spamassassin for some ideas. It's a little outdated but still useful, I think.

-- 
Martin Hepworth
Oxford, UK
Return Path Whitelists, RP_SAFE, RP_CERTIFIED, RP_MATCHES
Hi,

I was wondering if I could get some advice. I probably know what I'm going to do anyway, but I wanted a few other opinions.

I've been analysing a load of mail which is having its SA score reduced by what look like paid-for whitelists. The SA rule hits I'm seeing are:

  Rule                    Total      Ham      %     Spam     %
  RP_MATCHES_RCVD       161,165  142,559   88.5   18,606  11.5
  RCVD_IN_RP_SAFE        22,405   22,399  100          6   0
  RCVD_IN_RP_CERTIFIED   22,130   22,125  100          5   0
  RCVD_IN_RP_RNBL        12,794       43    0.3   12,751  99.7
  T_RP_MATCHES_RCVD       7,080    5,072   71.6    2,008  28.4

Looking at virtually ALL of these, they look like SPAM. The scores for this GASH are as follows:

  RP_MATCHES_RCVD       -2.023  -1.201  -2.023  -1.201
  RCVD_IN_RP_SAFE        0.0    -2.0     0.0    -2.0
  RCVD_IN_RP_CERTIFIED   0.0    -3.0     0.0    -3.0
  RCVD_IN_RP_RNBL        0       1.284   0       1.31

For some reason I can't find any scores for T_RP_MATCHES_RCVD. Am I being dumb here? Does the T_ mean something I don't know?

So anyway, what I reckon I should do is get rid of all the negative scores for these rules: looking at the figures above they are all suspicious, and looking at the actual mails, they are pretty dodgy.

Has anyone else seen this or got any advice on this matter? Should we be trusting a paid-for whitelist? I also saw something about fake RP headers. Could this be the case?

Thanks,
Pip

(Apologies, I have posted the same to the mailing list but thought I'd try a two-pronged approach!)

-- 
View this message in context: http://old.nabble.com/Return-Path-Whitelists%2C-RP_SAFE%2C-RP_CERTIFIED%2C-RP_MATCHES%E2%80%8F-tp32870476p32870476.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: Return Path Whitelists, RP_SAFE, RP_CERTIFIED, RP_MATCHES
On Mon, 21 Nov 2011 03:11:48 -0800 (PST), pipjg wrote:
> Has anyone else seen this or got any advice on this matter? Should we be trusting a paid for whitelist?

Where do you pay? Why not report the spam to Return Path? But feel free to set the scores to zero, if you like to pay :-)
Re: Return Path Whitelists, RP_SAFE, RP_CERTIFIED, RP_MATCHES
On Mon, 21 Nov 2011 03:11:48 -0800 (PST) pipjg wrote:
> I've been analysing a load of mail which is having its SA score reduced by what looks like paid for whitelists. A view of the SA scores I'm seeing is:
>
>   Rule                    Total      Ham      %     Spam     %
>   RP_MATCHES_RCVD       161,165  142,559   88.5   18,606  11.5
>   RCVD_IN_RP_SAFE        22,405   22,399  100          6   0
>   RCVD_IN_RP_CERTIFIED   22,130   22,125  100          5   0
>   RCVD_IN_RP_RNBL        12,794       43    0.3   12,751  99.7
>   T_RP_MATCHES_RCVD       7,080    5,072   71.6    2,008  28.4
>
> Now looking at virtually ALL of these they look like SPAM.

No they don't; you haven't read your own results correctly.

RCVD_IN_RP_SAFE and RCVD_IN_RP_CERTIFIED are ~100% ham.

RCVD_IN_RP_RNBL is a blacklist rule, so it's supposed to hit spam.

[T_]RP_MATCHES_RCVD are not ReturnPath whitelist rules:

  describe RP_MATCHES_RCVD Envelope sender domain matches handover relay domain

Everything related to ReturnPath.net/senderscore is working remarkably well for you.

> For some reason I can't find any scores for T_RP_MATCHES_RCVD. Am I being dumb here? Does the T_ mean something I don't know?

T_* rules are under test, so it's an earlier name for RP_MATCHES_RCVD.
Re: Help with constructing a rule for MCP
On 11/20/2011 10:02 PM, Sergio wrote:
>   header __ENV_FROM_DHL  Received =~ /envelope-from [^ @]+@dhl[^ .]+\.com/i
>   header __FROM_DHL      From =~ /\bdhl[^ .]+\.com/i

These will match any domain that starts with dh and ends with .com. For example, they will match someu...@dhalailama.com. Is this expected?

If you just want to match a single character, then get rid of the +.

  header __ENV_FROM_DHL  Received =~ /envelope-from [^ @]+@dhl[^ .]\.com/i
  header __FROM_DHL      From =~ /\bdhl[^ .]\.com/i

-- 
Bowie
Re: Negative score spamassassin
On 11/21, ercibrest wrote:
> Maybe there is a problem of configuration because all of my emails come from the same IP. From internet, email send to my domain is receive from my provider and then, the provider relay mails to my mailscanner 's server.

Add that IP to your trusted_networks setting, documented in the spamassassin man page:
http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.html#network_test_options

Also some info here:
http://wiki.apache.org/spamassassin/TrustPath

-- 
"It's never too late to panic."
http://www.ChaosReigns.com
Re: Return Path Whitelists, RP_SAFE, RP_CERTIFIED, RP_MATCHES
On 11/21, pipjg wrote:
> dumb here? Does the T_ mean something I don't know?

Yes, it means there is a bug in the way SpamAssassin rules are being published. It stands for "testing":

  "rules with a T_ prefix to their names are never published" - http://wiki.apache.org/spamassassin/SaUpdateBackend

This is the first Google hit for: spamassassin t_

Although I don't currently see T_RP_MATCHES_RCVD in my rules. Run sa-update again (you run it daily from cron, right?), check to see if it's still there, and if it is, open a bug: https://issues.apache.org/SpamAssassin/

Rules that don't have a score defined have a default score of 1, or, in this case, -1, because it has the nice flag set (it's intended to hit ham, not spam).

-- 
"A ship in a port is safe, but that's not what ships are built for." - Grace Murray Hopper
http://www.ChaosReigns.com
Re: Return Path Whitelists, RP_SAFE, RP_CERTIFIED, RP_MATCHES
On 11/21/2011 10:53 AM, dar...@chaosreigns.com wrote:
> Rules that don't have a score defined have a default score of 1, or, in this case, -1, because it has the nice flag set (it's intended to hit ham, not spam).

Except for T_ rules -- they have a default score of 0.01.

-- 
Bowie
Re: Return Path Whitelists, RP_SAFE, RP_CERTIFIED, RP_MATCHES
On Mon, 21 Nov 2011 13:50:05 + RW wrote:
> On Mon, 21 Nov 2011 03:11:48 -0800 (PST) pipjg wrote:
>>   Rule                    Total      Ham      %     Spam     %
>>   RP_MATCHES_RCVD       161,165  142,559   88.5   18,606  11.5
>
>   describe RP_MATCHES_RCVD Envelope sender domain matches handover relay domain

Actually, now that I come to think about it, I had a problem with RP_MATCHES_RCVD, and I wasn't the only one:

http://old.nabble.com/RP_MATCHES_RCVD-to32157087.html
Re: Help with constructing a rule for MCP
On Mon, 21 Nov 2011, Bowie Bailey wrote:
> On 11/20/2011 10:02 PM, Sergio wrote:
>>   header __ENV_FROM_DHL  Received =~ /envelope-from [^ @]+@dhl[^ .]+\.com/i
>>   header __FROM_DHL      From =~ /\bdhl[^ .]+\.com/i
>
> These will match any domain that starts with dh and ends with .com.

You overlooked the "l".

> For example, they will match someu...@dhalailama.com. Is this expected?

It won't.

> If you just want to match a single character, then get rid of the +.

It's to match -usa or other dhl domain name variants.

The line wrap in email makes that look like a single-character RE. The actual RE I suggested is:

  /envelope-from [^ @]+@dhl[^ .]+\.com/i

It also won't match dhl.com. My bad. As I said, it was off the top of my head. These might be better:

  /envelope-from [^ @]+@dhl(?:[-_][^ .]+)?\.com/i
  /\bdhl(?:[-_][^ .]+)?\.com/i

-- 
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
  Mine eyes have seen the horror of the voting of the horde;
  They've looted the fromagerie where guv'ment cheese is stored;
  If war's not won before the break they grow so quickly bored;
  Their vote counts as much as yours.                    -- Tam
---
 348 days since the first successful private orbital launch (SpaceX)
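To see how the two versions of the envelope-from pattern behave, here is a minimal sketch in Python. It is only an illustration under stated assumptions: SpamAssassin compiles these rules as Perl regular expressions, Python's `re` module is used here as a stand-in (the constructs involved behave the same way in both), and the sample header fragments are made up for the example.

```python
import re

# Patterns from the thread, without the /.../i delimiters.
# Python's re module treats "@" as an ordinary literal character.
ORIGINAL = re.compile(r"envelope-from [^ @]+@dhl[^ .]+\.com", re.IGNORECASE)
REVISED = re.compile(r"envelope-from [^ @]+@dhl(?:[-_][^ .]+)?\.com", re.IGNORECASE)

# Hypothetical Received-header fragments (not taken from real mail).
SAMPLES = [
    "envelope-from user@dhl.com",
    "envelope-from user@dhl-usa.com",
    "envelope-from user@dhlfoo.com",
    "envelope-from user@dhalailama.com",
]


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for sample in SAMPLES:
        print(f"{sample:36} original={matches(ORIGINAL, sample)!s:5} "
              f"revised={matches(REVISED, sample)}")
```

The original pattern requires at least one character between "dhl" and ".com", which is why it misses the bare dhl.com; the revised one makes a "-" or "_" separated suffix optional and so no longer accepts arbitrary names that merely begin with "dhl".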
Re: Help with constructing a rule for MCP
On 11/21/2011 11:35 AM, John Hardin wrote:
> On Mon, 21 Nov 2011, Bowie Bailey wrote:
>> These will match any domain that starts with dh and ends with .com.
>
> You overlooked the "l".

Hmm... Guess I did...

>> If you just want to match a single character, then get rid of the +.
>
> It's to match -usa or other dhl domain name variants.
>
> The line wrap in email makes that look like a single character RE. The actual RE I suggested is:
>
>   /envelope-from [^ @]+@dhl[^ .]+\.com/i

The line wrap wasn't an issue. I just didn't see the "l". And with this font, I think I see why I didn't see it the first time: it blends in with the square bracket.

> It also won't match dhl.com. My bad. As I said, it was off the top of my head. These might be better:
>
>   /envelope-from [^ @]+@dhl(?:[-_][^ .]+)?\.com/i
>   /\bdhl(?:[-_][^ .]+)?\.com/i

Do the @ characters need to be escaped? In a normal Perl RE they would, but I'm not sure if SA is treating them any differently since it is reading them in from a config file.

-- 
Bowie
Re: Detecting serious domains
Hello Marc,

On 2011-11-17 07:27:51, you wrote the following:
> determine if it's spam or ham in itself. Yahoo is a serious domain and there's lots of spam. Serious domains should not be blacklisted

Ehm? I block @yahoo.com at the SMTP level (on my corporate server), because if I removed the block, I would get around 20,000-80,000 spams from Yahoo every day.

> for example. We could also look for consistency. Bad RDNS from a serious domain might be a spam indicator.

Right.

> Also - thinking we should slowly mine the whois database and provide some sort of DNS based lookup of whois information to be able to determine the registrar of a domain, the domain age, or other info that would be useful in determining that the domain is serious or not.

+1. At least for the domain age!

> Who thinks I'm onto something?

Thanks, greetings and have a nice day/evening,
Michelle Konzack

-- 
# Debian GNU/Linux Consultant ##
Development of Intranet and Embedded Systems with Debian GNU/Linux
Internet Service Provider, Cloud Computing
http://www.itsystems.tamay-dogan.net/

itsystems@tdnet                     Jabber linux4miche...@jabber.ccc.de
Owner Michelle Konzack
Gewerbe Strasse 3                   Tel office: +49-176-86004575
77694 Kehl                          Tel mobil:  +49-177-9351947
Germany                             Tel mobil:  +33-6-61925193 (France)
USt-ID: DE 278 049 239

Linux-User #280138 with the Linux Counter, http://counter.li.org/
Re: Detecting serious domains
Hello Kevin A. McGrail,

On 2011-11-17 10:56:52, you wrote the following:
> For example, I've seen .info domains used a lot by spammers. I'm sure there is a pattern there with a registrar probably.

Here I can say that nearly 60% of the spam comes from .info domains.

Thanks, greetings and have a nice day/evening,
Michelle Konzack
Fwd: Help with constructing a rule for MCP
Unfortunately, it seems that MCP doesn't like the rule:

  header   __ENV_FROM_DHL    Received =~ /envelope-from [^ @]+@dhl(?:[-_][^ .]+)?\.com/i
  header   __FROM_DHL        From =~ /\bdhl(?:[-_][^ .]+)?\.com/i
  header   __ENV_FROM_UPS    Received =~ /envelope-from [^ @]+@ups\.com/i
  header   __FROM_UPS        From =~ /\bups\.com/i
  meta     DHL_UPS_MISMATCH  (__ENV_FROM_DHL && __FROM_UPS) || (__ENV_FROM_UPS && __FROM_DHL)
  describe DHL_UPS_MISMATCH  virus DHL-USA or UPS
  score    DHL_UPS_MISMATCH  11

When I wrote this to the MCP rules file, none of my other rules worked.

Regards,
Sergio
Re: Detecting serious domains
Hello dar...@chaosreigns.com,

On 2011-11-17 12:29:41, you wrote the following:
> There could be a useful correlation there, but I need to point out that if a domain has no MX records, the correct thing to do is to send email to the A record for the domain, and I've seen legit domains configured that way and unwilling to change. It's not even a violation of RFC.

Right, but MOST spammers act like this AND their IP does not respond to SMTP requests. So why waste time and resources?

Thanks, greetings and have a nice day/evening,
Michelle Konzack
Re: Fwd: Help with constructing a rule for MCP
Did you try monitoring the log to see whether the rule was detected?

On 21/11/2011 02:00 p.m., Sergio wrote:
> Unfortunately, it seems that MCP doesn't like the rule:
> [...]
> When I wrote this to the MCP rules file, none of my other rules work.

-- 
Ricardo Ardila Vetrovec
Network Manager
CeTIC -- UNIMET
tel: 2403743
Re: Fwd: Help with constructing a rule for MCP
On 11/21/2011 1:30 PM, Sergio wrote:
> Unfortunately, it seems that MCP doesn't like the rule:
>
>   header __ENV_FROM_DHL  Received =~ /envelope-from [^ @]+@dhl(?:[-_][^ .]+)?\.com/i
>   header __ENV_FROM_UPS  Received =~ /envelope-from [^ @]+@ups\.com/i
>
> When I wrote this to the MCP rules file, none of my other rules work.

I'm not sure if escaping the @ symbols is required or not, but try this:

  header __ENV_FROM_DHL  Received =~ /envelope-from [^ \@]+\@dhl(?:[-_][^ .]+)?\.com/i
  header __ENV_FROM_UPS  Received =~ /envelope-from [^ \@]+\@ups\.com/i

-- 
Bowie
Re: Fwd: Help with constructing a rule for MCP
That was the error: the @ has to be escaped as \@. Now it is working.

Thank you all for your help on this rule.

Regards,
Sergio

On Mon, Nov 21, 2011 at 1:16 PM, Bowie Bailey bowie_bai...@buc.com wrote:
> I'm not sure if escaping the @ symbols is required or not, but try this:
>
>   header __ENV_FROM_DHL  Received =~ /envelope-from [^ \@]+\@dhl(?:[-_][^ .]+)?\.com/i
>   header __ENV_FROM_UPS  Received =~ /envelope-from [^ \@]+\@ups\.com/i
Re: In subject how to detect a word in an EVAL string?
That's an excellent question. My systems receive this as well.

-----Original Message-----
From: Sergio sec...@gmail.com
Date: Mon, 21 Nov 2011 14:46:35
To: users@spamassassin.apache.org
Subject: In subject how to detect a word in an EVAL string?

I block a lot of spam by searching for strings in the subject, but sometimes the subject in the header comes in EVAL, like this:

  Subject: =?iso-8859-1?B?LlZlbnRhIGRlIENBTkFTVEFTIE5BVklERdFBUyAtIHB1YmyhY2kgZGFk?=

So rules like this don't work:

  header   ADVERTISE_RULE8  Subject =~ /Publici dad/i
  describe ADVERTISE_RULE8  Encripted word
  score    ADVERTISE_RULE8  11

Here is a copy of the full header:

  Received: from 50.22.109.145-static.reverse.softlayer.com ([50.22.109.145] helo=fievel.principalesperu.biz)
      by x with esmtps (TLSv1:AES256-SHA:256) (Exim 4.69)
      (envelope-from btoevev...@claro.com.pe) id 1RSZBF-0001v0-FF
      for x; Mon, 21 Nov 2011 13:05:25 -0600
  Received: from [190.81.230.105] (helo=microsof-c7b2c4)
      by fievel.principalesperu.biz with esmtpa (Exim 4.69)
      (envelope-from btoevev...@claro.com.pe) id 1RSZAv-0007RN-GC;
      Mon, 21 Nov 2011 13:05:14 -0600
  Message-ID: C8321B3E3280475FA8D0E34373BDFFA9@microsof-c7b2c4
  Reply-To: =?iso-8859-1?B?Q0FOQVNUQVMgTkFWSURF0UFTXw==?= canastasvirtual...@terra.com.pe
  From: =?iso-8859-1?B?Q0FOQVNUQVMgTkFWSURF0UFTXw==?= btoevev...@claro.com.pe
  To: asq...@claro.com.pe
  Subject: =?iso-8859-1?B?LlZlbnRhIGRlIENBTkFTVEFTIE5BVklERdFBUyAtIHB1YmyhY2kgZGFk?=
  Date: Mon, 21 Nov 2011 14:04:43 -0500
  MIME-Version: 1.0
  Content-Type: multipart/related; Type=multipart/alternative; boundary==_NextPart_000_0550_01CCA856.84E55E60

Is there a way to decode the subject and find the word that I need to score?

Regards,
Sergio Cabrera
Re: In subject how to detect a word in an EVAL string?
On Mon, 2011-11-21 at 14:46 -0600, Sergio wrote:
> I block a lot of spam searching for strings on the subject, but sometimes the subject in the header comes in EVAL, like this:
>
>   Subject: =?iso-8859-1?B?LlZlbnRhIGRlIENBTkFTVEFTIE5BVklERdFBUyAtIHB1YmyhY2kgZGFk?=

Not eval, but encoded -- in this case even necessary, rather than an attempt at obfuscation, because it contains non-ASCII letters.

Anyway, SA *does* decode the header value by default, unless you use the :raw qualifier.

> So, rules like this doesn't work:
>
>   header ADVERTISE_RULE8 Subject =~ /Publici dad/i

It doesn't work because one of these chars is not an 'i'. The Subject decodes to:

  .Venta de CANASTAS NAVIDE_AS - publ_ci dad

This is directly extracted from SA debugging, and thus decoded by SA. Note the underscores, which I used in place of the two non-ASCII chars.

Your rule does not match because the first 'i' is not an 'i'. Using the /./ any-char instead of it works.

>   score ADVERTISE_RULE8 11

That's a rather high score. And your RE sure could use some /\b/ word boundaries at the beginning and end of the match.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
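The decoding step described above can be reproduced outside SpamAssassin. The following is a minimal sketch using Python's standard `email.header` module as a stand-in for SA's own header decoding (an assumption made for illustration; SA itself is written in Perl), with Python's `re` in place of Perl regular expressions. It decodes the Subject from the thread and checks the literal pattern against the "any character" variant.

```python
import re
from email.header import decode_header, make_header

# Encoded Subject header value, copied from the message in this thread.
RAW_SUBJECT = (
    "=?iso-8859-1?B?LlZlbnRhIGRlIENBTkFTVEFTIE5BVklERdFBUyAtIHB1YmyhY2kgZGFk?="
)


def decode_subject(raw):
    """Decode an RFC 2047 encoded-word header value into plain text."""
    return str(make_header(decode_header(raw)))


DECODED = decode_subject(RAW_SUBJECT)

# The rule as first written: expects a plain "i" after "publ".
LITERAL_I = re.compile(r"Publici dad", re.IGNORECASE)
# The suggested fix: let "." stand in for the unexpected character.
ANY_CHAR = re.compile(r"publ.ci dad", re.IGNORECASE)

if __name__ == "__main__":
    print(DECODED)
    print("literal 'i' matches:", LITERAL_I.search(DECODED) is not None)
    print("any-char matches:   ", ANY_CHAR.search(DECODED) is not None)
```

The character after "publ" turns out to be an inverted exclamation mark (byte 0xA1 in ISO-8859-1), which looks like an "i" in many fonts but is a different character, so only the second pattern matches.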
A few questions regarding Bayesin in 3.4.0
Hi,

I recently upgraded to SA 3.4.0-rsvnunknown (using https://launchpad.net/~spamassassin/+archive/spamassassin-old on Ubuntu 10.04 LTS) from SA 3.3.2 on a different machine running ArchLinux. I use MySQL to store user preferences as well as Bayesian data. There is no AWL and no autolearning of the Bayesian filter, and both machines run sa-update as a daily cron job.

I migrated my MySQL database containing all settings, along with my /etc/spamassassin directory with my static settings/rules, to the new machine, ran sa-update and sa-compile, and restarted spamd.

I was curious to see if 3.4.0 scored a certain message differently than 3.3.2, so I ran

  cat spam | spamc -u jes...@ifconfig.se -R

in order to see the result. To my surprise, the Bayesian filter only scored 60-80% (BAYES_60) where it previously scored 90-95% (BAYES_95). Have there been any major changes to the Bayesian engine in 3.4 (and/or the SQL storage backend for it)?

I copied my spam/ham corpus to the new machine and ran sa-learn on top of the current database in order to see if that helped. Shockingly, it now scored 1-5% (BAYES_05), so I decided to start over. I ran sa-learn --clear to wipe out the old database and re-ran sa-learn. Now it scored a perfect 99-100% (BAYES_99).

I also noticed that my old database only had 11k tokens while the new one has about 60k (both the old and new servers have hapaxes enabled and were trained using a corpus of about 600 spam and 200 ham).

Any thoughts or ideas about what might have caused this?

Regards,
Jesper Wallin
Re: A few questions regarding Bayesin in 3.4.0
On Mon, 2011-11-21 at 23:31 +0100, Jesper Wallin wrote:
> I recently upgraded to SA 3.4.0-rsvnunknown (using https://launchpad.net/~spamassassin/+archive/spamassassin-old on Ubuntu 10.04 LTS) from SA 3.3.2 on different machine running ArchLinux. I use MySQL to store user preferences as well as Bayesian data. No AWL, no autolearning of the Bayesian filter and both machines run sa-update as daily cronjobs.
>
> I migrated my MySQL database containing all settings along with my

Maybe bug 6624? It is a MySQL server bug that results in terrible Bayes performance. The MySQL version of Ubuntu Lucid seems to match the affected versions.

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6624

Fixed in trunk / 3.4. Since your issue was with 3.4 this is kind of backwards, though the database migration might have triggered it.

I don't see any other relevant changes. And no, the Bayes sub-system in SA has not been changed since 3.3.
Re: A few questions regarding Bayesin in 3.4.0
On Mon, 2011-11-21 at 23:31 +0100, Jesper Wallin wrote:
> I also noticed that my old database only had 11k tokens while the new one got about 60k (both the old and new server has hapaxes enabled and was trained using a corpus of about 600 spam and 200 ham)

Is that old database the original one from the previous system, or old as in before learning from scratch, but *after* migrating the db?

I'd guess the latter. 11k tokens is terribly low, and as you just noticed even less than learning a handful from scratch.

Are you sure the database conversion went cleanly?
Re: In subject how to detect a word in an EVAL string?
Thank you, Karsten, for your input.

I have modified the rule to the following and it is working great:

  header   ADVERTISE_RULE8  Subject =~ /publ.?.c.?.dad/i
  describe ADVERTISE_RULE8  Encripted word
  score    ADVERTISE_RULE8  11

If I see a lot of false positives I will modify it a bit, but for now it is what I was looking for.

Regards,
Sergio

2011/11/21 Karsten Bräckelmann guent...@rudersport.de
> Your rule does not match, because the first 'i' is not. Using the /./ any char instead of it works.
>
> That's a rather high score. And your RE sure could use some /\b/ word boundaries at the beginning and end of the match.
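As a rough way to judge how permissive the new pattern is, the sketch below compares it with a tighter variant that adds word boundaries. The tighter pattern is a hypothetical alternative written for this example, not a rule anyone in the thread posted; Python's `re` stands in for Perl regular expressions, and the sample subjects are invented apart from the decoded spam subject.

```python
import re

# Pattern from the rule above (without the /.../i delimiters).
LOOSE = re.compile(r"publ.?.c.?.dad", re.IGNORECASE)
# Hypothetical tighter variant: one wildcard for the disguised letter,
# an optional space before "dad", and word boundaries at both ends.
TIGHT = re.compile(r"\bpubl.ci ?dad\b", re.IGNORECASE)

# Sample subjects; only the first is the decoded subject from the thread.
SAMPLES = [
    ".Venta de CANASTAS NAVIDE\u00d1AS - publ\u00a1ci dad",
    "Oferta de publicidad",
    "xxpublXXcYYdadxx",
]


def hit(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for subject in SAMPLES:
        print(f"loose={hit(LOOSE, subject)!s:5} "
              f"tight={hit(TIGHT, subject)!s:5} {subject}")
```

Both patterns catch the spam subject and the plain word "publicidad"; only the loose one also accepts the nonsense string, which hints at where false positives could come from.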
Re: A few questions regarding Bayesin in 3.4.0
Hi again and thanks for your quick reply.

On 11/22/2011 12:35 AM, Karsten Bräckelmann wrote:
> On Mon, 2011-11-21 at 23:31 +0100, Jesper Wallin wrote:
> > I also noticed that my old database only had 11k tokens while the new
> > one got about 60k (both the old and new server has hapaxes enabled and
> > was trained using a corpus of about 600 spam and 200 ham)
>
> Is that old database the original one from the previous system, or old
> as in before learning from scratch, but *after* migrating the db? I'd
> guess the latter. 11k tokens is terribly low, and as you just noticed
> even less than learning a handful from scratch.

I meant the original database, created by SA 3.3.2. It got about 11k tokens. Also, it runs MySQL 5.5.17 (as that machine runs ArchLinux) and I'm not sure about the last comment on the MySQL bug page; it doesn't really say if it's fixed or not in 5.5.16.

> Are you sure the database conversion went cleanly?

I used "mysqldump > db.sql" and "mysql < db.sql" to migrate my entire MySQL database. Maybe sa-learn would've been a more correct way? Though, if the Bayes backend hasn't been touched, it shouldn't really matter?

Regards,
Jesper Wallin
Re: Fwd: Help with constructing a rule for MCP
On Mon, 21 Nov 2011, Sergio wrote:

> Unfortunately, it seems that MCP doesn't like the rule:
>
> header   __ENV_FROM_DHL   Received =~ /envelope-from [^ @]+@dhl(?:[-_][^ .]+)?\.com/i
> header   __FROM_DHL       From =~ /\bdhl(?:[-_][^ .]+)?\.com/i
> header   __ENV_FROM_UPS   Received =~ /envelope-from [^ @]+@ups\.com/i
> header   __FROM_UPS       From =~ /\bups\.com/i
> meta     DHL_UPS_MISMATCH (__ENV_FROM_DHL && __FROM_UPS) || (__ENV_FROM_UPS && __FROM_DHL)
> describe DHL_UPS_MISMATCH virus DHL-USA or UPS
> score    DHL_UPS_MISMATCH 11
>
> When I wrote this to the MCP rules file, none of my other rules work.

Bowie is right. I missed escaping the at signs. Put a backslash in front of each one that isn't in square brackets:

   /envelope-from [^ @]+\@ups\.com/i

But that shouldn't break _other_ rules...

> On Mon, Nov 21, 2011 at 10:55 AM, Bowie Bailey bowie_bai...@buc.com wrote:
> > On 11/21/2011 11:35 AM, John Hardin wrote:
> > > On Mon, 21 Nov 2011, Bowie Bailey wrote:
> > > > On 11/20/2011 10:02 PM, Sergio wrote:
> > > > > header __ENV_FROM_DHL Received =~ /envelope-from [^ @]+@dhl[^ .]+\.com/i
> > > > > header __FROM_DHL From =~ /\bdhl[^ .]+\.com/i
> > > >
> > > > These will match any domain that starts with dh and ends with .com.
> > >
> > > You overlooked the l.
> >
> > Hmm... Guess I did...
> >
> > > > For example, they will match someu...@dhalailama.com. Is this
> > > > expected?
> > >
> > > It won't.
> > >
> > > > If you just want to match a single character, then get rid of
> > > > the +.
> > >
> > > It's to match -usa or other dhl domain name variants. The line wrap
> > > in email makes that look like a single character RE. The actual RE
> > > I suggested is:
> > >
> > >   /envelope-from [^ @]+@dhl[^ .]+\.com/i
> >
> > The line wrap wasn't an issue. I just didn't see the l. And with this
> > font, I think I see why I didn't see it the first time. It blends in
> > with the square bracket.
> >
> > > > It also won't match dhl.com.
> > >
> > > My bad. As I said, it was off the top of my head. These might be
> > > better:
> > >
> > >   /envelope-from [^ @]+@dhl(?:[-_][^ .]+)?\.com/i
> > >   /\bdhl(?:[-_][^ .]+)?\.com/i
> >
> > Do the @ characters need to be escaped? In a normal Perl RE they
> > would, but I'm not sure if SA is treating them any differently since
> > it is reading them in from a config file.
> >
> > --
> > Bowie

-- 
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  The difference is that Unix has had thirty years of technical
  types demanding basic functionality of it. And the Macintosh has
  had fifteen years of interface fascist users shaping its progress.
  Windows has the hairpin turns of the Microsoft marketing machine
  and that's all.                                    -- Red Drag Diva
-----------------------------------------------------------------------
 348 days since the first successful private orbital launch (SpaceX)
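The back-and-forth about which domains the two DHL patterns match can be checked mechanically. This is a rough Python sketch rather than an SA rule test: the Received fragments are invented for illustration, and the names FIRST, REVISED, SAMPLES and results are hypothetical. Python does not interpolate @ inside a pattern, so the at sign is left unescaped here; in the Perl-based rule file it should be written \@ as described above.

```python
import re

# First suggestion from the thread: at least one character is required
# between "dhl" and ".com".
FIRST = re.compile(r"envelope-from [^ @]+@dhl[^ .]+\.com", re.IGNORECASE)

# Revised suggestion: the "-usa"-style suffix is optional, and it must
# start with a hyphen or underscore.
REVISED = re.compile(
    r"envelope-from [^ @]+@dhl(?:[-_][^ .]+)?\.com", re.IGNORECASE
)

# Hypothetical Received-header fragments, keyed by sender domain.
SAMPLES = {
    "dhl.com": "(envelope-from someone@dhl.com)",
    "dhl-usa.com": "(envelope-from someone@dhl-usa.com)",
    "dhalailama.com": "(envelope-from someone@dhalailama.com)",
}


def results(pattern):
    """Map each sample domain to whether the pattern finds a match."""
    return {domain: bool(pattern.search(text)) for domain, text in SAMPLES.items()}


if __name__ == "__main__":
    print("first:  ", results(FIRST))
    print("revised:", results(REVISED))
```

Neither pattern matches the dhalailama.com sample, because the literal "l" after "dh" is required. Only the revised pattern matches plain dhl.com.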
Re: A few questions regarding Bayesin in 3.4.0
On Tue, 2011-11-22 at 01:47 +0100, Jesper Wallin wrote:
> On 11/22/2011 12:35 AM, Karsten Bräckelmann wrote:
> > > I also noticed that my old database only had 11k tokens while the
> > > new one got about 60k (both the old and new server has hapaxes
> > > enabled and was trained using a corpus of about 600 spam and 200
> > > ham)
> >
> > Is that old database the original one from the previous system, or
> > old as in before learning from scratch, but *after* migrating the db?
> > I'd guess the latter. 11k tokens is terribly low, and as you just
> > noticed even less than learning a handful from scratch.
>
> I meant the original database, created by SA 3.3.2.. It got about 11k
> tokens. Also, it runs MySQL 5.5.17 (as that machine runs ArchLinux) and
> I'm not sure about the last comment on the MySQL bug page, it doesn't
> really say if it's fixed or not in 5.5.16.

Your Ubuntu system uses 5.1, though. Anyway, I guess to ever find out if this might be the issue, Mark or someone else needs to come up with some funky idea.

And regardless, 11k tokens is terribly low.
Re: In subject how to detect a word in an EVAL string?
On Mon, 2011-11-21 at 17:49 -0600, Sergio wrote:
> Thank you Karsten for your input. I have modified the rule to the
> following and is working great:
>
> header ADVERTISE_RULE8 Subject =~ /publ.?.c.?.dad/i

I see you wildcarded both instances of 'i', with an additional, optional second char each. However, you also dropped the space in "publici dad" as per your original rule -- intended? Doesn't "publicidad" have a more general meaning, too?

> If I see there are a lot of false positives I will modify it a bit, but
> for now it is what I was looking for.

Again, I strongly recommend lowering the score. And, of course, adding a \b word boundary at the beginning and end of the pattern.
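To illustrate what the \b word boundaries change, here is a small Python sketch. It is only an approximation of the SA rule: it assumes Python's re treats \b the same way Perl does for these inputs, the subjects are invented examples, and LOOSE, BOUNDED, SUBJECTS and hits are hypothetical names.

```python
import re

# Pattern as used in the rule from the thread.
LOOSE = re.compile(r"publ.?.c.?.dad", re.IGNORECASE)

# Same pattern with a word boundary at each end, as recommended.
BOUNDED = re.compile(r"\bpubl.?.c.?.dad\b", re.IGNORECASE)

# Hypothetical subjects, invented for illustration.
SUBJECTS = [
    "Venta - publ\u00a1ci dad",  # obfuscated spelling seen in the thread
    "Oferta de publicidad",      # plain spelling
    "Nuevas publicidades",       # longer word that contains the pattern
    "republicidad",              # pattern preceded by other letters
]


def hits(pattern):
    """Return one boolean per subject: does the pattern match it?"""
    return [bool(pattern.search(subject)) for subject in SUBJECTS]


if __name__ == "__main__":
    for subject, loose, bounded in zip(SUBJECTS, hits(LOOSE), hits(BOUNDED)):
        print(f"{subject!r}: loose={loose} bounded={bounded}")
```

The unbounded pattern matches all four subjects, while the bounded one matches only the first two, where the word stands on its own.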
Re: In subject how to detect a word in an EVAL string?
Spammers write the word "publicidad" in a lot of different ways. I had a few different rules to block them, but now I saw a ¡ character used as an i and, at the same time, an i followed by a space. So I used the .?. and it catches both the i and the space, and in case the spammer tries to use "publi ci dad" it will be caught as well. In my RegEx editor it passes the test.

About the word "publicidad": on my server not many people use that word, and that is why I can block it.

Sergio

2011/11/21 Karsten Bräckelmann guent...@rudersport.de
> On Mon, 2011-11-21 at 17:49 -0600, Sergio wrote:
> > Thank you Karsten for your input. I have modified the rule to the
> > following and is working great:
> >
> > header ADVERTISE_RULE8 Subject =~ /publ.?.c.?.dad/i
>
> I see you wildcarded both instances of 'i', with an additional,
> optional second char each. However, you also dropped the space in
> publici dad as per your original rule -- intended? Doesn't have
> publicidad a more general meaning, too?
>
> > If I see there are a lot of false positives I will modify it a bit,
> > but for now it is what I was looking for.
>
> Again, I strongly recommend to lower the score. And, of course to add a
> \b word boundary at the beginning and end of the patter.
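The behaviour of .?. in the rule can be spelled out with a few test strings. This is a hedged Python sketch, not the RegEx editor mentioned above; the test strings other than the spellings named in the thread are invented, and RULE, CASES and matches are hypothetical names.

```python
import re

# Each ".?." accepts exactly one or two arbitrary characters.
RULE = re.compile(r"publ.?.c.?.dad", re.IGNORECASE)

CASES = [
    "publicidad",        # plain spelling: one char in each gap
    "publ\u00a1ci dad",  # inverted exclamation mark, then 'i' plus space
    "publi ci dad",      # 'i' plus space in both gaps
    "publ--c--dad",      # two arbitrary chars per gap: shows how loose it is
    "publcdad",          # empty gaps: at least one char is required
    "publ123c123dad",    # three chars per gap: one too many
]


def matches(text: str) -> bool:
    """True if the rule's pattern is found anywhere in the text."""
    return RULE.search(text) is not None


if __name__ == "__main__":
    for case in CASES:
        print(f"{case!r}: {matches(case)}")
```

The first four strings match and the last two do not, so the rule covers the spellings described in the thread but also accepts any other one or two characters in those positions.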