Negative score spamassassin

2011-11-21 Thread ercibrest

Hello, and sorry for my English.

I have MailScanner, Postfix 2.8.2 and SpamAssassin 3.3.1. I don't have
Pyzor or Razor. MailScanner is only a gateway for my Exchange 2010 server.

In SpamAssassin I get very low or even negative scores. For example, these
are the SpamAssassin scores of the most recent emails:

-1.24
-1.72
-2.90
-2.47
-1.22

Example of a mail that was not classified as spam, although it is spam:
not cached
score=3.722
5 required
0.80 BAYES_50 Bayes spam probability is 40 to 60% 
0.00 FREEMAIL_FROM Sender email is freemail 
1.93 FREEMAIL_REPLY From and body contain different freemails 
0.55 FUZZY_AMBIEN Attempt to obfuscate words in spam 
0.00 HTML_FONT_SIZE_HUGE HTML font size is huge 
0.44 HTML_IMAGE_RATIO_02 HTML has a low ratio of text to image area 
0.00 HTML_MESSAGE HTML included in message 
0.00 MIME_QP_LONG_LINE Quoted-printable line longer than 76 chars 
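
The report above can be checked by hand: the listed rule scores add up to about 3.72 (the report shows 3.722 because the per-rule scores are rounded for display), which is below the required 5, so the message is not tagged. A minimal Python sketch of that arithmetic follows. The names `RULE_HITS`, `REQUIRED_SCORE`, `total_score` and `is_spam` are illustrative only and are not part of SpamAssassin or MailScanner.

```python
# Rule hits copied from the report above as (score, rule name) pairs.
RULE_HITS = [
    (0.80, "BAYES_50"),
    (0.00, "FREEMAIL_FROM"),
    (1.93, "FREEMAIL_REPLY"),
    (0.55, "FUZZY_AMBIEN"),
    (0.00, "HTML_FONT_SIZE_HUGE"),
    (0.44, "HTML_IMAGE_RATIO_02"),
    (0.00, "HTML_MESSAGE"),
    (0.00, "MIME_QP_LONG_LINE"),
]

REQUIRED_SCORE = 5.0  # the "5 required" threshold from the report


def total_score(hits):
    """Sum the per-rule scores of one message."""
    return sum(score for score, _name in hits)


def is_spam(hits, required=REQUIRED_SCORE):
    """A message is tagged as spam once its total reaches the threshold."""
    return total_score(hits) >= required


if __name__ == "__main__":
    print(f"total={total_score(RULE_HITS):.2f} spam={is_spam(RULE_HITS)}")
```

With no network tests contributing, roughly 1.3 more points would have been needed for this message to be tagged.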


Maybe there is a configuration problem, because all of my emails come from
the same IP. Email sent from the internet to my domain is received by my
provider, and the provider then relays it to my MailScanner server.

Given this, maybe SpamAssassin can't do its job? How do I configure
SpamAssassin for this setup?



-- 
View this message in context: 
http://old.nabble.com/Negative-score-spamassassin-tp32870220p32870220.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: Negative score spamassassin

2011-11-21 Thread Martin Hepworth
We need to see the rule hits for the negative scores.

Also, I don't see any RBL, URIBL, Pyzor or Razor scores in there. Have you
disabled network tests? These are really valuable; just make sure you
only choose a couple of the RBLs (see
http://wiki.mailscanner.info/doku.php?id=maq:index#getting_the_best_out_of_spamassassin
for some ideas; it's a little outdated but still useful, I think).


-- 
Martin Hepworth
Oxford, UK



Return Path Whitelists, RP_SAFE, RP_CERTIFIED, RP_MATCHES

2011-11-21 Thread pipjg

Hi,

I was wondering if I could have some advice. I probably know what I'm going
to do anyway, but I wanted a few other opinions.

I've been analysing a load of mail which is having its SA score reduced by
what look like paid-for whitelists. A view of the SA rule hits I'm seeing is:

Rule                  Total    Ham      Ham %  Spam    Spam %
RP_MATCHES_RCVD       161,165  142,559  88.5   18,606  11.5
RCVD_IN_RP_SAFE       22,405   22,399   100    6       0
RCVD_IN_RP_CERTIFIED  22,130   22,125   100    5       0
RCVD_IN_RP_RNBL       12,794   43       0.3    12,751  99.7
T_RP_MATCHES_RCVD     7,080    5,072    71.6   2,008   28.4

Now, looking at virtually ALL of these, they look like SPAM.

Now the scores for this GASH are as follows:


RP_MATCHES_RCVD       -2.023  -1.201  -2.023  -1.201
RCVD_IN_RP_SAFE        0.0    -2.0     0.0    -2.0
RCVD_IN_RP_CERTIFIED   0.0    -3.0     0.0    -3.0
RCVD_IN_RP_RNBL        0       1.284   0       1.31

For some reason I can't find any scores for T_RP_MATCHES_RCVD. Am I being
dumb here? Does the T_ mean something I don't know?

So anyway, what I reckon I should do is get rid of all the negative scores
for these rules, as looking at the figures above they are all suspicious,
and looking at the actual mails, they are pretty dodgy.

Has anyone else seen this or got any advice on this matter? Should we be
trusting a paid for whitelist?

I also saw something about fake RP headers? Could this be the case?

Thanks

Pip

(Apologies have posted same to mailing list but thought I'd try a 2 pronged
approach!)



Re: Return Path Whitelists, RP_SAFE, RP_CERTIFIED, RP_MATCHES

2011-11-21 Thread Benny Pedersen

On Mon, 21 Nov 2011 03:11:48 -0800 (PST), pipjg wrote:

 Has anyone else seen this or got any advice on this matter? Should we be
 trusting a paid for whitelist?


Where do you pay?
Why not report the spam to Return Path?

But feel free to set the scores to zero, if you like to pay :-)




Re: Return Path Whitelists, RP_SAFE, RP_CERTIFIED, RP_MATCHES

2011-11-21 Thread RW
On Mon, 21 Nov 2011 03:11:48 -0800 (PST)
pipjg wrote:

 
 Hi,
 
 Was wondering if could have some advice, and I probably know what I'm
 going to do anyway, just wanted a few others opinions..
 
 I've been analysing a load of mail which is having it's SA score
 reduced by what looks like paid for whitelists. A view of the SA
 scores I'm seeing is:
 
 Rule                  Total    Ham      Ham %  Spam    Spam %
 RP_MATCHES_RCVD       161,165  142,559  88.5   18,606  11.5
 RCVD_IN_RP_SAFE       22,405   22,399   100    6       0
 RCVD_IN_RP_CERTIFIED  22,130   22,125   100    5       0
 RCVD_IN_RP_RNBL       12,794   43       0.3    12,751  99.7
 T_RP_MATCHES_RCVD     7,080    5,072    71.6   2,008   28.4
 
 Now looking at virtualls ALL of these they look like SPAM.

No they don't, you haven't read your own results correctly.
RCVD_IN_RP_SAFE and RCVD_IN_RP_CERTIFIED are ~100% Ham. RCVD_IN_RP_RNBL
is a blacklist rule, so it's supposed to hit spam.
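
The reading of the table given here can be reproduced from the raw counts. The following is a small Python sketch, not anything SpamAssassin itself provides: `RULE_COUNTS` and `ham_spam_percent` are illustrative names, and the numbers are copied from the quoted table.

```python
# (total hits, ham hits, spam hits) per rule, copied from the quoted table.
RULE_COUNTS = {
    "RP_MATCHES_RCVD": (161165, 142559, 18606),
    "RCVD_IN_RP_SAFE": (22405, 22399, 6),
    "RCVD_IN_RP_CERTIFIED": (22130, 22125, 5),
    "RCVD_IN_RP_RNBL": (12794, 43, 12751),
    "T_RP_MATCHES_RCVD": (7080, 5072, 2008),
}


def ham_spam_percent(total, ham, spam):
    """Return (ham %, spam %) of a rule's hits, rounded to one decimal."""
    return round(100.0 * ham / total, 1), round(100.0 * spam / total, 1)


if __name__ == "__main__":
    for rule, counts in RULE_COUNTS.items():
        ham_pct, spam_pct = ham_spam_percent(*counts)
        print(f"{rule:22s} ham={ham_pct:5.1f}%  spam={spam_pct:5.1f}%")
```

The two whitelist rules hit ham almost exclusively and the RNBL rule hits spam almost exclusively, which is the intended behaviour in each case.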

[T_]RP_MATCHES_RCVD are not ReturnPath whitelist rules:

describe RP_MATCHES_RCVD  Envelope sender domain matches handover relay
domain

Everything related to ReturnPath.net/senderscore is working
remarkably well for you.


 For some reason I can't find any scores for T_RP_MATCHES_RCVD. Am I
 being dumn here? Does the T_ mean something I don't know?

T_* rules are under test, so it's an earlier name for RP_MATCHES_RCVD.


Re: Help with constructing a rule for MCP

2011-11-21 Thread Bowie Bailey
On 11/20/2011 10:02 PM, Sergio wrote:

 header   __ENV_FROM_DHL  Received =~ /envelope-from [^ @]+@dhl[^
 .]+\.com/i
 header   __FROM_DHL      From =~ /\bdhl[^ .]+\.com/i

These will match any domain that starts with dh and ends with .com. 
For example, they will match someu...@dhalailama.com.  Is this
expected?  If you just want to match a single character, then get rid of
the +.

header   __ENV_FROM_DHL  Received =~ /envelope-from [^ @]+@dhl[^ .]\.com/i
header   __FROM_DHL      From =~ /\bdhl[^ .]\.com/i

-- 
Bowie


Re: Negative score spamassassin

2011-11-21 Thread darxus
On 11/21, ercibrest wrote:
 Maybe there is a problem of configuration because all of my emails come from
 the same IP. From internet, email send to my domain is receive from my
 provider and then, the provider relay mails to my mailscanner 's server.

Add that IP to your trusted_networks setting, documented in the
spamassassin man page:
http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.html#network_test_options
Also some info here:  http://wiki.apache.org/spamassassin/TrustPath
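
As a rough illustration of the advice above, the Python sketch below renders a `trusted_networks` line that could be pasted into a SpamAssassin configuration file such as local.cf. The helper `render_trusted_networks` is hypothetical, and 192.0.2.10 is a documentation address standing in for the provider's relay, whose real IP is not given in this thread.

```python
import ipaddress


def render_trusted_networks(relays):
    """Return a `trusted_networks` config line listing the given relays."""
    checked = []
    for relay in relays:
        # Raises ValueError for anything that is not an IP address or CIDR block.
        ipaddress.ip_network(relay, strict=False)
        checked.append(relay)
    if not checked:
        raise ValueError("at least one relay address is required")
    return "trusted_networks " + " ".join(checked)


if __name__ == "__main__":
    # Placeholder address; replace with the provider's actual relay IP.
    print(render_trusted_networks(["192.0.2.10"]))
```

Once the relay is trusted, SpamAssassin can look past it in the Received headers and apply network tests to the host that actually handed the mail to the provider.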

-- 
It's never too late to panic.
http://www.ChaosReigns.com


Re: Return Path Whitelists, RP_SAFE, RP_CERTIFIED, RP_MATCHES

2011-11-21 Thread darxus
On 11/21, pipjg wrote:
 dumn here? Does the T_ mean something I don't know?

Yes, it means there is a bug in the way spamassassin rules are being
published.  It stands for testing.

rules with a T_ prefix to their names are never published
- http://wiki.apache.org/spamassassin/SaUpdateBackend
This is the first google hit for: spamassassin t_

Although I don't currently see T_RP_MATCHES_RCVD in my rules.  Run
sa-update again (you run it daily from cron, right?), check to see if it's
still there, and if it is, open a bug:
https://issues.apache.org/SpamAssassin/

Rules that don't have a score defined have a default score of 1, or, in
this case, -1, because it has the nice flag set (it's intended to hit
ham, not spam).

-- 
A ship in a port is safe, but that's not what ships are built for.
-Grace Murray Hopper
http://www.ChaosReigns.com


Re: Return Path Whitelists, RP_SAFE, RP_CERTIFIED, RP_MATCHES

2011-11-21 Thread Bowie Bailey
On 11/21/2011 10:53 AM, dar...@chaosreigns.com wrote:
 On 11/21, pipjg wrote:
 dumn here? Does the T_ mean something I don't know?
 Yes, it means there is a bug in the way spamassassin rules are being
 published.  It stands for testing.

 rules with a T_ prefix to their names are never published
 - http://wiki.apache.org/spamassassin/SaUpdateBackend
 This is the first google hit for: spamassassin t_

 Although I don't currently see T_RP_MATCHES_RCVD in my rules.  Run
 sa-update again (you run it daily from cron, right?), check to see if it's
 still there, and if it is, open a bug:
 https://issues.apache.org/SpamAssassin/

 Rules that don't have a score defined have a default score of 1, or, in
 this case, -1, because it has the nice flag set (it's intended to hit
 ham, not spam).

Except for T_ rules -- they have a default score of 0.01.
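
The default-score behaviour described in this exchange can be summarised in a few lines of Python. This is only a sketch of the logic as stated in the thread, with an illustrative function name; that the sign flip for the nice flag also applies to T_ rules is an assumption, not something the thread spells out.

```python
def default_score(rule_name, nice=False):
    """Default score of a rule that has no explicit `score` line."""
    # Rules under test (T_ prefix) default to 0.01, all others to 1.
    magnitude = 0.01 if rule_name.startswith("T_") else 1.0
    # Assumption: rules flagged as nice get the same magnitude, negated.
    return -magnitude if nice else magnitude


if __name__ == "__main__":
    print(default_score("SOME_RULE"))
    print(default_score("SOME_NICE_RULE", nice=True))
    print(default_score("T_RP_MATCHES_RCVD", nice=True))
```

Either way, an unscored T_ rule contributes almost nothing to a message's total.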

-- 
Bowie


Re: Return Path Whitelists, RP_SAFE, RP_CERTIFIED, RP_MATCHES

2011-11-21 Thread RW
On Mon, 21 Nov 2011 13:50:05 +
RW wrote:

 On Mon, 21 Nov 2011 03:11:48 -0800 (PST)
 pipjg wrote:

  Rule             Total    Ham      Ham %  Spam    Spam %
  RP_MATCHES_RCVD  161,165  142,559  88.5   18,606  11.5
  RCVD_IN_RP_SAFE  22,405   22,399

 
 describe RP_MATCHES_RCVD  Envelope sender domain matches handover
 relay domain

Actually, now I come to think about it, I had a problem with
RP_MATCHES_RCVD, and I wasn't the only one:

http://old.nabble.com/RP_MATCHES_RCVD-to32157087.html


Re: Help with constructing a rule for MCP

2011-11-21 Thread John Hardin

On Mon, 21 Nov 2011, Bowie Bailey wrote:


On 11/20/2011 10:02 PM, Sergio wrote:


header   __ENV_FROM_DHL  Received =~ /envelope-from [^ @]+@dhl[^
.]+\.com/i
header   __FROM_DHL      From =~ /\bdhl[^ .]+\.com/i


These will match any domain that starts with dh and ends with .com.


You overlooked the l.


For example, they will match someu...@dhalailama.com.  Is this
expected?


It won't.


If you just want to match a single character, then get rid of
the +.


It's to match -usa or other dhl domain name variants. The line wrap in 
email makes that look like a single character RE. The actual RE I 
suggested is:


  /envelope-from [^ @]+@dhl[^ .]+\.com/i

It also won't match dhl.com. My bad. As I said, it was off the top of my 
head.


These might be better:

  /envelope-from [^ @]+@dhl(?:[-_][^ .]+)?\.com/i

  /\bdhl(?:[-_][^ .]+)?\.com/i
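
The difference between the first suggestion and the revised patterns can be checked outside SpamAssassin. The Python sketch below uses the From-header variants of both patterns, which behave the same way in Python's `re` module for these inputs. The sample addresses are made up purely to exercise the patterns.

```python
import re

# From-header patterns copied from the thread.
ORIGINAL = re.compile(r"\bdhl[^ .]+\.com", re.IGNORECASE)
IMPROVED = re.compile(r"\bdhl(?:[-_][^ .]+)?\.com", re.IGNORECASE)

# Hypothetical sender addresses, chosen only to probe the patterns.
SAMPLES = ["a@dhl.com", "a@dhl-usa.com", "a@dhalailama.com", "a@dhlfoo.com"]


def matches(pattern, samples=SAMPLES):
    """Map each sample address to whether the pattern matches it."""
    return {sample: bool(pattern.search(sample)) for sample in samples}


if __name__ == "__main__":
    for name, pattern in (("original", ORIGINAL), ("improved", IMPROVED)):
        print(name, matches(pattern))
```

Neither pattern matches dhalailama.com, since both require the literal "dhl". The original misses plain dhl.com and accepts any suffix such as dhlfoo.com, while the revised one accepts dhl.com and only "-" or "_" separated variants.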

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Mine eyes have seen the horror of the voting of the horde;
  They've looted the fromagerie where guv'ment cheese is stored;
  If war's not won before the break they grow so quickly bored;
  Their vote counts as much as yours.  -- Tam
---
 348 days since the first successful private orbital launch (SpaceX)


Re: Help with constructing a rule for MCP

2011-11-21 Thread Bowie Bailey
On 11/21/2011 11:35 AM, John Hardin wrote:
 On Mon, 21 Nov 2011, Bowie Bailey wrote:

 On 11/20/2011 10:02 PM, Sergio wrote:
 header   __ENV_FROM_DHL  Received =~ /envelope-from [^ @]+@dhl[^
 .]+\.com/i
 header   __FROM_DHL      From =~ /\bdhl[^ .]+\.com/i
 These will match any domain that starts with dh and ends with .com.
 You overlooked the l.

Hmm...  Guess I did...


 For example, they will match someu...@dhalailama.com.  Is this
 expected?
 It won't.

 If you just want to match a single character, then get rid of
 the +.
 It's to match -usa or other dhl domain name variants. The line wrap in 
 email makes that look like a single character RE. The actual RE I 
 suggested is:

/envelope-from [^ @]+@dhl[^ .]+\.com/i

The line wrap wasn't an issue.  I just didn't see the l.  And with
this font, I think I see why I didn't see it the first time.  It blends
in with the square bracket.

 It also won't match dhl.com. My bad. As I said, it was off the top of my 
 head.

 These might be better:

/envelope-from [^ @]+@dhl(?:[-_][^ .]+)?\.com/i

/\bdhl(?:[-_][^ .]+)?\.com/i

Do the @ characters need to be escaped?  In a normal Perl RE they
would, but I'm not sure if SA is treating them any differently since it
is reading them in from a config file.

-- 
Bowie


Re: Detecting serious domains

2011-11-21 Thread Michelle Konzack
Hello Marc,

On 2011-11-17 07:27:51, you typed the following:
 determine if it's spam or ham in itself. Yahoo is a serious domain
 and there's lost of spam. Serious domains should not be blacklisted

Ehm?

I block @yahoo.com at the SMTP level (on my corporate server), because if
I removed the block, I would get around 20-80,000 spam messages from
Yahoo every day.

 for example. We could also look for consistency. Bad RDNS from a
 serious domain might be a spam indicator.

Right

 Also - thinking we should slowly mine the whois database and provide
 some sort of DNS based lookup of whois information to be able to
 determine the registrar of a domain, the domain age, or other info
 that would be useful in determining that the domain is serious or
 not.

+1, at least for the domain age!

 Who thinks I'm onto something?

Thanks, Greetings and nice Day/Evening
Michelle Konzack

-- 
# Debian GNU/Linux Consultant ##
   Development of Intranet and Embedded Systems with Debian GNU/Linux
   Internet Service Provider, Cloud Computing
http://www.itsystems.tamay-dogan.net/

itsystems@tdnet Jabber  linux4miche...@jabber.ccc.de
Owner Michelle Konzack

Gewerbe Strasse 3   Tel office: +49-176-86004575
77694 Kehl  Tel mobil:  +49-177-9351947
Germany Tel mobil:  +33-6-61925193  (France)

USt-ID:  DE 278 049 239

Linux-User #280138 with the Linux Counter, http://counter.li.org/




Re: Detecting serious domains

2011-11-21 Thread Michelle Konzack
Hello Kevin A. McGrail,

On 2011-11-17 10:56:52, you typed the following:
 For example, I've seen .info domains used a lot by spammers.  I'm
 sure there is a patter there with a registrar probably.

Here I can say that .info spam accounts for nearly 60%.

Thanks, Greetings and nice Day/Evening
Michelle Konzack



Fwd: Help with constructing a rule for MCP

2011-11-21 Thread Sergio
Unfortunately, it seems that MCP doesn't like the rule:

header    __ENV_FROM_DHL    Received =~ /envelope-from [^ @]+@dhl(?:[-_][^ .]+)?\.com/i
header    __FROM_DHL        From =~ /\bdhl(?:[-_][^ .]+)?\.com/i
header    __ENV_FROM_UPS    Received =~ /envelope-from [^ @]+@ups\.com/i
header    __FROM_UPS        From =~ /\bups\.com/i
meta      DHL_UPS_MISMATCH  (__ENV_FROM_DHL && __FROM_UPS) || (__ENV_FROM_UPS && __FROM_DHL)
describe  DHL_UPS_MISMATCH  virus DHL-USA or UPS
score     DHL_UPS_MISMATCH  11

When I added this to the MCP rules file, none of my other rules worked.
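
For readers unfamiliar with meta rules, the intent of DHL_UPS_MISMATCH is that it fires when the envelope sender claims one carrier and the From header claims the other. A minimal Python sketch of that logic follows. It is an approximation for illustration, not how MCP or SpamAssassin evaluates rules, and the function name and sample headers are invented.

```python
import re

# Sub-rule patterns from the thread; Python does not need "@" escaped.
ENV_FROM_DHL = re.compile(r"envelope-from [^ @]+@dhl(?:[-_][^ .]+)?\.com", re.I)
FROM_DHL = re.compile(r"\bdhl(?:[-_][^ .]+)?\.com", re.I)
ENV_FROM_UPS = re.compile(r"envelope-from [^ @]+@ups\.com", re.I)
FROM_UPS = re.compile(r"\bups\.com", re.I)


def dhl_ups_mismatch(received, from_header):
    """True when envelope sender and From header name different carriers."""
    env_dhl = bool(ENV_FROM_DHL.search(received))
    env_ups = bool(ENV_FROM_UPS.search(received))
    from_dhl = bool(FROM_DHL.search(from_header))
    from_ups = bool(FROM_UPS.search(from_header))
    return (env_dhl and from_ups) or (env_ups and from_dhl)


if __name__ == "__main__":
    # Hypothetical header values, made up for the example.
    received = "from a.example by b.example (envelope-from <info@dhl-usa.com>)"
    print(dhl_ups_mismatch(received, "UPS <tracking@ups.com>"))
    print(dhl_ups_mismatch(received, "DHL <info@dhl.com>"))
```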

Regards,

Sergio






Re: Detecting serious domains

2011-11-21 Thread Michelle Konzack
Hello dar...@chaosreigns.com,

On 2011-11-17 12:29:41, you typed the following:
 There could be a useful correlation there, but I need to point out that if
 a domain has no MX records, the correct thing to do is to send email to the
 A record for the domain, and I've seen legit domains configured that way
 and unwilling to change.  It's not even a violation of RFC.

Right, but MOST spammers act like this AND their IP does not respond to
SMTP requests. So why waste time and resources?

Thanks, Greetings and nice Day/Evening
Michelle Konzack



Re: Fwd: Help with constructing a rule for MCP

2011-11-21 Thread Ricardo Ardila Vetrovec

Did you try monitoring the log to see whether the rule was detected?




--
-
Ricardo Ardila Vetrovec
Gerente de Redes
CeTIC -- UNIMET
tlf: 2403743



Re: Fwd: Help with constructing a rule for MCP

2011-11-21 Thread Bowie Bailey
On 11/21/2011 1:30 PM, Sergio wrote:
 Unfortunately, it seems that MCP doesn't like the rule:

 header    __ENV_FROM_DHL    Received =~ /envelope-from [^ @]+@dhl(?:[-_][^ .]+)?\.com/i
 header    __FROM_DHL        From =~ /\bdhl(?:[-_][^ .]+)?\.com/i
 header    __ENV_FROM_UPS    Received =~ /envelope-from [^ @]+@ups\.com/i
 header    __FROM_UPS        From =~ /\bups\.com/i
 meta      DHL_UPS_MISMATCH  (__ENV_FROM_DHL && __FROM_UPS) || (__ENV_FROM_UPS && __FROM_DHL)
 describe  DHL_UPS_MISMATCH  virus DHL-USA or UPS
 score     DHL_UPS_MISMATCH  11

 When I wrote this to the MPC rules file, none of my other rules work.

I'm not sure if escaping the @ symbols is required or not, but try this:

header  __ENV_FROM_DHL  Received =~ /envelope-from [^ \@]+\@dhl(?:[-_][^ .]+)?\.com/i
header  __ENV_FROM_UPS  Received =~ /envelope-from [^ \@]+\@ups\.com/i

-- 
Bowie


Re: Fwd: Help with constructing a rule for MCP

2011-11-21 Thread Sergio
That was the error, the @ has to be escaped \@, now it is working.
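
The underlying reason is that Perl treats `@name` inside a regex as an array to interpolate, so an unescaped `@dhl` or `@ups` is not read as literal text. As a convenience, a rule pattern can be pre-processed before it is pasted into a rules file. The Python helper below is a hypothetical sketch for doing that, not a tool that ships with MailScanner or SpamAssassin.

```python
import re


def escape_at_signs(rule_regex):
    """Backslash-escape every "@" that is not already escaped."""
    # Simplification: an "@" that follows an escaped backslash ("\\@")
    # is treated as already escaped and left alone.
    return re.sub(r"(?<!\\)@", r"\\@", rule_regex)


if __name__ == "__main__":
    print(escape_at_signs(r"/envelope-from [^ @]+@ups\.com/i"))
```

Running the helper twice gives the same result as running it once, so it is safe to apply to patterns that are already partly escaped.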

Thank you all for your help on this rule.

Regards,

Sergio





Re: In subject how to detect a word in an EVAL string?

2011-11-21 Thread rvetrovec
That's an excellent question. My systems receive this as well 




-Original Message-
From: Sergio sec...@gmail.com
Date: Mon, 21 Nov 2011 14:46:35 
To: users@spamassassin.apache.org
Subject: In subject how to detect a word in an EVAL string?

I block a lot of spam by searching for strings in the subject, but sometimes
the subject in the header comes in EVAL, like this:
Subject:
=?iso-8859-1?B?LlZlbnRhIGRlIENBTkFTVEFTIE5BVklERdFBUyAtIHB1YmyhY2kgZGFk?=

So rules like this don't work:
header   ADVERTISE_RULE8  Subject =~ /Publici dad/i
describe ADVERTISE_RULE8  Encripted word
score    ADVERTISE_RULE8  11

Here is a copy of the full header:


Received: from 50.22.109.145-static.reverse.softlayer.com ([50.22.109.145]
helo=fievel.principalesperu.biz)
 by x with esmtps (TLSv1:AES256-SHA:256)
 (Exim 4.69)
 (envelope-from btoevev...@claro.com.pe)
 id 1RSZBF-0001v0-FF
 for x; Mon, 21 Nov 2011 13:05:25 -0600
Received: from [190.81.230.105] (helo=microsof-c7b2c4)
 by fievel.principalesperu.biz with esmtpa (Exim 4.69)
 (envelope-from btoevev...@claro.com.pe)
 id 1RSZAv-0007RN-GC; Mon, 21 Nov 2011 13:05:14 -0600
Message-ID: C8321B3E3280475FA8D0E34373BDFFA9@microsof-c7b2c4
Reply-To: =?iso-8859-1?B?Q0FOQVNUQVMgTkFWSURF0UFTXw==?= 
canastasvirtual...@terra.com.pe
From: =?iso-8859-1?B?Q0FOQVNUQVMgTkFWSURF0UFTXw==?= btoevev...@claro.com.pe

To: asq...@claro.com.pe
Subject:
=?iso-8859-1?B?LlZlbnRhIGRlIENBTkFTVEFTIE5BVklERdFBUyAtIHB1YmyhY2kgZGFk?=
Date: Mon, 21 Nov 2011 14:04:43 -0500
MIME-Version: 1.0
Content-Type: multipart/related;
 Type=multipart/alternative;
 boundary==_NextPart_000_0550_01CCA856.84E55E60



Is there a way to decode the subject and find the word that I need to
score?

Regards,

Sergio Cabrera



Re: In subject how to detect a word in an EVAL string?

2011-11-21 Thread Karsten Bräckelmann
On Mon, 2011-11-21 at 14:46 -0600, Sergio wrote:
 I block a lot of spam searching for strings on the subject, but
 sometimes the subject in the header comes in EVAL, like this:
 Subject:
 =?iso-8859-1?B?LlZlbnRhIGRlIENBTkFTVEFTIE5BVklERdFBUyAtIHB1YmyhY2kgZGFk?=

Not eval, but encoded -- in this case even necessary, rather than an
attempt at obfuscation, because it contains non ASCII letters.

Anyway, SA *does* decode the header value by default, unless you use
the :raw qualifier.


 So, rules like this doesn't work:
 header   ADVERTISE_RULE8  Subject =~ /Publici dad/i

It doesn't work, because one of these chars is not an 'i'. The Subject
decodes to:
  .Venta de CANASTAS NAVIDE_AS - publ_ci dad

This is actually directly extracted from SA debugging, and thus decoded
by SA. Note the underscores, which I used in place of the two non-ASCII
chars.

Your rule does not match, because the first 'i' is not. Using the /./
any char instead of it works.
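
The decoding described above can be reproduced independently of SpamAssassin. The Python sketch below decodes the Subject with the standard `email.header` module and tries both the literal pattern and one with an any-character wildcard plus word boundaries. The name `WILDCARD_RULE` and its exact pattern are illustrative and not taken from the thread.

```python
import re
from email.header import decode_header, make_header

# Encoded Subject value copied from the message under discussion.
RAW_SUBJECT = (
    "=?iso-8859-1?B?"
    "LlZlbnRhIGRlIENBTkFTVEFTIE5BVklERdFBUyAtIHB1YmyhY2kgZGFk"
    "?="
)


def decode_subject(raw):
    """Decode an RFC 2047 encoded-word header value to text."""
    return str(make_header(decode_header(raw)))


LITERAL_RULE = re.compile(r"Publici dad", re.IGNORECASE)
WILDCARD_RULE = re.compile(r"\bpubl.ci dad\b", re.IGNORECASE)

if __name__ == "__main__":
    subject = decode_subject(RAW_SUBJECT)
    print(subject)
    print(bool(LITERAL_RULE.search(subject)), bool(WILDCARD_RULE.search(subject)))
```

The two non-ASCII characters decode to "Ñ" and "¡". The second one stands in for the first "i" of the target word, which is why the literal pattern fails and the wildcard one succeeds.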


 score    ADVERTISE_RULE8  11

That's a rather high score. And your RE sure could use some /\b/ word
boundaries at the beginning and end of the match.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



A few questions regarding Bayesin in 3.4.0

2011-11-21 Thread Jesper Wallin

Hi,

I recently upgraded to SA 3.4.0-rsvnunknown (using 
https://launchpad.net/~spamassassin/+archive/spamassassin-old on Ubuntu 
10.04 LTS) from SA 3.3.2 on a different machine running Arch Linux. I use 
MySQL to store user preferences as well as Bayesian data. No AWL, no 
autolearning of the Bayesian filter, and both machines run sa-update as 
daily cron jobs.


I migrated my MySQL database containing all settings along with my 
/etc/spamassassin directory with my static settings/rules to the new 
machine, ran sa-update, sa-compile and restarted spamd. I was curious to 
see if 3.4.0 scored a certain message differently than 3.3.2, so I ran 
cat spam | spamc -u jes...@ifconfig.se -R in order to see the result.


To my surprise, the Bayesian filter only scored 60-80% (BAYES_60) where
it previously scored 90-95% (BAYES_95). Have there been any major
changes to the Bayesian engine in 3.4 (and/or the SQL storage backend
for it)? I copied my spam/ham corpus to the new machine and ran
sa-learn on top of the current database to see if that helped.
Shockingly, it now scored 1-5% (BAYES_05), so I decided to start over.
I ran sa-learn --clear to wipe the old database and re-ran sa-learn.
Now it scored perfectly: 99-100% (BAYES_99).


I also noticed that my old database only had 11k tokens while the new
one has about 60k (both the old and the new server have hapaxes enabled and
were trained using a corpus of about 600 spam and 200 ham).


Any thoughts or ideas what might have caused this?


Regards,
Jesper Wallin


Re: A few questions regarding Bayesin in 3.4.0

2011-11-21 Thread Karsten Bräckelmann
On Mon, 2011-11-21 at 23:31 +0100, Jesper Wallin wrote:
 I recently upgraded to SA 3.4.0-rsvnunknown (using 
 https://launchpad.net/~spamassassin/+archive/spamassassin-old on Ubuntu 
 10.04 LTS) from SA 3.3.2 on different machine running ArchLinux. I use 
 MySQL to store user preferences as well as Bayesin data. No AWL, no 
 autolearning of the Bayesin filter and both machines run sa-update as 
 daily cronjobs.
 
 I migrated my MySQL database containing all settings along with my

Maybe bug 6624? A MySQL server bug, that results in terrible Bayes
performance. The MySQL version of Ubuntu Lucid seems to match the
affected versions.
  https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6624

Fixed in trunk / 3.4. Since your issue was with 3.4, this is kind of
backwards, though the database migration might have triggered this.

I don't see any other relevant changes.

And no, the Bayes sub-system in SA has not been changed since 3.3.





Re: A few questions regarding Bayesin in 3.4.0

2011-11-21 Thread Karsten Bräckelmann
On Mon, 2011-11-21 at 23:31 +0100, Jesper Wallin wrote:
 I also noticed that my old database only had 11k tokens while the new 
 one got about 60k (both the old and new server has hapaxes enabled and 
 was trained using a corpus of about 600 spam and 200 ham)

Is that old database the original one from the previous system, or old
as in before learning from scratch, but *after* migrating the db?

I'd guess the latter. 11k tokens is terribly low, and as you just
noticed even less than learning a handful from scratch.

Are you sure the database conversion went cleanly?


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: In subject how to detect a word in an EVAL string?

2011-11-21 Thread Sergio
Thank you Karsten for your input.

I have modified the rule to the following and it is working great:

header   ADVERTISE_RULE8  Subject =~ /publ.?.c.?.dad/i
describe ADVERTISE_RULE8  Encrypted word
score    ADVERTISE_RULE8  11

If I see there are a lot of false positives I will modify it a bit, but for
now it is what I was looking for.

Regards,

Sergio

2011/11/21 Karsten Bräckelmann guent...@rudersport.de

 On Mon, 2011-11-21 at 14:46 -0600, Sergio wrote:
  I block a lot of spam searching for strings on the subject, but
  sometimes the subject in the header comes in EVAL, like this:
  Subject:
  =?iso-8859-1?B?LlZlbnRhIGRlIENBTkFTVEFTIE5BVklERdFBUyAtIHB1YmyhY2kgZGFk?=

 Not eval, but encoded -- in this case even necessary, rather than an
 attempt at obfuscation, because it contains non ASCII letters.

 Anyway, SA *does* decode the header value by default, unless you use
 the :raw qualifier.


  So, rules like this don't work:
  header   ADVERTISE_RULE8  Subject =~ /Publici dad/i

 It doesn't work, because one of these chars is not an 'i'. The Subject
 decodes to:
  .Venta de CANASTAS NAVIDE_AS - publ_ci dad

 This is actually directly extracted from SA debugging, and thus decoded
 by SA. Note the underscores, which I used in place of the two non-ASCII
 chars.

 Your rule does not match, because the first 'i' is not. Using the /./
 any char instead of it works.


  score    ADVERTISE_RULE8  11

 That's a rather high score. And your RE sure could use some /\b/ word
 boundaries at the beginning and end of the match.
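
The decoding described in the quoted reply can be reproduced outside SpamAssassin. The following is a minimal Python sketch, not part of the thread: it decodes the encoded Subject quoted above with the standard `email.header` module and then tries the literal pattern and a wildcarded one. Python's `re` stands in for SpamAssassin's Perl regexes here, which is an approximation that holds for patterns this simple.

```python
import re
from email.header import decode_header, make_header

# Encoded Subject value exactly as quoted in the thread.
RAW = "=?iso-8859-1?B?LlZlbnRhIGRlIENBTkFTVEFTIE5BVklERdFBUyAtIHB1YmyhY2kgZGFk?="


def decode_subject(raw):
    """Decode RFC 2047 encoded-words into a plain Unicode string."""
    return str(make_header(decode_header(raw)))


subject = decode_subject(RAW)
print(subject)

# The literal rule fails because the first "i" is really an inverted
# exclamation mark; a "." wildcard in that position matches it.
literal = re.search(r"Publici dad", subject, re.IGNORECASE)
wildcard = re.search(r"\bpubl.ci dad\b", subject, re.IGNORECASE)
print(bool(literal), bool(wildcard))
```

The two non-ASCII characters shown as underscores in the debug output are Ñ (0xD1) and ¡ (0xA1) in ISO-8859-1.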






Re: A few questions regarding Bayesin in 3.4.0

2011-11-21 Thread Jesper Wallin

Hi again and thanks for your quick reply..

On 11/22/2011 12:35 AM, Karsten Bräckelmann wrote:

On Mon, 2011-11-21 at 23:31 +0100, Jesper Wallin wrote:

I also noticed that my old database only had 11k tokens while the new
one got about 60k (both the old and new server has hapaxes enabled and
was trained using a corpus of about 600 spam and 200 ham)

Is that old database the original one from the previous system, or old
as in before learning from scratch, but *after* migrating the db?

I'd guess the latter. 11k tokens is terribly low, and as you just
noticed even less than learning a handful from scratch.

I meant the original database, created by SA 3.3.2. It has about 11k 
tokens. Also, that machine runs MySQL 5.5.17 (it is on ArchLinux), and 
I'm not sure about the last comment on the MySQL bug page: it doesn't 
really say whether the bug is fixed in 5.5.16.

Are you sure the database conversion went cleanly?

I used mysqldump > db.sql and mysql < db.sql to migrate my entire 
MySQL database. Maybe sa-learn would have been a more correct way? Though 
if the Bayes backend hasn't been touched, it shouldn't really matter?



Regards,
Jesper Wallin


Re: Fwd: Help with constructing a rule for MCP

2011-11-21 Thread John Hardin

On Mon, 21 Nov 2011, Sergio wrote:


Unfortunately, it seems that MCP doesn't like the rule:

header    __ENV_FROM_DHL    Received =~ /envelope-from [^ @]+@dhl(?:[-_][^ .]+)?\.com/i
header    __FROM_DHL        From =~ /\bdhl(?:[-_][^ .]+)?\.com/i
header    __ENV_FROM_UPS    Received =~ /envelope-from [^ @]+@ups\.com/i
header    __FROM_UPS        From =~ /\bups\.com/i
meta      DHL_UPS_MISMATCH  (__ENV_FROM_DHL && __FROM_UPS) || (__ENV_FROM_UPS && __FROM_DHL)
describe  DHL_UPS_MISMATCH  virus DHL-USA or UPS
score     DHL_UPS_MISMATCH  11

When I wrote this to the MCP rules file, none of my other rules worked.


Bowie is right. I missed escaping the at signs. Put a backslash in front 
of each one that isn't in square brackets:


/envelope-from [^ @]+\@ups\.com/i

But that shouldn't break _other_ rules...
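
To see what the mismatch rule is meant to catch, here is a small Python sketch of the same logic. It is an illustration only, not how SpamAssassin or MCP evaluates rules: the four patterns are translated to Python's `re`, and the header values are invented.

```python
import re

# Python translations of the four sub-rules. In Python "@" needs no escaping;
# in the SpamAssassin rule file it is written as "\@", as advised above.
ENV_FROM_DHL = re.compile(r"envelope-from [^ @]+@dhl(?:[-_][^ .]+)?\.com", re.I)
ENV_FROM_UPS = re.compile(r"envelope-from [^ @]+@ups\.com", re.I)
FROM_DHL = re.compile(r"\bdhl(?:[-_][^ .]+)?\.com", re.I)
FROM_UPS = re.compile(r"\bups\.com", re.I)


def dhl_ups_mismatch(received, from_header):
    """True when the envelope sender names one carrier and From: the other."""
    env_dhl = bool(ENV_FROM_DHL.search(received))
    env_ups = bool(ENV_FROM_UPS.search(received))
    from_dhl = bool(FROM_DHL.search(from_header))
    from_ups = bool(FROM_UPS.search(from_header))
    return (env_dhl and from_ups) or (env_ups and from_dhl)


# Hypothetical header values, invented for illustration.
received = "from mx.example.net (envelope-from tracking@dhl-usa.com) by mail.example.org"
print(dhl_ups_mismatch(received, "UPS Support <info@ups.com>"))
print(dhl_ups_mismatch(received, "DHL Express <info@dhl-usa.com>"))
```

Only the first call reports a mismatch, because the envelope sender is a DHL address while the From: header claims UPS.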



On Mon, Nov 21, 2011 at 10:55 AM, Bowie Bailey bowie_bai...@buc.com wrote:


On 11/21/2011 11:35 AM, John Hardin wrote:

On Mon, 21 Nov 2011, Bowie Bailey wrote:


On 11/20/2011 10:02 PM, Sergio wrote:

header   __ENV_FROM_DHL  Received =~ /envelope-from [^ @]+@dhl[^
.]+\.com/i
header   __FROM_DHL      From =~ /\bdhl[^ .]+\.com/i

These will match any domain that starts with "dh" and ends with ".com".

You overlooked the "l".


Hmm...  Guess I did...




For example, they will match someu...@dhalailama.com.  Is this
expected?

It won't.


If you just want to match a single character, then get rid of
the +.

It's to match -usa or other dhl domain name variants. The line wrap in
email makes that look like a single character RE. The actual RE I
suggested is:

   /envelope-from [^ @]+@dhl[^ .]+\.com/i


The line wrap wasn't an issue.  I just didn't see the l.  And with
this font, I think I see why I didn't see it the first time.  It blends
in with the square bracket.


It also won't match dhl.com. My bad. As I said, it was off the top of my
head.

These might be better:

   /envelope-from [^ @]+@dhl(?:[-_][^ .]+)?\.com/i

   /\bdhl(?:[-_][^ .]+)?\.com/i
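
The difference between the first suggestion and the revised pattern is easy to check against a few domains. This is a hedged Python sketch: Python's `re` is used in place of Perl, and the domain list is made up for the test, apart from dhalailama.com, which is the example raised earlier in this exchange.

```python
import re

# First suggestion (needs at least one extra character after "dhl") and the
# revised one (optional "-xxx" or "_xxx" suffix), translated to Python.
FIRST = re.compile(r"\bdhl[^ .]+\.com", re.I)
REVISED = re.compile(r"\bdhl(?:[-_][^ .]+)?\.com", re.I)

# Hypothetical domains used only to exercise the patterns.
DOMAINS = ["dhl.com", "dhl-usa.com", "dhalailama.com", "dhlx.com"]


def hits(pattern):
    """Return the domains from DOMAINS that the pattern matches."""
    return [d for d in DOMAINS if pattern.search(d)]


print(hits(FIRST))
print(hits(REVISED))
```

The revised pattern picks up plain dhl.com and no longer matches names that merely start with "dhl" followed by other letters.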


Do the @ characters need to be escaped?  In a normal Perl RE they
would, but I'm not sure if SA is treating them any differently since it
is reading them in from a config file.

--
Bowie





--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The difference is that Unix has had thirty years of technical
  types demanding basic functionality of it. And the Macintosh has
  had fifteen years of interface fascist users shaping its progress.
  Windows has the hairpin turns of the Microsoft marketing machine
  and that's all.                                 -- Red Drag Diva
---
 348 days since the first successful private orbital launch (SpaceX)


Re: A few questions regarding Bayesin in 3.4.0

2011-11-21 Thread Karsten Bräckelmann
On Tue, 2011-11-22 at 01:47 +0100, Jesper Wallin wrote:
 On 11/22/2011 12:35 AM, Karsten Bräckelmann wrote:

   I also noticed that my old database only had 11k tokens while the new
   one got about 60k (both the old and new server has hapaxes enabled and
   was trained using a corpus of about 600 spam and 200 ham)
  
  Is that old database the original one from the previous system, or old
  as in before learning from scratch, but *after* migrating the db?
 
  I'd guess the latter. 11k tokens is terribly low, and as you just
  noticed even less than learning a handful from scratch.
 
 I meant the original database, created by SA 3.3.2.. It got about 11k 
 tokens. Also, it runs MySQL 5.5.17 (as that machine runs ArchLinux) and 
 I'm not sure about the last comment on the MySQL bug page, it doesn't 
 really say if it's fixed or not in 5.5.16.

Your Ubuntu system uses 5.1, though.

Anyway, I guess to ever find out if this might be the issue, Mark or
someone else needs to come up with some funky idea.

And regardless, 11k tokens is terribly low.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: In subject how to detect a word in an EVAL string?

2011-11-21 Thread Karsten Bräckelmann
On Mon, 2011-11-21 at 17:49 -0600, Sergio wrote:
 Thank you Karsten for your input.
 
 I have modified the rule to the following and is working great:
 
 header   ADVERTISE_RULE8  Subject =~ /publ.?.c.?.dad/i

I see you wildcarded both instances of 'i', with an additional, optional
second char each. However, you also dropped the space in "publici dad"
as per your original rule -- intended?

Doesn't "publicidad" have a more general meaning, too?

 If I see there are a lot of false positives I will modify it a bit,
 but for now it is what I was looking for.

Again, I strongly recommend lowering the score and, of course, adding a
\b word boundary at the beginning and end of the pattern.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: In subject how to detect a word in an EVAL string?

2011-11-21 Thread Sergio
Spammers use many different spellings of the word publicidad. I had a
few different rules to block them, but this time I saw a ¡ character used
in place of an i and, in the same subject, an i followed by a space.

So I used .?. which catches both the ¡ and the i plus the space, and in
case the spammer tries publi ci dad, that will be caught as well. The
pattern passes the test in my RegEx editor.

About the word publicidad: on my server not many people use that word,
which is why I can block it.
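
The claim about the .?. wildcards can be checked with a few sample strings. The sketch below is an illustration, not part of the thread: it uses Python's `re` as a stand-in for SpamAssassin's Perl regexes, and the sample fragments, including the made-up token "republicidades", are hypothetical.

```python
import re

# The pattern from the rule, and the same pattern with \b word boundaries
# added at both ends.
LOOSE = re.compile(r"publ.?.c.?.dad", re.IGNORECASE)
ANCHORED = re.compile(r"\bpubl.?.c.?.dad\b", re.IGNORECASE)

# Hypothetical subject fragments, not taken from real mail.
SAMPLES = [
    "publicidad",        # plain spelling
    "publ\u00a1ci dad",  # inverted exclamation mark plus a space
    "publi ci dad",      # a space after each i
    "republicidades",    # made-up longer token containing the word
]


def matches(pattern, text):
    """True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


for s in SAMPLES:
    print(f"{s!r}: loose={matches(LOOSE, s)} anchored={matches(ANCHORED, s)}")
```

All three obfuscated spellings match either way; the word boundaries only stop the pattern from firing inside a longer token.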

Sergio
