Re: Very low score for spam from b2blistappenders.com

2016-04-08 Thread RW
On Fri, 08 Apr 2016 18:04:48 +0300
Jari Fredriksson wrote:

> Robert Boyl wrote on 8.4.2016 16:13:
> 
> > Hi, everyone
> > 
> > Pls, do you get a good spam score on this? For us, no hits for
> > spamassassin, etc.
> > 
> > I checked in test sites such as http://spamcheck.postmarkapp.com/
> > and also very low score.
> > 
> > Strange, as it does seem to have spammy words, etc... no? 
> > 
> > See:
> > 
> > http://pastebin.com/EJH1eddN  
> 
> The old Botnet plugin still rocks for me, while most just can't and
> won't use it... My Bayes was clueless, as expected. But not 00
> either.
>   ...
>  1.5 BOTNET                 Relay might be a spambot or virusbot
>                             [botnet0.8,ip=MTkyLjE2OC4xLjY2,maildomain=b2blistappenders.com,nordns]


Unfortunately that's caused by Botnet picking up an incorrectly parsed
internal header. 


Re: Very low score for spam from b2blistappenders.com

2016-04-08 Thread RW
On Fri, 8 Apr 2016 10:13:45 -0300
Robert Boyl wrote:

> Hi, everyone
> 
> Pls, do you get a good spam score on this? For us, no hits for
> spamassassin, etc.
> 
> I checked in test sites such as http://spamcheck.postmarkapp.com/ and
> also very low score.
> 
> Strange, as it does seem to have spammy words, etc... no?
> 
> See:
> 
> http://pastebin.com/EJH1eddN
> 

In this header:

Received: from unknown (HELO mx25.myisp.com) (MTkyLjE2OC4xLjY2)
  by mx12.myisp.com with SMTP; 7 Apr 2016 18:14:25 -


Did you edit anything other than the myisp.com domain? In particular
the contents of the brackets that contain MTkyLjE2OC4xLjY2.

The parser is expecting something like this example:

Received: from customer254-217.iplannetworks.net (HELO AGAMENON)
  (baldusi@200.69.254.217 with plain) by smtp.mail.vip.sc5.yahoo.com
  with SMTP; 11 Mar 2003 21:03:28 -
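The bracketed token in that header does not look like an address at all. Assuming it is base64 text (an assumption; the thread does not say so), a quick Python check decodes it and tests whether the result is a private address. On that assumption it decodes to 192.168.1.66, an RFC 1918 address, which would fit RW's point that Botnet picked up an internal header.

```python
import base64
import ipaddress

# Token copied from "(MTkyLjE2OC4xLjY2)" in the Received header above.
token = "MTkyLjE2OC4xLjY2"

# Assumption: the value is base64-encoded ASCII text.
decoded = base64.b64decode(token).decode("ascii")

# ip_address() raises ValueError if the text is not a valid IP address.
addr = ipaddress.ip_address(decoded)

print(decoded)          # 192.168.1.66
print(addr.is_private)  # True
```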


Re: Regex in case of spaces

2016-04-08 Thread John Hardin

On Fri, 8 Apr 2016, Bowie Bailey wrote:

> On 4/8/2016 11:09 AM, Reindl Harald wrote:
>
>> On 08.04.2016 at 17:05, John Hardin wrote:
>>> On Fri, 8 Apr 2016, Reindl Harald wrote:
>>>
>>>> /.*need to buy products.*\?.*/i
>>>>
>>>> .* = any chars independent how often
>>>
>>> Do NOT use ".*" in body or rawbody rules. That can lead to unbounded
>>> processing times. Use a sane upper limit, e.g. ".{,20}", and try to
>>> avoid repeated "." where possible
>>
>> thanks for the hint but that's not possible in case of "contains" rules
>> where you don't know at which place the offending phrase comes
>>
>> interesting that we have around 1100 such rules and the
>> Spamassassin/ClamAV virtual machine runs most of the day between 50 and
>> 300 MHz
>
> In this case, aren't the first and last ".*" redundant anyway?
>
> /.*need to buy products.*\?.*/i
>
> is functionally equivalent to
>
> /need to buy products.*\?/i

That, too. .* at the end of an RE is totally pointless.


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The Tea Party wants to remove the Crony from Crony Capitalism.
  OWS wants to remove Capitalism from Crony Capitalism.
-- Astaghfirullah
---
 5 days until Thomas Jefferson's 273rd Birthday


Re: Regex in case of spaces

2016-04-08 Thread Bowie Bailey

On 4/8/2016 11:09 AM, Reindl Harald wrote:
>
> On 08.04.2016 at 17:05, John Hardin wrote:
>> On Fri, 8 Apr 2016, Reindl Harald wrote:
>>
>>> /.*need to buy products.*\?.*/i
>>>
>>> .* = any chars independent how often
>>
>> Do NOT use ".*" in body or rawbody rules. That can lead to unbounded
>> processing times. Use a sane upper limit, e.g. ".{,20}", and try to
>> avoid repeated "." where possible
>
> thanks for the hint but that's not possible in case of "contains"
> rules where you don't know at which place the offending phrase comes
>
> interesting that we have around 1100 such rules and the
> Spamassassin/ClamAV virtual machine runs most of the day between 50
> and 300 MHz

In this case, aren't the first and last ".*" redundant anyway?

/.*need to buy products.*\?.*/i

is functionally equivalent to

/need to buy products.*\?/i

And since you don't need (or want) too much extra stuff before the
question mark, you could easily limit it without losing functionality.

/need to buy products.{,20}\?/i
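Bowie's equivalence can be checked directly. The sketch below uses Python's re module as a stand-in for Perl (these constructs behave the same way in both) and calls search(), which, like a SpamAssassin body rule, looks for a match anywhere in the text. The sample sentences are made up for illustration. The bound is spelled {0,20}: the explicit lower bound is the more portable form, since not every regex engine accepts {,20}.

```python
import re

# re.I mirrors the /i modifier used in the thread.
original = re.compile(r".*need to buy products.*\?.*", re.I)
trimmed = re.compile(r"need to buy products.*\?", re.I)
bounded = re.compile(r"need to buy products.{0,20}\?", re.I)

samples = [
    "Do you need to buy products? We have some.",
    "NEED TO BUY PRODUCTS ??",
    "Need to buy products - no question mark here",
    "Need to buy products for the office, the warehouse and the shop?",
]

for text in samples:
    # search() looks anywhere in the string, so leading and trailing
    # ".*" cannot change whether a match is found.
    hits = [bool(p.search(text)) for p in (original, trimmed, bounded)]
    print(hits, text)
```

The last sample shows the trade-off of the bounded form: it misses a question mark that sits more than 20 characters after "products".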

--
Bowie


Re: Regex in case of spaces

2016-04-08 Thread John Hardin

On Fri, 8 Apr 2016, Reindl Harald wrote:

> On 08.04.2016 at 17:05, John Hardin wrote:
>> On Fri, 8 Apr 2016, Reindl Harald wrote:
>>
>>> /.*need to buy products.*\?.*/i
>>>
>>> .* = any chars independent how often
>>
>> Do NOT use ".*" in body or rawbody rules. That can lead to unbounded
>> processing times. Use a sane upper limit, e.g. ".{,20}", and try to
>> avoid repeated "." where possible
>
> thanks for the hint but that's not possible in case of "contains" rules where
> you don't know at which place the offending phrase comes

Then the limit can be generous. And in the case of the above, you can
avoid backtracking issues by doing this instead:

   products[^?]{,100}\?

...so that the character set you're skipping over doesn't contain the
value you're looking for. Note that this does not work in all cases, but
in this case it does avoid problems.

> interesting that we have around 1100 such rules and the Spamassassin/ClamAV
> virtual machine runs most of the day between 50 and 300 MHz

I said "can", not "will". It depends on the RE and the data you give it.
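John's products[^?]{,100}\? suggestion can be illustrated on a made-up sentence containing two question marks. The sketch uses Python's re as a stand-in for Perl and writes the bound as {0,100}; it only shows where each pattern stops matching and does not measure backtracking cost.

```python
import re

text = "We sell products? Yes. Need more info? Call us."

# Greedy ".*" runs to the end of the text, then backtracks to the LAST "?".
greedy = re.search(r"products.*\?", text)

# "[^?]" can never consume a "?", so the run stops right before the FIRST
# one and \? matches there at once.
negated = re.search(r"products[^?]{0,100}\?", text)

print(greedy.group())   # products? Yes. Need more info?
print(negated.group())  # products?
```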



Re: Regex in case of spaces

2016-04-08 Thread Reindl Harald



On 08.04.2016 at 17:05, John Hardin wrote:
> On Fri, 8 Apr 2016, Reindl Harald wrote:
>
>> /.*need to buy products.*\?.*/i
>>
>> .* = any chars independent how often
>
> Do NOT use ".*" in body or rawbody rules. That can lead to unbounded
> processing times. Use a sane upper limit, e.g. ".{,20}", and try to
> avoid repeated "." where possible

thanks for the hint but that's not possible in case of "contains" rules
where you don't know at which place the offending phrase comes

interesting that we have around 1100 such rules and the
Spamassassin/ClamAV virtual machine runs most of the day between 50 and
300 MHz






Re: Regex in case of spaces

2016-04-08 Thread John Hardin

On Fri, 8 Apr 2016, Reindl Harald wrote:

> /.*need to buy products.*\?.*/i
>
> .* = any chars independent how often

Do NOT use ".*" in body or rawbody rules. That can lead to unbounded
processing times. Use a sane upper limit, e.g. ".{,20}", and try to avoid
repeated "." where possible.




Re: Very low score for spam from b2blistappenders.com

2016-04-08 Thread Jari Fredriksson
Robert Boyl wrote on 8.4.2016 16:13:

> Hi, everyone
> 
> Pls, do you get a good spam score on this? For us, no hits for spamassassin, 
> etc.
> 
> I checked in test sites such as http://spamcheck.postmarkapp.com/ and also 
> very low score.
> 
> Strange, as it does seem to have spammy words, etc... no? 
> 
> See:
> 
> http://pastebin.com/EJH1eddN

The old Botnet plugin still rocks for me, while most just can't and won't
use it... My Bayes was clueless, as expected. But not 00 either.

Content analysis details:   (5.4 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.3 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
                            [Blocked - see ]
 1.5 BOTNET                 Relay might be a spambot or virusbot
                            [botnet0.8,ip=MTkyLjE2OC4xLjY2,maildomain=b2blistappenders.com,nordns]
 0.8 BAYES_50               BODY: Bayes spam probability is 40 to 60%
                            [score: 0.4918]
 1.0 HTML_MESSAGE           BODY: HTML included in message
 0.8 RDNS_NONE              Delivered to internal network by a host with no rDNS
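The report total is the sum of the scores of the rules that hit, compared against the required score. The sketch redoes that arithmetic for the report above; treating "total at or above required" as spam is an assumption about how the threshold is applied. It also shows that without the BOTNET hit, which RW attributes to a misparsed header, this message would have stayed below 5.0.

```python
# Rule scores copied from the content analysis report above.
hits = {
    "RCVD_IN_BL_SPAMCOP_NET": 1.3,
    "BOTNET": 1.5,
    "BAYES_50": 0.8,
    "HTML_MESSAGE": 1.0,
    "RDNS_NONE": 0.8,
}
required = 5.0

# round() hides floating-point noise in the sum.
total = round(sum(hits.values()), 1)

# Assumption: a message counts as spam once the total reaches the
# required score.
is_spam = total >= required
print(total, is_spam)  # 5.4 True

without_botnet = round(total - hits["BOTNET"], 1)
print(without_botnet, without_botnet >= required)  # 3.9 False
```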

-- 
jarif.bit 

Re: Very low score for spam from b2blistappenders.com

2016-04-08 Thread RW
On Fri, 8 Apr 2016 10:13:45 -0300
Robert Boyl wrote:

> Hi, everyone
> 
> Pls, do you get a good spam score on this? For us, no hits for
> spamassassin, etc.
> 
> I checked in test sites such as http://spamcheck.postmarkapp.com/ and
> also very low score.
> 
> Strange, as it does seem to have spammy words, etc... no?

SpamAssassin tends not to have many rules that target types of content
because they could be legitimate. Finding which words are spammy for
you is what Bayes is for.  


> See:
> 
> http://pastebin.com/EJH1eddN


There are three blocks of headers here: DSPAM, X-myisp.com and
Barracuda. It's not clear whether any are yours, but I see that DSPAM
did catch this and that Barracuda doesn't have Bayes turned on.


Re: Regex in case of spaces

2016-04-08 Thread RW
On Fri, 08 Apr 2016 14:43:07 +0100
Martin Gregorie wrote:

> On Fri, 2016-04-08 at 14:28 +0100, RW wrote:
> 
> > His rule failed solely because he scored it at zero.
> >  
> Since the OP claimed to be relatively clueless about regexes, I posted
> what I did in the hope of showing him an easy way to write and test
> them.

Right. That's why you wrote "Try this:" and left the score at zero.




Re: Regex in case of spaces

2016-04-08 Thread Martin Gregorie
On Fri, 2016-04-08 at 14:28 +0100, RW wrote:

> His rule failed solely because he scored it at zero.
>
Since the OP claimed to be relatively clueless about regexes, I posted
what I did in the hope of showing him an easy way to write and test
them.

Regex simplification by writing body rules to take advantage of SA's
body text preprocessing can come later.


Martin



Re: Regex in case of spaces

2016-04-08 Thread RW
On Fri, 08 Apr 2016 14:19:45 +0100
Martin Gregorie wrote:

> On 2016-04-08 14:02, Robert Boyl wrote:
> > 
> > describe TEST123 test
> > body TEST123 /\bNeed to buy products *\?\b/i
> > score TEST123 0.0
> > 
> > If possible, also make it catch if more than 1 question mark :)
> > use \ in front of space char
> >   
> Try this:
> 
> describe TEST123 test
> body TEST123 /Need to buy products\s*\?+/i
> score TEST123 0.0
> 
> \s*  matches zero or more whitespace (spaces and TABs)

There are no tabs and no consecutive spaces. In the body there is no
difference between " ?", " *", \s* and \s?

His rule failed solely because he scored it at zero.
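RW's point is that body rules see whitespace that has already been collapsed. The sketch approximates that with a hypothetical normalize() helper that turns every whitespace run into one space (SpamAssassin's real body rendering does more than this) and compares the four patterns on normalized versus raw text. The sample lines are made up.

```python
import re

def normalize(text: str) -> str:
    # Hypothetical approximation of SpamAssassin's body rendering: every
    # whitespace run (spaces, tabs, newlines) becomes a single space.
    return re.sub(r"\s+", " ", text)

raw_lines = [
    "Need to buy products?",
    "Need to buy products ?",
    "Need to buy products \t  ?",  # space, tab, two spaces
]
patterns = [r"products ?\?", r"products *\?", r"products\s*\?", r"products\s?\?"]

body_results = []
raw_results = []
for line in raw_lines:
    body_results.append([bool(re.search(p, normalize(line))) for p in patterns])
    raw_results.append([bool(re.search(p, line)) for p in patterns])

print(body_results)  # every pattern matches every normalized line
print(raw_results)   # only \s* copes with the tab in the raw third line
```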


Re: Very low score for spam from b2blistappenders.com

2016-04-08 Thread Reindl Harald



On 08.04.2016 at 15:13, Robert Boyl wrote:
> Hi, everyone
>
> Pls, do you get a good spam score on this? For us, no hits for
> spamassassin, etc.
>
> I checked in test sites such as http://spamcheck.postmarkapp.com/ and
> also very low score.
>
> Strange, as it does seem to have spammy words, etc... no?
>
> See:
>
> http://pastebin.com/EJH1eddN


besides that your ISP is a fool (URIBL_BLOCKED - just google it), even a
high score doesn't help much with "TAG_LEVEL=3.5 QUARANTINE_LEVEL=400.0
KILL_LEVEL=100.0 tests=HTML_MESSAGE"


why is your ISP's spam filter in front at all?

that way you can't reject anything without harming your ISP by making it a
backscatter source - inbound filters have to run directly on the MX to make
rejects possible and to let RBLs do their job properly


Content analysis details:   (6.0 points, 5.5 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 3.5 BAYES_60               BODY: Bayes spam probability is 60 to 80%
                            [score: 0.7551]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 2.5 RDNS_NONE              Delivered to internal network by a host with no rDNS

___

after training it and adding custom rules:

Content analysis details:   (14.9 points, 5.5 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 7.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 1.]
 2.5 CUST_BODY_18           BODY: Contains Medium
 1.5 CUST_BODY_17           BODY: Contains Low
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.4 BAYES_999              BODY: Bayes spam probability is 99.9 to 100%
                            [score: 1.]
 2.5 RDNS_NONE              Delivered to internal network by a host with no rDNS
 0.5 CUST_SUBJ_16           Contains Very Low





Re: Regex in case of spaces

2016-04-08 Thread Martin Gregorie
On 2016-04-08 14:02, Robert Boyl wrote:
> 
> describe TEST123 test
> body TEST123 /\bNeed to buy products *\?\b/i
> score TEST123 0.0
> 
> If possible, also make it catch if more than 1 question mark :)
> use \ in front of space char
> 
Try this:

describe TEST123 test
body TEST123 /Need to buy products\s*\?+/i
score TEST123 0.0

\s*  matches zero or more whitespace (spaces and TABs)
\?+ matches one or more question marks

You can use 'grep -P' or any of the online regex testers to try out a
regex before writing it into a rule. The -P option tells grep to use
Perl regular expressions, which may differ from standard grep ones. I
checked the regex I showed above this way.

Input was this set of seven lines in a file called need.txt:

Need to buy products? This and the next four lines should match.
Need to buy products ?
A need to buy products  ?
Need to buy products ??
Do you need to buy products ? We have some.
Need to buy products - doesn't match
Want to but products - doesn't match

The test command, where the option -P tells grep to use a Perl regex
and -i means caseless matching, was:

grep -Pi 'Need to buy products\s*\?+' need.txt

where the regex is exactly equivalent to:

/Need to buy products\s*\?+/i

and the output was five matched lines:

Need to buy products? This and the next four lines should match.
Need to buy products ?
A need to buy products  ?
Need to buy products     ??
Do you need to buy products ? We have some. 

When in doubt about writing a regex, this is my usual way of rapidly
finding a valid expression, then checking that it matches what I expect.
On my system grep defaults to highlighting the matched text in red.
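For readers without grep -P to hand, the same check can be sketched with Python's re module instead, which treats \s*, \?+ and case-insensitive matching the same way Perl does. The seven lines are copied from need.txt above, and re.I plays the role of grep's -i.

```python
import re

# The seven lines of need.txt from the message above.
lines = [
    "Need to buy products? This and the next four lines should match.",
    "Need to buy products ?",
    "A need to buy products  ?",
    "Need to buy products ??",
    "Do you need to buy products ? We have some.",
    "Need to buy products - doesn't match",
    "Want to but products - doesn't match",
]

# Same regex as the rule; re.I does the job of grep's -i option.
pattern = re.compile(r"Need to buy products\s*\?+", re.I)

matched = [line for line in lines if pattern.search(line)]
for line in matched:
    print(line)
```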

My standard reference for writing Perl regexes is the O'Reilly 'Camel
book', "Programming Perl" chapter 5. 

 
Martin



Very low score for spam from b2blistappenders.com

2016-04-08 Thread Robert Boyl
Hi, everyone

Pls, do you get a good spam score on this? For us, no hits for
spamassassin, etc.

I checked in test sites such as http://spamcheck.postmarkapp.com/ and also
very low score.

Strange, as it does seem to have spammy words, etc... no?

See:

http://pastebin.com/EJH1eddN

Thanks!
Robert


RE: Regex in case of spaces

2016-04-08 Thread Tony Abrahams
Hi dude,

Try the following - 

need to buy (a\s)?products?([\s?]{0,})?

added a conditional "a" followed by a space: (a\s)?

added a conditional s after product: s?

added a conditional combination of spaces and question marks: ([\s?]{0,})?
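One property of this pattern is worth checking: the trailing group ([\s?]{0,})? can match an empty string, so the question mark is optional and the pattern also fires on text with no question mark at all. The sketch shows that, plus a stricter variant ending in \s*\?+ (an illustration, not Tony's suggestion). Python's re stands in for Perl here.

```python
import re

tony = re.compile(r"need to buy (a\s)?products?([\s?]{0,})?", re.I)
# Illustrative stricter variant: at least one "?" is required at the end.
strict = re.compile(r"need to buy (a\s)?products?\s*\?+", re.I)

samples = ["Need to buy a product ?", "need to buy products??", "Need to buy products"]

for text in samples:
    print(repr(text), bool(tony.search(text)), bool(strict.search(text)))
```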

Hope it helps

Rgds
Tony

> Subject: Re: Regex in case of spaces
> To: users@spamassassin.apache.org
> From: h.rei...@thelounge.net
> Date: Fri, 8 Apr 2016 14:37:58 +0200
> 
> 
> 
> On 08.04.2016 at 14:02, Robert Boyl wrote:
> > Hi, everyone!
> >
> > Sorry, lame with regex.
> >
> > How can I make a rule to catch:
> >
> > Need to buy a product ?
> >
> > And also catch "need to buy a product?"
> >
> > Note the extra spacing.
> >
> > Tried this, didn't work:
> >
> > describe TEST123 test
> > body TEST123 /\bNeed to buy products *\?\b/i
> > score TEST123 0.0
> >
> > If possible, also make it catch if more than 1 question mark :)
> 
> /.*need to buy products.*\?.*/i
> 
> .* = any chars independent how often
> so it's basically "anything which contains the text followed by a ?"
> 
  

Re: Regex in case of spaces

2016-04-08 Thread RW
On Fri, 8 Apr 2016 09:02:36 -0300
Robert Boyl wrote:

> Hi, everyone!
> 
> Sorry, lame with regex.
> 
> How can I make a rule to catch:
> 
> Need to buy a product ?
> 
> And also catch "need to buy a product?"
> 
> Note the extra spacing.

The body is normalized, so all consecutive whitespace becomes a single
space.

If you need to detect multiple spaces you need the rawbody. 
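The effect of that normalization on a multiple-space test can be sketched as follows. rendered_body() is a hypothetical approximation of the body view (whitespace runs collapsed to one space), and the unmodified string stands in for what a rawbody rule would see; SpamAssassin's actual processing does more than this.

```python
import re

def rendered_body(raw: str) -> str:
    # Hypothetical stand-in for the normalized "body" view: whitespace runs
    # collapse to one space. SpamAssassin's real rendering does more.
    return re.sub(r"\s+", " ", raw)

raw = "Need to buy a product    ?"            # four spaces before "?"
extra_spacing = re.compile(r"product {2,}\?")  # two or more spaces, then "?"

print(bool(extra_spacing.search(raw)))                 # True
print(bool(extra_spacing.search(rendered_body(raw))))  # False
```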

> Tried this, didn't work:
> 
> describe TEST123 test
> body TEST123 /\bNeed to buy products *\?\b/i
> score TEST123 0.0

Scoring a rule at zero stops it being used, which is why it failed.

There's no word boundary between punctuation and a space, so
replace the final \b with a $ (for clarity) or leave it out. Usually
it's better not to wrap phrases like this in a pair of boundaries.
There's no chance of extra letters changing the meaning - it just
makes it a little easier to beat.


> If possible, also make it catch if more than 1 question mark :)

Without the final \b it will match. Otherwise use \?+.
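Both points can be checked quickly, with Python's re standing in for Perl: \b needs a word character on exactly one side, so it never matches between "?" and a following space or the end of the text. The sample sentence is made up.

```python
import re

text = "Do you need to buy products ? We have some."

# Robert's pattern: the final \b would need a word character next to "?".
robert = re.compile(r"\bNeed to buy products *\?\b", re.I)
# Final \b dropped and \?+ used, as suggested above.
fixed = re.compile(r"\bNeed to buy products *\?+", re.I)

print(bool(robert.search(text)))  # False
print(bool(fixed.search(text)))   # True
print(fixed.search("need to buy products???").group())  # need to buy products???
```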


Re: Regex in case of spaces

2016-04-08 Thread Reindl Harald



On 08.04.2016 at 14:02, Robert Boyl wrote:
> Hi, everyone!
>
> Sorry, lame with regex.
>
> How can I make a rule to catch:
>
> Need to buy a product ?
>
> And also catch "need to buy a product?"
>
> Note the extra spacing.
>
> Tried this, didn't work:
>
> describe TEST123 test
> body TEST123 /\bNeed to buy products *\?\b/i
> score TEST123 0.0
>
> If possible, also make it catch if more than 1 question mark :)

/.*need to buy products.*\?.*/i

.* = any chars independent how often
so it's basically "anything which contains the text followed by a ?"





Re: Regex in case of spaces

2016-04-08 Thread me

On 2016-04-08 14:02, Robert Boyl wrote:

> describe TEST123 test
> body TEST123 /\bNeed to buy products *\?\b/i
> score TEST123 0.0
>
> If possible, also make it catch if more than 1 question mark :)

use \ in front of space char

and a score of 0 disables the test, a waste of a rule :)
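Whether escaping the space changes anything depends on the x modifier. Without it, "\ " and " " match the same character; with it, unescaped spaces in the pattern are ignored and only the escaped form still requires a space. Robert's rule uses only /i. A quick check, with Python's re.X standing in for Perl's /x:

```python
import re

text = "Need to buy products ?"

plain = re.compile(r"products \?")      # literal space
escaped = re.compile(r"products\ \?")   # escaped space
# With re.X (like Perl's /x), unescaped whitespace in the pattern is ignored.
plain_x = re.compile(r"products \?", re.X)
escaped_x = re.compile(r"products\ \?", re.X)

print(bool(plain.search(text)), bool(escaped.search(text)))      # True True
print(bool(plain_x.search(text)), bool(escaped_x.search(text)))  # False True
```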


Regex in case of spaces

2016-04-08 Thread Robert Boyl
Hi, everyone!

Sorry, lame with regex.

How can I make a rule to catch:

Need to buy a product ?

And also catch "need to buy a product?"

Note the extra spacing.

Tried this, didn't work:

describe TEST123 test
body TEST123 /\bNeed to buy products *\?\b/i
score TEST123 0.0

If possible, also make it catch if more than 1 question mark :)

Thanks!
Robert