Re: Regex in case of spaces

2016-04-08 Thread John Hardin

On Fri, 8 Apr 2016, Bowie Bailey wrote:


On 4/8/2016 11:09 AM, Reindl Harald wrote:



 On 08.04.2016 at 17:05, John Hardin wrote:
>  On Fri, 8 Apr 2016, Reindl Harald wrote:
> 
> >  /.*need to buy products.*\?.*/i
> > 
> >  .* = any chars independent how often
> 
>  Do NOT use ".*" in body or rawbody rules. That can lead to unbounded
>  processing times. Use a sane upper limit, e.g. ".{0,20}", and try to
>  avoid repeated "." where possible

 thanks for the hint but that's not possible in case of "contains" rules
 where you don't know at which place the offending phrase comes

 interesting that we have around 1100 such rules and the
 Spamassassin/ClamAV virtual machine runs most of the day between 50 and
 300 MHz



In this case, aren't the first and last ".*" redundant anyway?

/.*need to buy products.*\?.*/i

is functionally equivalent to

/need to buy products.*\?/i


That, too. .* at the end of an RE is totally pointless.
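
A minimal sketch of that equivalence, using Python's re module as a stand-in for the Perl regexes SpamAssassin uses (an assumption; the sample lines are made up for illustration). Because a rule hits when the pattern is found anywhere in the text, the leading and trailing ".*" should not change which lines hit:

```python
import re

# The original rule and the trimmed version discussed above.
PADDED = re.compile(r".*need to buy products.*\?.*", re.IGNORECASE)
TRIMMED = re.compile(r"need to buy products.*\?", re.IGNORECASE)

# Hypothetical sample lines, not taken from real mail.
SAMPLES = [
    "Hello, do you need to buy products today? Reply now.",
    "NEED TO BUY PRODUCTS?",
    "We need to buy products soon.",            # no question mark at all
    "Any questions? We need to buy products.",  # '?' comes before the phrase
]


def hits(pattern, line):
    """Return True if the pattern is found anywhere in the line."""
    return pattern.search(line) is not None


for line in SAMPLES:
    print(hits(PADDED, line), hits(TRIMMED, line), line)
```

On these samples both patterns agree line by line; the trimmed one simply gives the regex engine less to scan.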


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The Tea Party wants to remove the Crony from Crony Capitalism.
  OWS wants to remove Capitalism from Crony Capitalism.
-- Astaghfirullah
---
 5 days until Thomas Jefferson's 273rd Birthday


Re: Regex in case of spaces

2016-04-08 Thread Bowie Bailey

On 4/8/2016 11:09 AM, Reindl Harald wrote:



On 08.04.2016 at 17:05, John Hardin wrote:

On Fri, 8 Apr 2016, Reindl Harald wrote:


/.*need to buy products.*\?.*/i

.* = any chars independent how often


Do NOT use ".*" in body or rawbody rules. That can lead to unbounded
processing times. Use a sane upper limit, e.g. ".{0,20}", and try to
avoid repeated "." where possible


thanks for the hint but that's not possible in case of "contains"
rules where you don't know at which place the offending phrase comes


interesting that we have around 1100 such rules and the 
Spamassassin/ClamAV virtual machine runs most of the day between 50 
and 300 MHz




In this case, aren't the first and last ".*" redundant anyway?

/.*need to buy products.*\?.*/i

is functionally equivalent to

/need to buy products.*\?/i

And since you don't need (or want) too much extra stuff before the 
question mark, you could easily limit it without losing functionality.


/need to buy products.{0,20}\?/i
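
A rough illustration of the bounded version, again in Python's re as a stand-in for Perl (an assumption; the sample strings are invented). The bound is spelled {0,20} with an explicit lower limit, since Perl releases before 5.34 do not treat {,20} as a quantifier:

```python
import re

# At most 20 arbitrary characters may sit between the phrase and the '?'.
BOUNDED = re.compile(r"need to buy products.{0,20}\?", re.IGNORECASE)


def is_hit(text):
    """Return True if the bounded rule matches somewhere in the text."""
    return BOUNDED.search(text) is not None


# Hypothetical inputs: 10 characters of filler vs. 30 characters of filler.
near = "Need to buy products right now?"
far = "Need to buy products" + " x" * 15 + "?"

print(is_hit(near))
print(is_hit(far))
```

The first string hits and the second does not, which is the trade-off of a limit: pick it large enough for the text you expect between the phrase and the question mark.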

--
Bowie


Re: Regex in case of spaces

2016-04-08 Thread John Hardin

On Fri, 8 Apr 2016, Reindl Harald wrote:


On 08.04.2016 at 17:05, John Hardin wrote:

 On Fri, 8 Apr 2016, Reindl Harald wrote:

>  /.*need to buy products.*\?.*/i
> 
>  .* = any chars independent how often


 Do NOT use ".*" in body or rawbody rules. That can lead to unbounded
 processing times. Use a sane upper limit, e.g. ".{0,20}", and try to
 avoid repeated "." where possible


thanks for the hint but that's not possible in case of "contains" rules where
you don't know at which place the offending phrase comes


Then the limit can be generous. And in the case of the above, you can 
avoid backtracking issues by doing this instead:


   products[^?]{0,100}\?

...so that the character set you're skipping over doesn't contain the 
value you're looking for. Note that this does not work in all cases, but 
in this case it does avoid problems.
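
A small sketch of the difference, using Python's re as a stand-in for Perl (an assumption; the sample text is invented). The negated class cannot run past a "?", so the match ends at the first one and the engine has nothing to back out of, whereas ".*" first runs to the end of the line:

```python
import re

GREEDY = re.compile(r"products.*\?")
NEGATED = re.compile(r"products[^?]{0,100}\?")

# Hypothetical body text containing several question marks.
text = "need to buy products ? really ? still here ?"

greedy_match = GREEDY.search(text).group()
negated_match = NEGATED.search(text).group()

print(repr(greedy_match))   # extends to the last '?'
print(repr(negated_match))  # stops at the first '?'
```

Both patterns hit this text; only the amount of input each one has to consume and reconsider differs.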


interesting that we have around 1100 such rules and the Spamassassin/ClamAV 
virtual machine runs most of the day between 50 and 300 MHz


I said "can", not "will". It depends on the RE and the data you give it.



Re: Regex in case of spaces

2016-04-08 Thread Reindl Harald



On 08.04.2016 at 17:05, John Hardin wrote:

On Fri, 8 Apr 2016, Reindl Harald wrote:


/.*need to buy products.*\?.*/i

.* = any chars independent how often


Do NOT use ".*" in body or rawbody rules. That can lead to unbounded
processing times. Use a sane upper limit, e.g. ".{0,20}", and try to
avoid repeated "." where possible


thanks for the hint but that's not possible in case of "contains" rules
where you don't know at which place the offending phrase comes


interesting that we have around 1100 such rules and the 
Spamassassin/ClamAV virtual machine runs most of the day between 50 and 
300 MHz






Re: Regex in case of spaces

2016-04-08 Thread John Hardin

On Fri, 8 Apr 2016, Reindl Harald wrote:


/.*need to buy products.*\?.*/i

.* = any chars independent how often


Do NOT use ".*" in body or rawbody rules. That can lead to unbounded 
processing times. Use a sane upper limit, e.g. ".{0,20}", and try to avoid
repeated "." where possible.




Re: Regex in case of spaces

2016-04-08 Thread RW
On Fri, 08 Apr 2016 14:43:07 +0100
Martin Gregorie wrote:

> On Fri, 2016-04-08 at 14:28 +0100, RW wrote:
> 
> > His rule failed solely because he scored it at zero.
> >  
> Since the OP claimed to be relatively clueless about regexes, I posted
> what I did in the hope of showing him an easy way to write and test
> them.

Right. That's why you wrote "Try this:" and left the score at zero.




Re: Regex in case of spaces

2016-04-08 Thread Martin Gregorie
On Fri, 2016-04-08 at 14:28 +0100, RW wrote:

> His rule failed solely because he scored it at zero.
>
Since the OP claimed to be relatively clueless about regexes, I posted
what I did in the hope of showing him an easy way to write and test
them.

Regex simplification by writing body rules to take advantage of SA's
body text preprocessing can come later.


Martin



Re: Regex in case of spaces

2016-04-08 Thread RW
On Fri, 08 Apr 2016 14:19:45 +0100
Martin Gregorie wrote:

> On 2016-04-08 14:02, Robert Boyl wrote:
> > 
> > describe TEST123 test
> > body TEST123 /\bNeed to buy products *\?\b/i
> > score TEST123 0.0
> > 
> > If possible, also make it catch if more than 1 question mark :)
> > use \ in front of space char
> >   
> Try this:
> 
> describe TEST123 test
> body TEST123 /Need to buy products\s*\?+/i
> score TEST123 0.0
> 
> \s*  matches zero or more whitespace (spaces and TABs)

There are no tabs and no consecutive spaces. In the body there is no
difference between  " ?", " *", \s* and \s? 

His rule failed solely because he scored it at zero.
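
A rough sketch of why the four spellings behave the same on body text, assuming body normalization amounts to collapsing each whitespace run into one space (an approximation; normalize() is a hypothetical helper, not SpamAssassin code, and Python's re stands in for Perl):

```python
import re


def normalize(text):
    """Hypothetical stand-in for body normalization: collapse whitespace runs."""
    return re.sub(r"\s+", " ", text)


# The four ways of writing "optional whitespace before the question mark".
VARIANTS = [r"products ?\?", r"products *\?", r"products\s*\?", r"products\s?\?"]

raw = "Need to buy products \t   ?"   # invented raw text with a tab and extra spaces
body = normalize(raw)


def hit(pattern, text):
    return re.search(pattern, text, re.IGNORECASE) is not None


body_results = [hit(v, body) for v in VARIANTS]
raw_results = [hit(v, raw) for v in VARIANTS]

print(body_results)
print(raw_results)
```

On the collapsed text all four variants hit; on the raw text only \s* does, which is where the choice of quantifier would start to matter (for example in a rawbody rule).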


Re: Regex in case of spaces

2016-04-08 Thread Martin Gregorie
On 2016-04-08 14:02, Robert Boyl wrote:
> 
> describe TEST123 test
> body TEST123 /\bNeed to buy products *\?\b/i
> score TEST123 0.0
> 
> If possible, also make it catch if more than 1 question mark :)
> use \ in front of space char
> 
Try this:

describe TEST123 test
body TEST123 /Need to buy products\s*\?+/i
score TEST123 0.0

\s*  matches zero or more whitespace (spaces and TABs)
\?+ matches one or more question marks

You can use 'grep -P' or any of the online regex testers to try out a
regex before writing it into a rule. The -P option tells grep to use
Perl regular expressions, which may differ from standard grep ones. I
checked the regex I showed above this way.

Input was this set of seven lines in a file called need.txt:

Need to buy products? This and the next four lines should match.
Need to buy products ?
A need to buy products  ?
Need to buy products     ??
Do you need to buy products ? We have some.
Need to buy products - doesn't match
Want to but products - doesn't match

The test command, where the option -P tells grep to use a Perl regex
and -i means caseless matching, was:

grep -Pi 'Need to buy products\s*\?+' need.txt

where the regex is exactly equivalent to:

/Need to buy products\s*\?+/i

and the output was five matched lines:

Need to buy products? This and the next four lines should match.
Need to buy products ?
A need to buy products  ?
Need to buy products     ??
Do you need to buy products ? We have some. 

When in doubt about writing a regex, this is my usual way of rapidly
finding a valid expression, then checking that it matches what I expect.
On my system grep defaults to highlighting the matched text in red.
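
For systems where grep lacks -P, the same check can be sketched in Python, whose re module handles this particular pattern the same way as Perl (the variable names are illustrative; the test lines are the seven from need.txt above):

```python
import re

# Same regex as in the rule; re.IGNORECASE plays the role of the /i flag.
RULE = re.compile(r"Need to buy products\s*\?+", re.IGNORECASE)

LINES = [
    "Need to buy products? This and the next four lines should match.",
    "Need to buy products ?",
    "A need to buy products  ?",
    "Need to buy products     ??",
    "Do you need to buy products ? We have some.",
    "Need to buy products - doesn't match",
    "Want to but products - doesn't match",
]

# Keep only the lines in which the rule is found somewhere.
matched = [line for line in LINES if RULE.search(line)]

for line in matched:
    print(line)
```

This prints the same five lines that grep reported.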

My standard reference for writing Perl regexes is the O'Reilly 'Camel
book', "Programming Perl" chapter 5. 

 
Martin



RE: Regex in case of spaces

2016-04-08 Thread Tony Abrahams
Hi dude,

Try the following - 

need to buy (a\s)?products?([\s?]{0,})?

added a conditional a with a space "(a\s)?"

added a conditional s after product s?

added a conditional combination of space and question marks ([\s?]{0,})?
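
A quick way to see what this pattern accepts, sketched in Python's re as a stand-in for Perl (an assumption; the sample strings are invented apart from the two phrases from the question). Case-insensitive matching is assumed, as in the /i rules elsewhere in the thread:

```python
import re

PATTERN = re.compile(r"need to buy (a\s)?products?([\s?]{0,})?", re.IGNORECASE)


def matched_text(text):
    """Return the part of the text the pattern matched, or None."""
    m = PATTERN.search(text)
    return m.group() if m else None


print(repr(matched_text("Need to buy a product ?")))
print(repr(matched_text("need to buy a product?")))
print(repr(matched_text("Need to buy products ??")))
print(repr(matched_text("We need to buy products tomorrow")))
```

Since the trailing group is entirely optional, the last sample also matches even though it contains no question mark. Whether that is acceptable depends on what the rule is meant to catch.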

Hope it helps

Rgds
Tony

> Subject: Re: Regex in case of spaces
> To: users@spamassassin.apache.org
> From: h.rei...@thelounge.net
> Date: Fri, 8 Apr 2016 14:37:58 +0200
> 
> 
> 
> On 08.04.2016 at 14:02, Robert Boyl wrote:
> > Hi, everyone!
> >
> > Sorry, lame with regex.
> >
> > How can I make a rule to catch:
> >
> > Need to buy a product ?
> >
> > And also catch "need to buy a product?"
> >
> > Note the extra spacing.
> >
> > Tried this, didn't work:
> >
> > describe TEST123 test
> > body TEST123 /\bNeed to buy products *\?\b/i
> > score TEST123 0.0
> >
> > If possible, also make it catch if more than 1 question mark :)
> 
> /.*need to buy products.*\?.*/i
> 
> .* = any chars independent how often
> so it's basically "anything which contains the text followed by a ?"
> 
  

Re: Regex in case of spaces

2016-04-08 Thread RW
On Fri, 8 Apr 2016 09:02:36 -0300
Robert Boyl wrote:

> Hi, everyone!
> 
> Sorry, lame with regex.
> 
> How can I make a rule to catch:
> 
> Need to buy a product ?
> 
> And also catch "need to buy a product?"
> 
> Note the extra spacing.

The body is normalized, so all consecutive whitespace becomes a single
space.

If you need to detect multiple spaces you need the rawbody. 

> Tried this, didn't work:
> 
> describe TEST123 test
> body TEST123 /\bNeed to buy products *\?\b/i
> score TEST123 0.0

Scoring a rule at zero stops it being used, which is why it failed.

There's no boundary between punctuation and a space, so
replace the final \b with a  $ (for clarity) or leave it out. Usually
it's better not to wrap phrases like this in a pair of boundaries.
There's no chance of extra letters changing the meaning - it just
makes it a little easier to beat.


> If possible, also make it catch if more than 1 question mark :)

Without the final \b it will match. Otherwise use \?+.
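
A short sketch of the boundary behaviour described above, using Python's re, which defines \b the same way for these inputs (an assumption; the sample strings are invented). A \b after "\?" only succeeds when a word character follows the question mark:

```python
import re

WITH_B = re.compile(r"\bNeed to buy products *\?\b", re.IGNORECASE)
NO_B = re.compile(r"\bNeed to buy products *\?", re.IGNORECASE)

# Hypothetical body lines.
SAMPLES = [
    "Need to buy products ?",           # '?' at end of text
    "Need to buy products ? We have",   # '?' followed by a space
    "Need to buy products ??",          # more than one '?'
    "Need to buy products ?now",        # '?' followed by a letter
]

with_b = [WITH_B.search(s) is not None for s in SAMPLES]
no_b = [NO_B.search(s) is not None for s in SAMPLES]

print(with_b)
print(no_b)
```

With the final \b only the last sample hits; without it all four do, including the one with two question marks.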


Re: Regex in case of spaces

2016-04-08 Thread Reindl Harald



On 08.04.2016 at 14:02, Robert Boyl wrote:

Hi, everyone!

Sorry, lame with regex.

How can I make a rule to catch:

Need to buy a product ?

And also catch "need to buy a product?"

Note the extra spacing.

Tried this, didn't work:

describe TEST123 test
body TEST123 /\bNeed to buy products *\?\b/i
score TEST123 0.0

If possible, also make it catch if more than 1 question mark :)


/.*need to buy products.*\?.*/i

.* = any chars independent how often
so it's basically "anything which contains the text followed by a ?"





Re: Regex in case of spaces

2016-04-08 Thread me

On 2016-04-08 14:02, Robert Boyl wrote:


describe TEST123 test
body TEST123 /\bNeed to buy products *\?\b/i
score TEST123 0.0

If possible, also make it catch if more than 1 question mark :)


use \ in front of space char

and a score of 0 disables the test, so the rule is wasted :)


Regex in case of spaces

2016-04-08 Thread Robert Boyl
Hi, everyone!

Sorry, lame with regex.

How can I make a rule to catch:

Need to buy a product ?

And also catch "need to buy a product?"

Note the extra spacing.

Tried this, didn't work:

describe TEST123 test
body TEST123 /\bNeed to buy products *\?\b/i
score TEST123 0.0

If possible, also make it catch if more than 1 question mark :)

Thanks!
Robert