Re: Regex in case of spaces
On Fri, 8 Apr 2016, Bowie Bailey wrote:

> On 4/8/2016 11:09 AM, Reindl Harald wrote:
> > On 08.04.2016 at 17:05, John Hardin wrote:
> > > On Fri, 8 Apr 2016, Reindl Harald wrote:
> > > > /.*need to buy products.*\?.*/i
> > > >
> > > > .* = any characters, any number of times
> > >
> > > Do NOT use ".*" in body or rawbody rules. That can lead to unbounded
> > > processing times. Use a sane upper limit, e.g. ".{0,20}", and try to
> > > avoid repeated "." where possible.
> >
> > Thanks for the hint, but that's not possible for "contains" rules,
> > where you don't know where the offending phrase occurs.
> >
> > Interesting, given that we have around 1100 such rules and the
> > SpamAssassin/ClamAV virtual machine runs at between 50 and 300 MHz
> > for most of the day.
>
> In this case, aren't the first and last ".*" redundant anyway?
>
> /.*need to buy products.*\?.*/i
>
> is functionally equivalent to
>
> /need to buy products.*\?/i

That, too. A ".*" at the end of an RE is totally pointless.

--
John Hardin KA7OHZ
http://www.impsec.org/~jhardin/
jhar...@impsec.org
FALaholic #11174
pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
The Tea Party wants to remove the Crony from Crony Capitalism.
OWS wants to remove Capitalism from Crony Capitalism.
-- Astaghfirullah
---
5 days until Thomas Jefferson's 273rd Birthday
Re: Regex in case of spaces
On 4/8/2016 11:09 AM, Reindl Harald wrote:

> On 08.04.2016 at 17:05, John Hardin wrote:
> > On Fri, 8 Apr 2016, Reindl Harald wrote:
> > > /.*need to buy products.*\?.*/i
> > >
> > > .* = any characters, any number of times
> >
> > Do NOT use ".*" in body or rawbody rules. That can lead to unbounded
> > processing times. Use a sane upper limit, e.g. ".{0,20}", and try to
> > avoid repeated "." where possible.
>
> Thanks for the hint, but that's not possible for "contains" rules,
> where you don't know where the offending phrase occurs.
>
> Interesting, given that we have around 1100 such rules and the
> SpamAssassin/ClamAV virtual machine runs at between 50 and 300 MHz
> for most of the day.

In this case, aren't the first and last ".*" redundant anyway?

/.*need to buy products.*\?.*/i

is functionally equivalent to

/need to buy products.*\?/i

And since you don't need (or want) too much extra stuff before the question mark, you could easily limit it without losing functionality:

/need to buy products.{0,20}\?/i

--
Bowie
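The equivalence claimed above can be checked with a small sketch. It uses Python's re module as a stand-in for Perl regexes, on the assumption that the constructs involved (".*", an explicit "{0,20}" bound, case-insensitive search) behave the same in both. The sample strings and the helper name verdicts are made up for illustration.

```python
import re

# The three variants discussed above, all used as unanchored searches.
PADDED = re.compile(r".*need to buy products.*\?.*", re.IGNORECASE)
TRIMMED = re.compile(r"need to buy products.*\?", re.IGNORECASE)
BOUNDED = re.compile(r"need to buy products.{0,20}\?", re.IGNORECASE)


def verdicts(text):
    """Return (padded, trimmed, bounded) yes/no results for one line."""
    return tuple(bool(p.search(text)) for p in (PADDED, TRIMMED, BOUNDED))


samples = [
    "Do you need to buy products ? We have some.",
    "NEED TO BUY PRODUCTS???",
    "Need to buy products - no question mark here",
]
for text in samples:
    padded, trimmed, bounded = verdicts(text)
    # Dropping the leading and trailing ".*" never changes the answer.
    assert padded == trimmed
    print(padded, trimmed, bounded, text)

# The bounded form deliberately gives up when the "?" is more than
# 20 characters after the phrase; the unbounded forms still match.
far_away = "need to buy products" + " x" * 15 + "?"
print(verdicts(far_away))
```

The only behavioural difference is the one the bound is meant to introduce: a question mark far from the phrase no longer counts as a hit.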
Re: Regex in case of spaces
On Fri, 8 Apr 2016, Reindl Harald wrote:

> On 08.04.2016 at 17:05, John Hardin wrote:
> > On Fri, 8 Apr 2016, Reindl Harald wrote:
> > > /.*need to buy products.*\?.*/i
> > >
> > > .* = any characters, any number of times
> >
> > Do NOT use ".*" in body or rawbody rules. That can lead to unbounded
> > processing times. Use a sane upper limit, e.g. ".{0,20}", and try to
> > avoid repeated "." where possible.
>
> Thanks for the hint, but that's not possible for "contains" rules,
> where you don't know where the offending phrase occurs.

Then the limit can be generous. And in the case of the above, you can avoid backtracking issues by doing this instead:

products[^?]{0,100}\?

...so that the character set you're skipping over doesn't contain the character you're looking for. Note that this does not work in all cases, but in this case it does avoid problems.

> Interesting, given that we have around 1100 such rules and the
> SpamAssassin/ClamAV virtual machine runs at between 50 and 300 MHz
> for most of the day.

I said "can", not "will". It depends on the RE and the data you give it.

--
John Hardin
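A rough illustration of why the negated class helps, again using Python's re module as a stand-in for Perl (an assumption; the constructs used here behave the same in both). The sketch does not measure backtracking directly. It only shows where each pattern stops: the dot version runs past the first "?" and has to back up to the last one in range, while the negated class cannot skip over a "?" at all. The sample text and helper name are invented for the example.

```python
import re

DOT = re.compile(r"products.{0,100}\?", re.IGNORECASE)
NEGATED = re.compile(r"products[^?]{0,100}\?", re.IGNORECASE)


def matched_text(pattern, text):
    """Return the exact substring the pattern matched, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


text = "Need to buy products ? Or services ? Call now"

# Greedy "." swallows the rest of the line, then backs up to the LAST "?".
print(matched_text(DOT, text))      # products ? Or services ?

# "[^?]" stops at the FIRST "?", so there is nothing to back up over.
print(matched_text(NEGATED, text))  # products ?
```

Both patterns give the same yes/no answer on this input; the difference is how much text the engine has to revisit before it can answer.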
Re: Regex in case of spaces
On 08.04.2016 at 17:05, John Hardin wrote:

> On Fri, 8 Apr 2016, Reindl Harald wrote:
> > /.*need to buy products.*\?.*/i
> >
> > .* = any characters, any number of times
>
> Do NOT use ".*" in body or rawbody rules. That can lead to unbounded
> processing times. Use a sane upper limit, e.g. ".{0,20}", and try to
> avoid repeated "." where possible.

Thanks for the hint, but that's not possible for "contains" rules, where you don't know where the offending phrase occurs.

Interesting, given that we have around 1100 such rules and the SpamAssassin/ClamAV virtual machine runs at between 50 and 300 MHz for most of the day.
Re: Regex in case of spaces
On Fri, 8 Apr 2016, Reindl Harald wrote:

> /.*need to buy products.*\?.*/i
>
> .* = any characters, any number of times

Do NOT use ".*" in body or rawbody rules. That can lead to unbounded processing times. Use a sane upper limit, e.g. ".{0,20}", and try to avoid repeated "." where possible.

--
John Hardin
Re: Regex in case of spaces
On Fri, 08 Apr 2016 14:43:07 +0100 Martin Gregorie wrote:

> On Fri, 2016-04-08 at 14:28 +0100, RW wrote:
> > His rule failed solely because he scored it at zero.
>
> Since the OP claimed to be relatively clueless about regexes, I posted
> what I did in the hope of showing him an easy way to write and test
> them.

Right. That's why you wrote "Try this:" and left the score at zero.
Re: Regex in case of spaces
On Fri, 2016-04-08 at 14:28 +0100, RW wrote:

> His rule failed solely because he scored it at zero.

Since the OP claimed to be relatively clueless about regexes, I posted what I did in the hope of showing him an easy way to write and test them. Regex simplification by writing body rules to take advantage of SA's body text preprocessing can come later.

Martin
Re: Regex in case of spaces
On Fri, 08 Apr 2016 14:19:45 +0100 Martin Gregorie wrote:

> On 2016-04-08 14:02, Robert Boyl wrote:
> > > describe TEST123 test
> > > body TEST123 /\bNeed to buy products *\?\b/i
> > > score TEST123 0.0
> > >
> > > If possible, also make it catch if more than 1 question mark :)
> >
> > use \ in front of space char
>
> Try this:
>
> describe TEST123 test
> body TEST123 /Need to buy products\s*\?+/i
> score TEST123 0.0
>
> \s* matches zero or more whitespace (spaces and TABs)

There are no tabs and no consecutive spaces. In the body there is no difference between " ?", " *", \s* and \s?

His rule failed solely because he scored it at zero.
Re: Regex in case of spaces
On 2016-04-08 14:02, Robert Boyl wrote:

> > describe TEST123 test
> > body TEST123 /\bNeed to buy products *\?\b/i
> > score TEST123 0.0
> >
> > If possible, also make it catch if more than 1 question mark :)
>
> use \ in front of space char

Try this:

describe TEST123 test
body TEST123 /Need to buy products\s*\?+/i
score TEST123 0.0

\s* matches zero or more whitespace characters (spaces and TABs)
\?+ matches one or more question marks

You can use 'grep -P' or any of the online regex testers to try out a regex before writing it into a rule. The -P option tells grep to use Perl regular expressions, which may differ from standard grep ones.

I checked the regex I showed above this way. Input was this set of seven lines in a file called need.txt:

Need to buy products? This and the next four lines should match.
Need to buy products ?
A need to buy products ?
Need to buy products ??
Do you need to buy products ? We have some.
Need to buy products - doesn't match
Want to but products - doesn't match

The test command, where the option -P tells grep to use a Perl regex and -i means caseless matching, was:

grep -Pi 'Need to buy products\s*\?+' need.txt

where the regex is exactly equivalent to:

/Need to buy products\s*\?+/i

and the output was five matched lines:

Need to buy products? This and the next four lines should match.
Need to buy products ?
A need to buy products ?
Need to buy products ??
Do you need to buy products ? We have some.

When in doubt about writing a regex, this is my usual way of rapidly finding a valid expression and then checking that it matches what I expect. On my system grep defaults to highlighting the matched text in red.

My standard reference for writing Perl regexes is the O'Reilly 'Camel book', "Programming Perl", chapter 5.

Martin
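For anyone who would rather keep such a check as a repeatable script than a one-off grep, here is a minimal sketch of the same test in Python. Python's re module is assumed to treat this particular pattern the same way 'grep -P' does; the helper name matching_lines is made up, and the seven input lines are copied from the message above.

```python
import re

# Same expression as the grep -Pi test above; IGNORECASE stands in for -i.
RULE = re.compile(r"Need to buy products\s*\?+", re.IGNORECASE)

NEED_TXT = """\
Need to buy products? This and the next four lines should match.
Need to buy products ?
A need to buy products ?
Need to buy products ??
Do you need to buy products ? We have some.
Need to buy products - doesn't match
Want to but products - doesn't match
"""


def matching_lines(pattern, text):
    """Return the lines of text in which the pattern is found, like grep."""
    return [line for line in text.splitlines() if pattern.search(line)]


for line in matching_lines(RULE, NEED_TXT):
    print(line)
```

Keeping the sample lines next to the expression makes it easy to rerun the check whenever the rule is edited.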
RE: Regex in case of spaces
Hi dude,

Try the following:

need to buy (a\s)?products?([\s?]{0,})?

added an optional "a" followed by a space: (a\s)?
added an optional "s" after "product": s?
added an optional run of spaces and question marks: ([\s?]{0,})?

Hope it helps.

Rgds,
Tony

> Subject: Re: Regex in case of spaces
> To: users@spamassassin.apache.org
> From: h.rei...@thelounge.net
> Date: Fri, 8 Apr 2016 14:37:58 +0200
>
> On 08.04.2016 at 14:02, Robert Boyl wrote:
> > Hi, everyone!
> >
> > Sorry, lame with regex.
> >
> > How can I make a rule to catch:
> >
> > Need to buy a product ?
> >
> > And also catch "need to buy a product?"
> >
> > Note the extra spacing.
> >
> > Tried this, didn't work:
> >
> > describe TEST123 test
> > body TEST123 /\bNeed to buy products *\?\b/i
> > score TEST123 0.0
> >
> > If possible, also make it catch if more than 1 question mark :)
>
> /.*need to buy products.*\?.*/i
>
> .* = any characters, any number of times
> so it's basically "anything which contains the text followed by a ?"
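One thing worth checking with the expression above is that its final group is entirely optional, so it also matches when there is no question mark at all. The sketch below shows this using Python's re module as a stand-in for Perl (an assumption; these constructs behave the same in both). The STRICT pattern is a hypothetical variant written for this example, not something proposed in the thread.

```python
import re

# The expression from the message above, unchanged.
TONY = re.compile(r"need to buy (a\s)?products?([\s?]{0,})?", re.IGNORECASE)

# Hypothetical stricter variant: at least one "?" is required.
STRICT = re.compile(r"need to buy (?:a\s)?products?\s*\?+", re.IGNORECASE)


def hits(pattern, text):
    """True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None


for text in (
    "Need to buy a product ?",
    "need to buy products??",
    "You need to buy a product today",  # no question mark
):
    print(hits(TONY, text), hits(STRICT, text), text)
```

On the third line the original expression still matches, because everything after "product" is optional; the variant does not.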
Re: Regex in case of spaces
On Fri, 8 Apr 2016 09:02:36 -0300 Robert Boyl wrote:

> Hi, everyone!
>
> Sorry, lame with regex.
>
> How can I make a rule to catch:
>
> Need to buy a product ?
>
> And also catch "need to buy a product?"
>
> Note the extra spacing.

The body is normalized, so all consecutive whitespace becomes a single space. If you need to detect multiple spaces you need the rawbody.

> Tried this, didn't work:
>
> describe TEST123 test
> body TEST123 /\bNeed to buy products *\?\b/i
> score TEST123 0.0

Scoring a rule at zero stops it being used, which is why it failed.

There's no boundary between punctuation and a space, so replace the final \b with a $ (for clarity) or leave it out.

Usually it's better not to wrap phrases like this in a pair of boundaries. There's no chance of extra letters changing the meaning - it just makes it a little easier to beat.

> If possible, also make it catch if more than 1 question mark :)

Without the final \b it will match. Otherwise use \?+.
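The two regex points above (whitespace collapsing and the trailing \b) can be seen in a short sketch. It uses Python's re module as a stand-in for Perl, and the normalize function is only a rough approximation of what SpamAssassin does to body text, written for this example; it is not SpamAssassin's own code. The FIXED pattern follows the advice above: drop the final \b and use \?+.

```python
import re


def normalize(text):
    """Collapse every run of whitespace to one space (rough approximation)."""
    return re.sub(r"\s+", " ", text).strip()


# The rule as posted, and the same rule without the trailing \b.
ORIGINAL = re.compile(r"\bNeed to buy products *\?\b", re.IGNORECASE)
FIXED = re.compile(r"\bNeed to buy products *\?+", re.IGNORECASE)

raw = "Need  to buy   products    ??  Call us"
body = normalize(raw)
print(body)  # Need to buy products ?? Call us

# "?" followed by "?", a space, or end of text is not a word boundary,
# so the trailing \b makes the original rule fail here.
print(ORIGINAL.search(body))  # None

# Without the trailing \b the phrase and both question marks match.
print(FIXED.search(body).group(0))  # Need to buy products ??

# The original only matches when a letter directly follows the "?".
print(bool(ORIGINAL.search("Need to buy products?Call")))  # True
```

So even with a non-zero score, the posted expression would only fire on text where a word character immediately follows the question mark.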
Re: Regex in case of spaces
On 08.04.2016 at 14:02, Robert Boyl wrote:

> Hi, everyone!
>
> Sorry, lame with regex.
>
> How can I make a rule to catch:
>
> Need to buy a product ?
>
> And also catch "need to buy a product?"
>
> Note the extra spacing.
>
> Tried this, didn't work:
>
> describe TEST123 test
> body TEST123 /\bNeed to buy products *\?\b/i
> score TEST123 0.0
>
> If possible, also make it catch if more than 1 question mark :)

/.*need to buy products.*\?.*/i

.* = any characters, any number of times
so it's basically "anything which contains the text followed by a ?"
Re: Regex in case of spaces
On 2016-04-08 14:02, Robert Boyl wrote:

> describe TEST123 test
> body TEST123 /\bNeed to buy products *\?\b/i
> score TEST123 0.0
>
> If possible, also make it catch if more than 1 question mark :)

Use \ in front of the space character.

Also, a score of 0 disables the test, which makes the rule a waste :)
Regex in case of spaces
Hi, everyone!

Sorry, lame with regex.

How can I make a rule to catch:

Need to buy a product ?

And also catch "need to buy a product?"

Note the extra spacing.

Tried this, didn't work:

describe TEST123 test
body TEST123 /\bNeed to buy products *\?\b/i
score TEST123 0.0

If possible, also make it catch if more than 1 question mark :)

Thanks!
Robert