Re: Regular expression expanding

2005-01-28 Thread Loren Wilton





  I'm trying to get my head around regular expression matching. 
  
  
  body MANGLED_CASH 
  /(?!cash)\b[cǩ\(][_\W]{0,[EMAIL PROTECTED],5}[sz5\$][_\W]{0,5}h\b/i
  My understanding of rule matching was that the '(?!cash' bit required an | (or)
  in order to work. Can anyone break down the logic of how SA tests this line?
  
  Not sure why you think an OR is required.
  OTOH, I'm not at all sure why there is a \b between (?!cash) and the
  mangled matching code. That \b either should be inside the parentheses with
  cash, or shouldn't be there at all. Given the overall rule, it would be
  more efficient to have it inside the parentheses. There should also be
  another \b before the '(?!' part to keep from matching 'cash' in the
  middle of some other word, I suppose.
  
  Then again, I don't really see a reason to have 
  the \b check there at all. If someone is going to spell cash using 
  mangled letters, I don't see that you care much if it is a stand-alone 
  word.
  
  In any case, what the (?!cash) part is saying is "the text starting here
  is not the word 'cash'", followed by a word break (the \b), followed by a
  mangled spelling of cash, followed by another word break. Which doesn't
  really work, but the intent was to catch a mangled spelling of cash, but
  not a non-mangled spelling.
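
To make the look-ahead behaviour concrete, here is a minimal sketch. It uses Python's re module as a stand-in for Perl's regex engine, since the constructs in this rule behave the same way in both. The class [a@] is an assumption that fills in for the part of the rule the list archive obscured as [EMAIL PROTECTED], and the ǩ variant is left out for simplicity, so this is an illustration and not the actual SA rule.

```python
import re

# Simplified stand-in for the MANGLED_CASH rule.
# ASSUMPTION: "[a@]" replaces the part of the pattern the archive obscured.
MANGLED_CASH = re.compile(
    r"(?!cash)\b[c\(][_\W]{0,5}[a@][_\W]{0,5}[sz5\$][_\W]{0,5}h\b",
    re.IGNORECASE,
)

def hits(text):
    """Return the matched substring, or None when the rule does not fire."""
    m = MANGLED_CASH.search(text)
    return m.group(0) if m else None

plain = hits("get cash now")        # plain spelling: rejected by (?!cash)
mangled = hits("get ca$h now")      # "$" substituted for "s"
spaced = hits("get c.a.s.h now")    # separators between the letters
upper = hits("GET CASH NOW")        # /i applies inside the look-ahead too

print(plain, mangled, spaced, upper)
```

The look-ahead consumes nothing: it only vetoes a match attempt that would start on the plain word, which is why no | is needed.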
  
  A better version would probably be 
  
  
  
  body MANGLED_CASH 
  /(?!cash)[cǩ\(][_\W]{0,[EMAIL PROTECTED],5}[sz5\$][_\W]{0,5}h/i
   
   Loren
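
The effect of dropping the two \b checks can be sketched as follows. As before, Python's re module stands in for Perl, and [a@] is an assumed replacement for the obscured part of the rule, so the patterns below are illustrative only.

```python
import re

# ASSUMPTION: "[a@]" stands in for the obscured part of the original rule.
CORE = r"[c\(][_\W]{0,5}[a@][_\W]{0,5}[sz5\$][_\W]{0,5}h"

with_boundaries = re.compile(r"(?!cash)\b" + CORE + r"\b", re.IGNORECASE)
without_boundaries = re.compile(r"(?!cash)" + CORE, re.IGNORECASE)

def fires(rule, text):
    """True when the rule matches anywhere in the text."""
    return rule.search(text) is not None

samples = ["ca$h", "xxca$hxx", "cash", "cashier"]
results = {
    s: (fires(with_boundaries, s), fires(without_boundaries, s))
    for s in samples
}

for sample, (bounded, unbounded) in results.items():
    print(f"{sample!r}: with \\b = {bounded}, without \\b = {unbounded}")
```

Only the embedded case differs: the version without \b also catches a mangled spelling buried inside a longer run of word characters, while both versions leave the plain word alone.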
  


Re: Regular expression expanding

2005-01-28 Thread Robert Menschel
Hello Richard,

Thursday, January 27, 2005, 6:23:53 AM, you wrote:

GR> I'm trying to get my head around regular expression matching.

GR> body MANGLED_CASH
GR> /(?!cash)\b[cǩ\(][_\W]{0,[EMAIL PROTECTED],5}[sz5\$][_\W]{0,5}h\b/i

GR> My understanding of rule matching was that the '(?!cash' bit
GR> required an | (or) in order to work. Can anyone break down the
GR> logic of how SA tests this line?

GR> /(?!cash)
Do NOT match the plain word "cash" at this position.
GR> \b
Whatever does match needs to begin at a word boundary. There must be a
beginning of line or non-word character to the left, and a word
character to the right.
GR> [cǩ\(]
First character matched must be a C or some variation thereof.
GR> [_\W]{0,5}
Next character(s) matched must be underscores or non-word characters
(anything other than letters and digits). There may or may not be any,
and no more than 5.
GR> [EMAIL PROTECTED]
Next letter is an A
GR> [_\W]{0,5}
GR> [sz5\$]
Next letter is an S or some variation thereof.
GR> [_\W]{0,5}
GR> h
Next letter is an H
GR> \b
That H has to be followed by a non-word character or end of line.
GR> /i
Ignore case -- treat CA$H the same as ca$h.
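
The same piece-by-piece reading can be written down as a commented pattern. This is a sketch in Python, using re.VERBOSE (the counterpart of Perl's /x modifier) so that each piece carries its own comment. The [a@] class is an assumption standing in for the part of the rule that appears as [EMAIL PROTECTED] above, and ǩ is omitted.

```python
import re

# ASSUMPTION: "[a@]" stands in for the part of the rule that the archive
# obscured; the real character class is not recoverable from this thread.
MANGLED_CASH_VERBOSE = re.compile(
    r"""
    (?!cash)      # the text starting here must not be the plain word "cash"
    \b            # word boundary before the first character
    [c\(]         # a "c" or a look-alike
    [_\W]{0,5}    # up to five underscores or non-word characters
    [a@]          # an "a" or a look-alike (assumed class)
    [_\W]{0,5}    # optional separators again
    [sz5\$]       # an "s" or a look-alike
    [_\W]{0,5}    # optional separators again
    h             # a literal "h"
    \b            # word boundary after the "h"
    """,
    re.IGNORECASE | re.VERBOSE,
)

mangled_found = MANGLED_CASH_VERBOSE.search("Earn CA$H fast") is not None
plain_found = MANGLED_CASH_VERBOSE.search("Earn cash fast") is not None

print(mangled_found, plain_found)
```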

Bob Menschel





Re: Regular expression expanding

2005-01-28 Thread Matt Kettler
At 09:23 AM 1/27/2005, Gray, Richard wrote:
body MANGLED_CASH /(?!cash)\b[cǩ\(][_\W]{0,[EMAIL PROTECTED],5}[sz5\$][_\W]{0,5}h\b/i
My understanding of rule matching was that the '(?!cash' bit required an |
(or) in order to work. Can anyone break down the logic of how SA tests
this line?

Heh.. I think you're used to seeing things like (?:a|b), which is an or
operation with backreferencing disabled.

However, you can also have (?:a) without the | and you can have (a|b).
The deal is that (?: makes the group non-capturing, which disables the
ability to use backreferencing later, i.e. the ability to use \1 later in
an expression to require a duplicate of a previous match.

| is just an or.
Put the two together and you have an or without backreferencing. Disabling
backreferencing saves memory if you're not going to use it, so it's
commonly done in SA rules.
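
A small sketch of the difference between (a|b) and (?:a|b), written in Python, whose re module treats these two group forms the same way Perl does. The variable names are made up for the example.

```python
import re

capturing = re.compile(r"(a|b)\1")      # \1 must repeat what group 1 matched
non_capturing = re.compile(r"(?:a|b)")  # same alternation, nothing is saved

group_counts = (capturing.groups, non_capturing.groups)

repeat_found = capturing.search("xaay") is not None   # "aa" repeats the "a"
mixed_found = capturing.search("xaby") is not None    # "ab" is not a repeat

# With no capturing group there is nothing for \1 to refer to, so the
# pattern is rejected when it is compiled.
try:
    re.compile(r"(?:a|b)\1")
    backref_allowed = True
except re.error:
    backref_allowed = False

print(group_counts, repeat_found, mixed_found, backref_allowed)
```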

The bit used in the MANGLED_CASH rule is a completely different syntax,
despite its similar appearance. (?!a) is a negative look-ahead assertion,
i.e. the match fails at this point if the text that follows matches a.
It consumes no characters itself. Here it's used to exclude cash from
being considered a match for the mangled string.

There are lots of different constructs that start with (?.  (?: is much
different from (?! or (?=.
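
For comparison, here is a hedged sketch of the look-around constructs side by side, in Python. The two look-behind forms, (?<= and (?<!, are not mentioned in this thread; they are included only because they belong to the same family in both Perl and Python. The sample text is invented.

```python
import re

text = "10 USD, 20 EUR, cost $5 or 7"

# (?=...)  positive look-ahead: numbers that are followed by " USD"
followed_by_usd = re.findall(r"\b\d+\b(?= USD)", text)

# (?!...)  negative look-ahead: numbers that are NOT followed by " USD"
not_followed_by_usd = re.findall(r"\b\d+\b(?! USD)", text)

# (?<=...) positive look-behind: numbers preceded by a dollar sign
after_dollar = re.findall(r"(?<=\$)\d+", text)

# (?<!...) negative look-behind: numbers NOT preceded by a dollar sign
not_after_dollar = re.findall(r"(?<!\$)\b\d+\b", text)

print(followed_by_usd, not_followed_by_usd, after_dollar, not_after_dollar)
```

None of these constructs consumes text; each one only checks what sits next to the current position.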

This really is getting into advanced Perl regex syntax, but if you really
want to know about them, look up:

http://perlmonks.thepen.com/236866.html

In the context of SA rules, you usually only see (?: and (?!



RE: Regular expression expanding

2005-01-28 Thread Gray, Richard

Loren, Bob, Mike

Awesome explanations! Mike hit the nail on the head for the bit that I was 
uncertain about, but the explanations cleared up a lot of extra uncertainty 
surrounding the whole thing.

Thanks for your help,

Richard





---
This email from dns has been validated by dnsMSS Managed Email Security and is 
free from all known viruses.

For further information contact [EMAIL PROTECTED]






Regular expression expanding

2005-01-27 Thread Gray, Richard



I'm trying to get my head around regular expression matching.

body MANGLED_CASH /(?!cash)\b[cǩ\(][_\W]{0,[EMAIL PROTECTED],5}[sz5\$][_\W]{0,5}h\b/i

My understanding of rule matching was that the '(?!cash' bit required an | (or)
in order to work. Can anyone break down the logic of how SA tests this line?

Thanks,

Richard
