RE: Regular expression expanding
Loren, Bob, Mike

Awesome explanations! Mike hit the nail on the head for the bit that I was uncertain about, but the explanations cleared up a lot of extra uncertainty surrounding the whole thing.

Thanks for your help,
Richard

-----Original Message-----
From: Matt Kettler [mailto:[EMAIL PROTECTED]]
Sent: 28 January 2005 02:51
To: Gray, Richard; users@spamassassin.apache.org
Subject: Re: Regular expression expanding
Re: Regular expression expanding
At 09:23 AM 1/27/2005, Gray, Richard wrote:
> body MANGLED_CASH /(?!cash)\b[cÇ©\(][_\W]{0,[EMAIL PROTECTED],5}[sz5\$][_\W]{0,5}h\b/i
>
> My understanding of rule matching was that the '(?!cash' bit required an |
> (or) in order to work. Can anyone break down the logic of how SA tests
> this line?

Heh.. I think you're used to seeing things like (?:a|b), which is an or operation with backreferencing disabled. However, you can also have (?:a) without the |, and you can have (a|b).

The deal is that (?: makes the group non-capturing, which disables the ability to later use backreferencing, that is, the ability to use \1 later in an expression to require a duplicate of a previous match. | is just an or. Put the two together and you have an or without backreferencing. Disabling backreferencing saves memory if you're not going to use it, so it's commonly done in SA rules.

The bit used in the MANGLED_CASH rule is a completely different syntax, despite its similar appearance. (?!a) is a negative look-ahead assertion, i.e. the regex does not match at any position where the text that follows matches a. Here it's used to exclude "cash" from being considered a match for the mangled string.

There are lots of different operation modifiers that start with (?. (?: is much different than (?!, (?=, or (?<. This really is getting into advanced perl regex syntax, but if you really want to know about them look up:

http://perlmonks.thepen.com/236866.html

In the context of SA rules, you usually only see (?: and (?!
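The distinction described above can be tried out in a few lines. This is only a sketch: it uses Python's re module rather than Perl, on the assumption that (a|b), (?:a|b) and (?!...) behave the same way in both for cases this simple. The patterns and sample strings are made up for illustration and are not taken from any SA rule.

```python
import re

# Capturing group: \1 must repeat exactly what group 1 matched.
capturing = re.compile(r"(a|b)\1")

# Non-capturing group: the same "or", but nothing is recorded,
# so there is no group 1 for a backreference to point at.
non_capturing = re.compile(r"(?:a|b)x")

# Negative look-ahead: the match may only start where the text
# that follows is NOT "cash". The look-ahead consumes no characters.
not_cash = re.compile(r"(?!cash)\w+")

print(bool(capturing.fullmatch("aa")))     # same letter twice
print(bool(capturing.fullmatch("ab")))     # second letter differs
print(capturing.match("bb").groups())      # one recorded group
print(non_capturing.match("ax").groups())  # no recorded groups
print(not_cash.match("cash"))              # look-ahead rejects this start
print(not_cash.match("cache").group())     # anything else is fine
```

Note that match() only tries the start of the string. With search(), the last pattern would still find "ash" inside "cash", because the look-ahead is only tested at the position where a match attempt begins.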
Re: Regular expression expanding
Hello Richard,

Thursday, January 27, 2005, 6:23:53 AM, you wrote:

GR> I'm trying to get my head around regular expression matching.

GR> body MANGLED_CASH
GR> /(?!cash)\b[cÇ©\(][_\W]{0,[EMAIL PROTECTED],5}[sz5\$][_\W]{0,5}h\b/i

GR> My understanding of rule matching was that the '(?!cash' bit
GR> required an | (or) in order to work. Can anyone break down the
GR> logic of how SA tests this line?

GR> /(?!cash)
Do NOT match if the text starting at this point is plain "cash".

GR> \b
Whatever does match needs to begin at a word boundary. When the first character is a letter such as C, there must be a beginning of line or non-word character to the left, and a word character to the right.

GR> [cÇ©\(]
First character matched must be a C or some variation thereof.

GR> [_\W]{0,5}
Next character(s) matched must be an underscore or some non-alphanumeric character. There may or may not be any, and no more than 5.

GR> [EMAIL PROTECTED]
Next letter is an A (or a variation thereof).

GR> [_\W]{0,5}
GR> [sz5\$]
Next letter is an S (or a variation thereof).

GR> [_\W]{0,5}
GR> h
Next letter is an H.

GR> \b
That H has to be followed by a non-word character or end of line.

GR> /i
Ignore case -- treat CA$H the same as ca$h.

Bob Menschel
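The piece-by-piece breakdown above maps naturally onto a commented pattern. What follows is a sketch in Python's re module (not how SA itself runs the rule), written in verbose mode so each piece sits on its own line. The archived copy of the rule is damaged at the C and A character classes, so [cÇ©\(] and [a4@] below are assumed stand-ins, not the rule's real contents.

```python
import re

MANGLED_CASH = re.compile(r"""
    (?!cash)      # text starting here must not be plain "cash"
    \b            # word boundary before the first character
    [cÇ©\(]       # C or a look-alike (assumed reading of the class)
    [_\W]{0,5}    # zero to five underscores or non-word characters
    [a4@]         # A or a look-alike (assumed stand-in class)
    [_\W]{0,5}
    [sz5\$]       # S or a look-alike
    [_\W]{0,5}
    h             # H
    \b            # word boundary after the H
""", re.IGNORECASE | re.VERBOSE)

def is_mangled_cash(text):
    """Return True if the sketch pattern matches anywhere in text."""
    return MANGLED_CASH.search(text) is not None

for sample in ["cash", "c a s h", "CA$H", "c@sh", "ca5h", "cashier"]:
    print(sample, is_mangled_cash(sample))
```

Under these assumptions the plain spellings "cash" and "cashier" are rejected, while the spaced-out or substituted spellings are accepted.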
Re: Regular expression expanding
> I'm trying to get my head around regular expression matching.
>
> body MANGLED_CASH /(?!cash)\b[cÇ©\(][_\W]{0,[EMAIL PROTECTED],5}[sz5\$][_\W]{0,5}h\b/i
>
> My understanding of rule matching was that the '(?!cash' bit required an |
> (or) in order to work. Can anyone break down the logic of how SA tests
> this line?

Not sure why you think an OR is required.

OTOH, I'm not at all sure why there is a \b there between (?!cash) and the mangled matching code. That \b either should be inside the parens with cash, or shouldn't be there at all. Given the overall rule it would be more efficient to have it inside the parens. There should also be another \b before the '(?!' part to keep from matching 'cash' inside the middle of some other word, I suppose.

Then again, I don't really see a reason to have the \b check there at all. If someone is going to spell cash using mangled letters, I don't see that you care much if it is a stand-alone word.

In any case, what the (?!cash) part is saying is 'the word 'cash' does not appear here', followed by a word break (the \b), followed by a mangled spelling of cash, followed by another word break. Which doesn't really work, but the intent was to catch a mangled spelling of cash, but not a non-mangled spelling.

A better version would probably be

body MANGLED_CASH /(?!cash)[cÇ©\(][_\W]{0,[EMAIL PROTECTED],5}[sz5\$][_\W]{0,5}h/i

Loren
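To see what dropping the two \b checks changes in practice, the original shape of the rule and the version without \b can be compared side by side. This is a sketch in Python's re module, not a test against SA itself. The names STRICT and LOOSE are hypothetical, and the C and A character classes are assumed stand-ins because the archived rule is damaged at those points.

```python
import re

# Shared middle of the pattern. Both character classes marked
# "assumed" are stand-ins for the damaged parts of the archived rule.
BODY = (
    r"[cÇ©\(]"      # C or look-alike (assumed)
    r"[_\W]{0,5}"
    r"[a4@]"        # A or look-alike (assumed)
    r"[_\W]{0,5}"
    r"[sz5\$]"      # S or look-alike
    r"[_\W]{0,5}"
    r"h"
)

# Original shape: word boundaries on both sides of the mangled word.
STRICT = re.compile(r"(?!cash)\b" + BODY + r"\b", re.IGNORECASE)
# Suggested shape: no word-boundary checks at all.
LOOSE = re.compile(r"(?!cash)" + BODY, re.IGNORECASE)

def compare(text):
    """Return (strict_match, loose_match) as matched strings or None."""
    s = STRICT.search(text)
    l = LOOSE.search(text)
    return (s.group() if s else None, l.group() if l else None)

for sample in ["cash", "c a s h", "(a$h", "c-a-shell"]:
    print(repr(sample), compare(sample))
```

Under these assumptions both versions skip plain "cash" and catch "c a s h". They differ on "(a$h", where the leading \b cannot match in front of a non-word character at the start of the text, and on "c-a-shell", where only the version without the trailing \b matches inside a longer word.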
Regular expression expanding
I'm trying to get my head around regular expression matching.

body MANGLED_CASH /(?!cash)\b[cÇ©\(][_\W]{0,[EMAIL PROTECTED],5}[sz5\$][_\W]{0,5}h\b/i

My understanding of rule matching was that the '(?!cash' bit required an | (or) in order to work. Can anyone break down the logic of how SA tests this line?

Thanks,
Richard