Re: Calling Regex Experts

2006-08-24 Thread Alan Premselaar
D.J. wrote:
 OK, I'm stumped.  I need to create a regex that will match if anything
 other than two terms I've specified exist.
 
 So for example, I have two terms I like, say "cat" and "dog".  I want
 the rule to match if a string contains anything other than "cat" or "dog".
 
 I tried ...
 
 $value !~ /cat|dog/
 
 ...but this had the unintended consequence of still matching a string
 like "cat dog bird" or "cat bird" since the string does contain one of
 my two terms.  So what do I need to do?  Thanks in advance!
 
 - D.J.


D.J.,

 you're probably best off using META rules for this.  So you could have
something like (completely untested and off the top of my head in the
middle of the night):

body __CAT  /cat/
body __DOG  /dog/

meta NOT_CAT_AND_DOG  (!__CAT && !__DOG)

you should definitely check the man pages and/or wiki about writing
rules to do this properly, but that should get you started.

Alan


RE: Calling Regex Experts

2006-08-24 Thread Bowie Bailey
D.J. wrote:
 OK, I'm stumped.  I need to create a regex that will match if
 anything other than two terms I've specified exist. 
 
 So for example, I have two terms I like, say "cat" and "dog".  I want
 the rule to match if a string contains anything other than "cat" or
 "dog".
 
 I tried ...
 
 $value !~ /cat|dog/
 
 ...but this had the unintended consequence of still matching a string
 like "cat dog bird" or "cat bird" since the string does contain one
 of my two terms.  So what do I need to do?  Thanks in advance!  

I'm not quite clear on what you want here.  Your example should NOT
have matched on "cat dog bird" since it contains one of your terms.
It would have matched on "bird", since it doesn't.

If you want to match any string that doesn't include your terms, you
do it just like you said.

$value !~ /cat|dog/

If you want to match any string which does not exactly match your
terms, do this:

$value !~ /^(?:cat|dog)$/

This will match on anything other than "cat" or "dog".
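
For anyone following along, the difference between the two forms above can be seen with a small sketch. The sub names `lacks_terms` and `not_exactly_term` and the sample strings are made up for illustration; they are not from any SpamAssassin API.

```perl
use strict;
use warnings;

# Hypothetical helper: true when neither term appears anywhere in the string.
sub lacks_terms {
    my ($value) = @_;
    return $value !~ /cat|dog/ ? 1 : 0;
}

# Hypothetical helper: true unless the whole string is exactly "cat" or "dog".
sub not_exactly_term {
    my ($value) = @_;
    return $value !~ /^(?:cat|dog)$/ ? 1 : 0;
}

for my $value ("cat", "bird", "cat bird", "cat dog") {
    printf "%-10s lacks_terms=%d not_exactly_term=%d\n",
        $value, lacks_terms($value), not_exactly_term($value);
}
```

Note that neither form distinguishes "cat dog" from "cat bird": the first rejects both because a term is present, and the second accepts both because neither is a single bare term.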

If this doesn't help, give us some more examples of things you expect
to match and things you don't expect to match.

-- 
Bowie


Re: Calling Regex Experts

2006-08-24 Thread D . J .
On 8/24/06, Bowie Bailey [EMAIL PROTECTED] wrote:
 D.J. wrote:
  OK, I'm stumped.  I need to create a regex that will match if
  anything other than two terms I've specified exist.

  So for example, I have two terms I like, say "cat" and "dog".  I want
  the rule to match if a string contains anything other than "cat" or
  "dog".

  I tried ...

  $value !~ /cat|dog/

  ...but this had the unintended consequence of still matching a string
  like "cat dog bird" or "cat bird" since the string does contain one
  of my two terms.  So what do I need to do?  Thanks in advance!

 I'm not quite clear on what you want here.  Your example should NOT
 have matched on "cat dog bird" since it contains one of your terms.
 It would have matched on "bird", since it doesn't.

 If you want to match any string that doesn't include your terms, you
 do it just like you said.

 $value !~ /cat|dog/

 If you want to match any string which does not exactly match your
 terms, do this:

 $value !~ /^(?:cat|dog)$/

 This will match on anything other than "cat" or "dog".

 If this doesn't help, give us some more examples of things you expect
 to match and things you don't expect to match.

 --
 Bowie

I'm expecting these type of strings for sure:

cat
dog
cat dog
dog cat

But I may get something like this too:

cat cat dog
dog dog

Essentially I want it to match if anything other than cat or dog is in the
string.


Re: Calling Regex Experts

2006-08-24 Thread D . J .
On 8/24/06, Bowie Bailey [EMAIL PROTECTED] wrote:
 D.J. wrote:
  OK, I'm stumped.  I need to create a regex that will match if
  anything other than two terms I've specified exist.

  So for example, I have two terms I like, say "cat" and "dog".  I want
  the rule to match if a string contains anything other than "cat" or
  "dog".

  I tried ...

  $value !~ /cat|dog/

  ...but this had the unintended consequence of still matching a string
  like "cat dog bird" or "cat bird" since the string does contain one
  of my two terms.  So what do I need to do?  Thanks in advance!

 I'm not quite clear on what you want here.  Your example should NOT
 have matched on "cat dog bird" since it contains one of your terms.
 It would have matched on "bird", since it doesn't.

 If you want to match any string that doesn't include your terms, you
 do it just like you said.

 $value !~ /cat|dog/

 If you want to match any string which does not exactly match your
 terms, do this:

 $value !~ /^(?:cat|dog)$/

 This will match on anything other than "cat" or "dog".

 If this doesn't help, give us some more examples of things you expect
 to match and things you don't expect to match.

 --
 Bowie

The regex...

$value !~ /^(?:cat|dog)$/

...incorrectly matches a string such as "cat dog" or "dog cat" where both
terms are present. It does however work properly for something like
"dog bird".


Re: Calling Regex Experts

2006-08-24 Thread D . J .
 I'm not quite clear on what you want here.  Your example should NOT
 have matched on "cat dog bird" since it contains one of your terms.
 It would have matched on "bird", since it doesn't.

Oops... that's what I meant. It doesn't match (though I want it to)
because it contains one of the terms.


Re: Calling Regex Experts

2006-08-24 Thread Bart Schaefer

On 8/24/06, D. J. [EMAIL PROTECTED] wrote:


I'm expecting these type of strings for sure:

cat
dog
cat dog
dog cat

But I may get something like this too:

cat cat dog
dog dog

Essentially I want it to match if anything other than cat or dog is in the
string.


That constraint means you have to construct a regex that can be
anchored at both beginning and end of string, e.g.
/\A(\s*(cat|dog)\s*)+\Z/.  I'm not sure that ever makes sense in the
context of a spamassassin rule, except maybe one matching against a
specific header.
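
A minimal sketch of how the anchored pattern could be used for this: the anchored regex describes strings made up only of the allowed terms, so the condition of interest is its negation. The sub name `has_other_content` and the sample strings are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the string is NOT made up solely of the
# words "cat" and "dog" with optional whitespace around them.
sub has_other_content {
    my ($value) = @_;
    return $value !~ /\A(\s*(cat|dog)\s*)+\Z/ ? 1 : 0;
}

for my $value ("cat", "cat dog", "cat cat dog", "dog dog", "cat bird", "bird") {
    printf "%-12s %s\n", $value, has_other_content($value) ? "MATCHED" : "no match";
}
```

Because the whitespace is optional, a run-together string such as "catdog" also counts as containing only the allowed terms under this pattern.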


Re: Calling Regex Experts

2006-08-24 Thread D . J .
On 8/24/06, Bart Schaefer [EMAIL PROTECTED] wrote:
 On 8/24/06, D. J. [EMAIL PROTECTED] wrote:
  I'm expecting these type of strings for sure:

  cat
  dog
  cat dog
  dog cat

  But I may get something like this too:

  cat cat dog
  dog dog

  Essentially I want it to match if anything other than cat or dog is in
  the string.

 That constraint means you have to construct a regex that can be
 anchored at both beginning and end of string, e.g.
 /\A(\s*(cat|dog)\s*)+\Z/.  I'm not sure that ever makes sense in the
 context of a spamassassin rule, except maybe one matching against a
 specific header.

That's the idea... I've got the RELAY_COUNTRIES plugin that I want it to
place a small score if the relay server is not in the US or Canada.
However, I'm not sure if the plugin will list the same country multiple
times, which is where my uncertainty in the "cat cat dog" scenario came
in. So far my original rule ( !~ /cat|dog/) seems to be working well, but
if I have a spammer smart enough to manage to bounce his spam originating
in China off of somewhere in the US before it hits my MX, then that rule
will fail. Am I possibly too paranoid?



RE: Calling Regex Experts

2006-08-24 Thread Bowie Bailey
D.J. wrote:
 On 8/24/06, Bart Schaefer [EMAIL PROTECTED] wrote:
  On 8/24/06, D. J. [EMAIL PROTECTED] wrote:
   
   I'm expecting these type of strings for sure:
   
   cat
   dog
   cat dog
   dog cat
   
   But I may get something like this too:
   
   cat cat dog
   dog dog
   
   Essentially I want it to match if anything other than cat or dog is
   in the string.
  
  That constraint means you have to construct a regex that can be
  anchored at both beginning and end of string, e.g.
  /\A(\s*(cat|dog)\s*)+\Z/.  I'm not sure that ever makes sense in the
  context of a spamassassin rule, except maybe one matching against a
  specific header.
 
 That's the idea... I've got the RELAY_COUNTRIES plugin that I want it
 to place a small score if the relay server is not in the US or
 Canada.  However, I'm not sure if the plugin will list the same
 country multiple times, which is where my uncertainty in the "cat cat
 dog" scenario came in.  So far my original rule ( !~ /cat|dog/) seems
 to be working well, but if I have a spammer smart enough to manage to
 bounce his spam originating in China off of somewhere in the US
 before it hits my MX, then that rule will fail.  Am I possibly too
 paranoid?

Ok.  Try this one:

   $value =~ /\b(?!cat\b|dog\b)\w+\b/i

This will match any word in the string as long as that word is not
"cat" or "dog".
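
To see which words the negative lookahead lets through, the same pattern can be run with the /g flag to collect every offending word. This is only a sketch: the sub name `other_words` and the sample strings are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every word that is not exactly "cat" or "dog".
# At each word start, (?!cat\b|dog\b) refuses to proceed if the word is one
# of the two terms; otherwise \w+ consumes the whole word.
sub other_words {
    my ($value) = @_;
    my @words = $value =~ /\b(?!cat\b|dog\b)\w+\b/gi;
    return @words;
}

for my $value ("cat dog", "cat bird", "cat cat dog catapult") {
    my @found = other_words($value);
    print "$value => ", (@found ? join(", ", @found) : "(none)"), "\n";
}
```

The \b inside the lookahead is what keeps a longer word that merely starts with "cat" from being treated as "cat".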

-- 
Bowie


Re: Calling Regex Experts

2006-08-24 Thread D . J .
On 8/24/06, Bowie Bailey [EMAIL PROTECTED] wrote:
 D.J. wrote:
  On 8/24/06, Bart Schaefer [EMAIL PROTECTED] wrote:
   On 8/24/06, D. J. [EMAIL PROTECTED] wrote:
    I'm expecting these type of strings for sure:

    cat
    dog
    cat dog
    dog cat

    But I may get something like this too:

    cat cat dog
    dog dog

    Essentially I want it to match if anything other than cat or dog is
    in the string.

   That constraint means you have to construct a regex that can be
   anchored at both beginning and end of string, e.g.
   /\A(\s*(cat|dog)\s*)+\Z/.  I'm not sure that ever makes sense in the
   context of a spamassassin rule, except maybe one matching against a
   specific header.

  That's the idea... I've got the RELAY_COUNTRIES plugin that I want it
  to place a small score if the relay server is not in the US or
  Canada.  However, I'm not sure if the plugin will list the same
  country multiple times, which is where my uncertainty in the "cat cat
  dog" scenario came in.  So far my original rule ( !~ /cat|dog/) seems
  to be working well, but if I have a spammer smart enough to manage to
  bounce his spam originating in China off of somewhere in the US
  before it hits my MX, then that rule will fail.  Am I possibly too
  paranoid?

 Ok.  Try this one:

    $value =~ /\b(?!cat\b|dog\b)\w+\b/i

 This will match any word in the string as long as that word is not
 "cat" or "dog".

 --
 Bowie

OK, we're actually really close. That actually matched everything I
didn't want to match... we just have to get it to do the opposite of
that. I have 6 test strings I tested against in a test script:

cat
dog
cat dog
dog cat
bird
cat bird

It matched the top four (incorrectly).


RE: Calling Regex Experts

2006-08-24 Thread Bowie Bailey
D.J. wrote:
 On 8/24/06, Bowie Bailey [EMAIL PROTECTED] wrote:
  D.J. wrote:
   On 8/24/06, Bart Schaefer [EMAIL PROTECTED] wrote:
    On 8/24/06, D. J. [EMAIL PROTECTED] wrote:
     
     I'm expecting these type of strings for sure:
     
     cat
     dog
     cat dog
     dog cat
     
     But I may get something like this too:
     
     cat cat dog
     dog dog
     
     Essentially I want it to match if anything other than cat or
     dog is in the string.
    
    That constraint means you have to construct a regex that can be
    anchored at both beginning and end of string, e.g.
    /\A(\s*(cat|dog)\s*)+\Z/.  I'm not sure that ever makes sense in
    the context of a spamassassin rule, except maybe one matching
    against a specific header.
   
   That's the idea... I've got the RELAY_COUNTRIES plugin that I want
   it to place a small score if the relay server is not in the US or
   Canada.  However, I'm not sure if the plugin will list the same
   country multiple times, which is where my uncertainty in the "cat
   cat dog" scenario came in.  So far my original rule ( !~ /cat|dog/)
   seems to be working well, but if I have a spammer smart enough to
   manage to bounce his spam originating in China off of somewhere in
   the US before it hits my MX, then that rule will fail.  Am I
   possibly too paranoid?
  
  Ok.  Try this one:
  
     $value =~ /\b(?!cat\b|dog\b)\w+\b/i
  
  This will match any word in the string as long as that word is not
  "cat" or "dog".
 
 OK, we're actually really close.  That actually matched everything I
 didn't want to match... we just have to get it to do the opposite of
 that.  I have 6 test strings I tested against in a test script:  
 
 cat
 dog
 cat dog
 dog cat
 bird
 cat bird
 
 It matched the top four (incorrectly).

Are you sure you used it correctly?  This is a positive match (=~), not a
negative match (!~).

Test program:
@strings = ( "cat", "dog", "cat dog", "dog cat", "bird",
             "cat bird", "caterwaul" );
for $str (@strings) {
    if ($str =~ /\b(?!cat\b|dog\b)\w+\b/i) {
        print "$str -- MATCHED\n";
    }
    else {
        print "$str -- no match\n";
    }
}

Output:
cat -- no match
dog -- no match
cat dog -- no match
dog cat -- no match
bird -- MATCHED
cat bird -- MATCHED
caterwaul -- MATCHED

-- 
Bowie


Re: Calling Regex Experts

2006-08-24 Thread D . J .
On 8/24/06, Bowie Bailey [EMAIL PROTECTED] wrote:
 D.J. wrote:
  On 8/24/06, Bowie Bailey [EMAIL PROTECTED] wrote:
   D.J. wrote:
    On 8/24/06, Bart Schaefer [EMAIL PROTECTED] wrote:
     On 8/24/06, D. J. [EMAIL PROTECTED] wrote:
      I'm expecting these type of strings for sure:

      cat
      dog
      cat dog
      dog cat

      But I may get something like this too:

      cat cat dog
      dog dog

      Essentially I want it to match if anything other than cat or
      dog is in the string.

     That constraint means you have to construct a regex that can be
     anchored at both beginning and end of string, e.g.
     /\A(\s*(cat|dog)\s*)+\Z/.  I'm not sure that ever makes sense in
     the context of a spamassassin rule, except maybe one matching
     against a specific header.

    That's the idea... I've got the RELAY_COUNTRIES plugin that I want
    it to place a small score if the relay server is not in the US or
    Canada.  However, I'm not sure if the plugin will list the same
    country multiple times, which is where my uncertainty in the "cat
    cat dog" scenario came in.  So far my original rule ( !~ /cat|dog/)
    seems to be working well, but if I have a spammer smart enough to
    manage to bounce his spam originating in China off of somewhere in
    the US before it hits my MX, then that rule will fail.  Am I
    possibly too paranoid?

   Ok.  Try this one:

      $value =~ /\b(?!cat\b|dog\b)\w+\b/i

   This will match any word in the string as long as that word is not
   "cat" or "dog".

  OK, we're actually really close.  That actually matched everything I
  didn't want to match... we just have to get it to do the opposite of
  that.  I have 6 test strings I tested against in a test script:

  cat
  dog
  cat dog
  dog cat
  bird
  cat bird

  It matched the top four (incorrectly).

 Are you sure you used it correctly?  This is a positive match (=~), not a
 negative match (!~).

 Test program:
 @strings = ( "cat", "dog", "cat dog", "dog cat", "bird",
              "cat bird", "caterwaul" );
 for $str (@strings) {
     if ($str =~ /\b(?!cat\b|dog\b)\w+\b/i) {
         print "$str -- MATCHED\n";
     }
     else {
         print "$str -- no match\n";
     }
 }

 Output:
 cat -- no match
 dog -- no match
 cat dog -- no match
 dog cat -- no match
 bird -- MATCHED
 cat bird -- MATCHED
 caterwaul -- MATCHED

 --
 Bowie

BINGO! I still had my negative in there, I only copied the / to / part of
the regex. You sir, are the man!
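
Carried over to the relay-country use case described earlier in the thread, the same positive match might look like the sketch below. It assumes the value being tested is a space-separated list of two-letter country codes; that format, the sub name `has_foreign_relay` and the sample values are assumptions for illustration and should be checked against what the plugin actually produces.

```perl
use strict;
use warnings;

# Assumption: the relay list is a space-separated string of country codes,
# e.g. "US CN". The sample values below are invented for illustration.
sub has_foreign_relay {
    my ($countries) = @_;
    return $countries =~ /\b(?!US\b|CA\b)\w+\b/i ? 1 : 0;
}

for my $countries ("US", "CA US", "US US CA", "CN US", "CN") {
    printf "%-10s %s\n", $countries,
        has_foreign_relay($countries) ? "MATCHED" : "no match";
}
```

Unlike the original !~ /cat|dog/ form, this fires on a mixed list such as "CN US", which is the bounced-through-the-US case raised above, and repeated codes make no difference. Any placeholder entry made up of non-word characters would not be caught by \w+.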


Re: Calling Regex Experts

2006-08-24 Thread jdow

From: D.J. [EMAIL PROTECTED]

I'm expecting these type of strings for sure:

cat
dog
cat dog
dog cat

But I may get something like this too:

cat cat dog
dog dog

Essentially I want it to match if anything other than cat or dog is in the
string.


And do what with "cat cat dog catapult"?

{^_^}