Re: Calling Regex Experts

2006-08-24 Thread jdow

From: "D.J." <[EMAIL PROTECTED]>

> I'm expecting these type of strings for sure:
>
> cat
> dog
> cat dog
> dog cat
>
> But I may get something like this too:
>
> cat cat dog
> dog dog
>
> Essentially I want it to match if anything other than cat or dog is in the
> string.


And do what with "cat cat dog catapult"?

{^_^}


Re: Calling Regex Experts

2006-08-24 Thread D . J .
On 8/24/06, Bowie Bailey <[EMAIL PROTECTED]> wrote:
> D.J. wrote:
> > On 8/24/06, Bowie Bailey <[EMAIL PROTECTED]> wrote:
> > > D.J. wrote:
> > > > On 8/24/06, Bart Schaefer <[EMAIL PROTECTED]> wrote:
> > > > > On 8/24/06, D. J. <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > I'm expecting these type of strings for sure:
> > > > > >
> > > > > > cat
> > > > > > dog
> > > > > > cat dog
> > > > > > dog cat
> > > > > >
> > > > > > But I may get something like this too:
> > > > > >
> > > > > > cat cat dog
> > > > > > dog dog
> > > > > >
> > > > > > Essentially I want it to match if anything other than cat or
> > > > > > dog is in the string.
> > > > >
> > > > > That constraint means you have to construct a regex that can be
> > > > > anchored at both beginning and end of string, e.g.
> > > > > /\A(\s*(cat|dog)\s*)+\Z/.  I'm not sure that ever makes sense in
> > > > > the context of a spamassassin rule, except maybe one matching
> > > > > against a specific header.
> > > >
> > > > That's the idea... I've got the RELAY_COUNTRIES plugin that I want
> > > > it to place a small score if the relay server is not in the US or
> > > > Canada.  However, I'm not sure if the plugin will list the same
> > > > country multiple times, which is where my uncertainty in the "cat
> > > > cat dog" scenario came in.  So far my original rule ( !~ /cat|dog/)
> > > > seems to be working well, but if I have a spammer smart enough to
> > > > manage to bounce his spam originating in China off of somewhere in
> > > > the US before it hits my MX, then that rule will fail.  Am I
> > > > possibly too paranoid?
> > >
> > > Ok.  Try this one:
> > >
> > >    $value =~ /\b(?!cat\b|dog\b)\w+\b/i
> > >
> > > This will match any word in the string as long as that word is not
> > > "cat" or "dog".
> >
> > OK, we're actually really close.  That actually matched everything I
> > didn't want to match... we just have to get it to do the opposite of
> > that.  I have 6 test strings I tested against in a test script:
> >
> > cat
> > dog
> > cat dog
> > dog cat
> > bird
> > cat bird
> >
> > It matched the top four (incorrectly).
>
> Are you sure you used it correctly?  This is a positive match (=~), not a
> negative match (!~).
>
> Test program:
> @strings = ( "cat", "dog", "cat dog", "dog cat", "bird",
>              "cat bird", "caterwaul" );
> for $str (@strings) {
>     if ($str =~ /\b(?!cat\b|dog\b)\w+\b/i) {
>         print "$str -- MATCHED\n";
>     }
>     else {
>         print "$str -- no match\n";
>     }
> }
>
> Output:
> cat -- no match
> dog -- no match
> cat dog -- no match
> dog cat -- no match
> bird -- MATCHED
> cat bird -- MATCHED
> caterwaul -- MATCHED
>
> --
> Bowie

BINGO!  I still had my negative in there, I only copied the / to / part
of the regex.  You sir, are the man!


RE: Calling Regex Experts

2006-08-24 Thread Bowie Bailey
D.J. wrote:
> On 8/24/06, Bowie Bailey <[EMAIL PROTECTED]> wrote:
> > D.J. wrote:
> > > On 8/24/06, Bart Schaefer <[EMAIL PROTECTED]> wrote:
> > > > On 8/24/06, D. J. <[EMAIL PROTECTED] > wrote:
> > > > > 
> > > > > I'm expecting these type of strings for sure:
> > > > > 
> > > > > cat
> > > > > dog
> > > > > cat dog
> > > > > dog cat
> > > > > 
> > > > > But I may get something like this too:
> > > > > 
> > > > > cat cat dog
> > > > > dog dog
> > > > > 
> > > > > Essentially I want it to match if anything other than cat or
> > > > > dog is in the string.
> > > > 
> > > > That constraint means you have to construct a regex that can be
> > > > anchored at both beginning and end of string, e.g.
> > > > /\A(\s*(cat|dog)\s*)+\Z/.  I'm not sure that ever makes sense in
> > > > the context of a spamassassin rule, except maybe one matching
> > > > against a specific header.
> > > 
> > > That's the idea... I've got the RELAY_COUNTRIES plugin that I want
> > > it to place a small score if the relay server is not in the US or
> > > Canada.  However, I'm not sure if the plugin will list the same
> > > country multiple times, which is where my uncertainty in the "cat
> > > cat dog" scenario came in.  So far my original rule ( !~ /cat|dog/)
> > > seems to be working well, but if I have a spammer smart enough to
> > > manage to bounce his spam originating in China off of somewhere in
> > > the US before it hits my MX, then that rule will fail.  Am I
> > > possibly too paranoid?
> > 
> > Ok.  Try this one:
> > 
> >    $value =~ /\b(?!cat\b|dog\b)\w+\b/i
> > 
> > This will match any word in the string as long as that word is not
> > "cat" or "dog".
> 
> OK, we're actually really close.  That actually matched everything I
> didn't want to match... we just have to get it to do the opposite of
> that.  I have 6 test strings I tested against in a test script:  
> 
> cat
> dog
> cat dog
> dog cat
> bird
> cat bird
> 
> It matched the top four (incorrectly).

Are you sure you used it correctly?  This is a positive match (=~), not a
negative match (!~).

Test program:
@strings = ( "cat", "dog", "cat dog", "dog cat", "bird",
             "cat bird", "caterwaul" );
for $str (@strings) {
    if ($str =~ /\b(?!cat\b|dog\b)\w+\b/i) {
        print "$str -- MATCHED\n";
    }
    else {
        print "$str -- no match\n";
    }
}

Output:
cat -- no match
dog -- no match
cat dog -- no match
dog cat -- no match
bird -- MATCHED
cat bird -- MATCHED
caterwaul -- MATCHED

-- 
Bowie
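
Applied to the RELAY_COUNTRIES case that started the thread, the same
pattern might be used as in the sketch below.  It assumes the plugin
reports a space-separated list of two-letter country codes and that "US"
and "CA" are the allowed ones.  The sample values and the helper name
has_foreign_relay are invented for illustration and should be checked
against what the plugin actually emits.

```perl
use strict;
use warnings;

# Hypothetical helper: true if any code other than US or CA appears
# anywhere in the (assumed) space-separated country list.
sub has_foreign_relay {
    my ($countries) = @_;
    return $countries =~ /\b(?!US\b|CA\b)\w+\b/i ? 1 : 0;
}

# Invented sample values, not real plugin output.
for my $countries ("US", "US CA", "US US CA", "CN US", "US CN") {
    print "$countries -- ",
        (has_foreign_relay($countries) ? "MATCHED" : "no match"), "\n";
}
```

Because the pattern looks at every word, a repeated code ("US US CA")
is harmless and the position of the foreign code does not matter.  Any
placeholder made of non-word characters would not be seen by \w+, so
that case would need separate handling.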


Re: Calling Regex Experts

2006-08-24 Thread D . J .
On 8/24/06, Bowie Bailey <[EMAIL PROTECTED]> wrote:
> D.J. wrote:
> > On 8/24/06, Bart Schaefer <[EMAIL PROTECTED]> wrote:
> > > On 8/24/06, D. J. <[EMAIL PROTECTED]> wrote:
> > > >
> > > > I'm expecting these type of strings for sure:
> > > >
> > > > cat
> > > > dog
> > > > cat dog
> > > > dog cat
> > > >
> > > > But I may get something like this too:
> > > >
> > > > cat cat dog
> > > > dog dog
> > > >
> > > > Essentially I want it to match if anything other than cat or dog is
> > > > in the string.
> > >
> > > That constraint means you have to construct a regex that can be
> > > anchored at both beginning and end of string, e.g.
> > > /\A(\s*(cat|dog)\s*)+\Z/.  I'm not sure that ever makes sense in the
> > > context of a spamassassin rule, except maybe one matching against a
> > > specific header.
> >
> > That's the idea... I've got the RELAY_COUNTRIES plugin that I want it
> > to place a small score if the relay server is not in the US or
> > Canada.  However, I'm not sure if the plugin will list the same
> > country multiple times, which is where my uncertainty in the "cat cat
> > dog" scenario came in.  So far my original rule ( !~ /cat|dog/) seems
> > to be working well, but if I have a spammer smart enough to manage to
> > bounce his spam originating in China off of somewhere in the US
> > before it hits my MX, then that rule will fail.  Am I possibly too
> > paranoid?
>
> Ok.  Try this one:
>
>    $value =~ /\b(?!cat\b|dog\b)\w+\b/i
>
> This will match any word in the string as long as that word is not
> "cat" or "dog".
>
> --
> Bowie

OK, we're actually really close.  That actually matched everything I
didn't want to match... we just have to get it to do the opposite of
that.  I have 6 test strings I tested against in a test script:

cat
dog
cat dog
dog cat
bird
cat bird

It matched the top four (incorrectly).


RE: Calling Regex Experts

2006-08-24 Thread Bowie Bailey
D.J. wrote:
> On 8/24/06, Bart Schaefer <[EMAIL PROTECTED]> wrote:
> > On 8/24/06, D. J. <[EMAIL PROTECTED]> wrote:
> > > 
> > > I'm expecting these type of strings for sure:
> > > 
> > > cat
> > > dog
> > > cat dog
> > > dog cat
> > > 
> > > But I may get something like this too:
> > > 
> > > cat cat dog
> > > dog dog
> > > 
> > > Essentially I want it to match if anything other than cat or dog is
> > > in the string.
> > 
> > That constraint means you have to construct a regex that can be
> > anchored at both beginning and end of string, e.g.
> > /\A(\s*(cat|dog)\s*)+\Z/.  I'm not sure that ever makes sense in the
> > context of a spamassassin rule, except maybe one matching against a
> > specific header.
> 
> That's the idea... I've got the RELAY_COUNTRIES plugin that I want it
> to place a small score if the relay server is not in the US or
> Canada.  However, I'm not sure if the plugin will list the same
> country multiple times, which is where my uncertainty in the "cat cat
> dog" scenario came in.  So far my original rule ( !~ /cat|dog/) seems
> to be working well, but if I have a spammer smart enough to manage to
> bounce his spam originating in China off of somewhere in the US
> before it hits my MX, then that rule will fail.  Am I possibly too
> paranoid?

Ok.  Try this one:

   $value =~ /\b(?!cat\b|dog\b)\w+\b/i

This will match any word in the string as long as that word is not
"cat" or "dog".

-- 
Bowie
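
To see why this works, the same pattern can be written with the /x flag
and commented piece by piece.  This is only a sketch: the variable
$other_word and the helper first_other_word (which returns the first
word that is not one of the two terms) are hypothetical names added for
illustration.

```perl
use strict;
use warnings;

# Same pattern as above, spread out with /x so each part can be annotated.
my $other_word = qr/
    \b                    # start at a word boundary
    (?! cat\b | dog\b )   # give up here if the whole word is "cat" or "dog"
    \w+                   # otherwise consume the word
    \b                    # and stop at the end of the word
/xi;

# Hypothetical helper: first word other than "cat" or "dog", or undef.
sub first_other_word {
    my ($value) = @_;
    return $value =~ /($other_word)/ ? $1 : undef;
}

for my $value ("cat dog", "cat bird", "cat cat dog catapult") {
    my $word = first_other_word($value);
    print "$value -- ",
        (defined $word ? "MATCHED on '$word'" : "no match"), "\n";
}
```

The \b inside the lookahead is what lets "catapult" through: "cat" is
only rejected when a word boundary follows it immediately.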


Re: Calling Regex Experts

2006-08-24 Thread D . J .
On 8/24/06, Bart Schaefer <[EMAIL PROTECTED]> wrote:
> On 8/24/06, D. J. <[EMAIL PROTECTED]> wrote:
> >
> > I'm expecting these type of strings for sure:
> >
> > cat
> > dog
> > cat dog
> > dog cat
> >
> > But I may get something like this too:
> >
> > cat cat dog
> > dog dog
> >
> > Essentially I want it to match if anything other than cat or dog is in the
> > string.
>
> That constraint means you have to construct a regex that can be
> anchored at both beginning and end of string, e.g.
> /\A(\s*(cat|dog)\s*)+\Z/.  I'm not sure that ever makes sense in the
> context of a spamassassin rule, except maybe one matching against a
> specific header.

That's the idea... I've got the RELAY_COUNTRIES plugin that I want it
to place a small score if the relay server is not in the US or
Canada.  However, I'm not sure if the plugin will list the same
country multiple times, which is where my uncertainty in the "cat cat
dog" scenario came in.  So far my original rule ( !~ /cat|dog/) seems
to be working well, but if I have a spammer smart enough to manage to
bounce his spam originating in China off of somewhere in the US
before it hits my MX, then that rule will fail.  Am I possibly too
paranoid?



Re: Calling Regex Experts

2006-08-24 Thread Bart Schaefer

On 8/24/06, D. J. <[EMAIL PROTECTED]> wrote:


> I'm expecting these type of strings for sure:
>
> cat
> dog
> cat dog
> dog cat
>
> But I may get something like this too:
>
> cat cat dog
> dog dog
>
> Essentially I want it to match if anything other than cat or dog is in the
> string.


That constraint means you have to construct a regex that can be
anchored at both beginning and end of string, e.g.
/\A(\s*(cat|dog)\s*)+\Z/.  I'm not sure that ever makes sense in the
context of a spamassassin rule, except maybe one matching against a
specific header.
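
A minimal sketch of how that anchored pattern could be used: the
pattern accepts strings made up only of the allowed terms, so negating
the match flags everything else.  The helper name has_other_content is
hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the string is NOT made up solely of
# "cat" and "dog" tokens with optional whitespace around them.
sub has_other_content {
    my ($value) = @_;
    return $value !~ /\A(\s*(cat|dog)\s*)+\Z/ ? 1 : 0;
}

for my $value ("cat", "dog cat", "cat cat dog", "cat bird", "cat dog catapult") {
    print "$value -- ",
        (has_other_content($value) ? "MATCHED" : "no match"), "\n";
}
```

Two things to keep in mind with this form: the whitespace is optional on
both sides of each term, so a run-together value such as "catdog" is
also accepted, and the pattern is case-sensitive unless /i is added.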


Re: Calling Regex Experts

2006-08-24 Thread D . J .
> I'm not quite clear on what you want here.  Your example should NOT
> have matched on "cat dog bird" since it contains one of your terms.
> It would have matched on "bird", since it doesn't.

Oops... that's what I meant.  It doesn't match (though I want it to)
because it contains one of the terms.


Re: Calling Regex Experts

2006-08-24 Thread D . J .
On 8/24/06, Bowie Bailey <[EMAIL PROTECTED]> wrote:
> D.J. wrote:
> > OK, I'm stumped.  I need to create a regex that will match if
> > anything other than two terms I've specified exist.
> >
> > So for example, I have two terms I like, say "cat" and "dog".  I want
> > the rule to match if a string contains anything other than cat or
> > dog.
> >
> > I tried ...
> >
> > $value !~ /cat|dog/
> >
> > ...but this had the unintended consequence of still matching a string
> > like "cat dog bird" or "cat bird" since the string does contain one
> > of my two terms.  So what do I need to do?  Thanks in advance!
>
> I'm not quite clear on what you want here.  Your example should NOT
> have matched on "cat dog bird" since it contains one of your terms.
> It would have matched on "bird", since it doesn't.
>
> If you want to match any string that doesn't include your terms, you
> do it just like you said.
>
> $value !~ /cat|dog/
>
> If you want to match any string which does not exactly match your
> terms, do this:
>
> $value !~ /^(?:cat|dog)$/
>
> This will match on anything other than "cat" or "dog".
>
> If this doesn't help, give us some more examples of things you expect
> to match and things you don't expect to match.
>
> --
> Bowie

The regex...

$value !~ /^(?:cat|dog)$/

...incorrectly matches a string such as "cat dog" or "dog cat" where
both terms are present.  It does however work properly for something
like "dog bird".


Re: Calling Regex Experts

2006-08-24 Thread D . J .
On 8/24/06, Bowie Bailey <[EMAIL PROTECTED]> wrote:
> D.J. wrote:
> > OK, I'm stumped.  I need to create a regex that will match if
> > anything other than two terms I've specified exist.
> >
> > So for example, I have two terms I like, say "cat" and "dog".  I want
> > the rule to match if a string contains anything other than cat or
> > dog.
> >
> > I tried ...
> >
> > $value !~ /cat|dog/
> >
> > ...but this had the unintended consequence of still matching a string
> > like "cat dog bird" or "cat bird" since the string does contain one
> > of my two terms.  So what do I need to do?  Thanks in advance!
>
> I'm not quite clear on what you want here.  Your example should NOT
> have matched on "cat dog bird" since it contains one of your terms.
> It would have matched on "bird", since it doesn't.
>
> If you want to match any string that doesn't include your terms, you
> do it just like you said.
>
> $value !~ /cat|dog/
>
> If you want to match any string which does not exactly match your
> terms, do this:
>
> $value !~ /^(?:cat|dog)$/
>
> This will match on anything other than "cat" or "dog".
>
> If this doesn't help, give us some more examples of things you expect
> to match and things you don't expect to match.
>
> --
> Bowie

I'm expecting these type of strings for sure:

cat
dog
cat dog
dog cat

But I may get something like this too:

cat cat dog
dog dog

Essentially I want it to match if anything other than cat or dog is in
the string.


RE: Calling Regex Experts

2006-08-24 Thread Bowie Bailey
D.J. wrote:
> OK, I'm stumped.  I need to create a regex that will match if
> anything other than two terms I've specified exist. 
> 
> So for example, I have two terms I like, say "cat" and "dog".  I want
> the rule to match if a string contains anything other than cat or
> dog.  
> 
> I tried ...
> 
> $value !~ /cat|dog/
> 
> ...but this had the unintended consequence of still matching a string
> like "cat dog bird" or "cat bird" since the string does contain one
> of my two terms.  So what do I need to do?  Thanks in advance!  

I'm not quite clear on what you want here.  Your example should NOT
have matched on "cat dog bird" since it contains one of your terms.
It would have matched on "bird", since it doesn't.

If you want to match any string that doesn't include your terms, you
do it just like you said.

$value !~ /cat|dog/

If you want to match any string which does not exactly match your
terms, do this:

$value !~ /^(?:cat|dog)$/

This will match on anything other than "cat" or "dog".

If this doesn't help, give us some more examples of things you expect
to match and things you don't expect to match.

-- 
Bowie
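
The difference between the two negated patterns above can be checked
with a short script.  This is only a sketch; the helper names
lacks_terms and not_exactly_term are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true when neither "cat" nor "dog" occurs
# anywhere in the string.
sub lacks_terms {
    my ($value) = @_;
    return $value !~ /cat|dog/ ? 1 : 0;
}

# Hypothetical helper: true when the whole string is something other
# than exactly "cat" or exactly "dog".
sub not_exactly_term {
    my ($value) = @_;
    return $value !~ /^(?:cat|dog)$/ ? 1 : 0;
}

for my $value ("cat", "cat dog", "cat bird", "bird") {
    printf "%-10s lacks_terms=%d not_exactly_term=%d\n",
        "'$value'", lacks_terms($value), not_exactly_term($value);
}
```

Neither form treats "cat dog" as acceptable and "cat bird" as a hit at
the same time: the first ignores "cat bird" because a term is present,
and the second flags "cat dog" because the string is not exactly one
term.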


Re: Calling Regex Experts

2006-08-24 Thread Alan Premselaar
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1



D.J. wrote:
> OK, I'm stumped.  I need to create a regex that will match if anything
> other than two terms I've specified exist.
> 
> So for example, I have two terms I like, say "cat" and "dog".  I want
> the rule to match if a string contains anything other than cat or dog.
> 
> I tried ...
> 
> $value !~ /cat|dog/
> 
> ...but this had the unintended consequence of still matching a string
> like "cat dog bird" or "cat bird" since the string does contain one of
> my two terms.  So what do I need to do?  Thanks in advance!
> 
> - D.J.


D.J.,

 you're probably best off using META rules for this.  So you could have
something like (completely untested and off the top of my head in the
middle of the night):

body __CAT  /cat/
body __DOG  /dog/

meta NOT_CAT_AND_DOG  (!__CAT && !__DOG)

you should definitely check the man pages and/or wiki about writing
rules to do this properly, but that should get you started.

Alan
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.1 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE7dfoE2gsBSKjZHQRAozpAKC+edJGc52qWz1qguOQReCLUy3z9ACgzFpn
V20guvwnlLaKHy3Aiy8FLQs=
=eGwC
-END PGP SIGNATURE-
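
For comparison, the boolean logic of the meta rule suggested above can
be mimicked in plain Perl.  This sketch only models the expression, not
SpamAssassin itself, and meta_not_cat_and_dog is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the meta rule: $cat and $dog play the part
# of the __CAT and __DOG sub-rules.
sub meta_not_cat_and_dog {
    my ($text) = @_;
    my $cat = $text =~ /cat/ ? 1 : 0;
    my $dog = $text =~ /dog/ ? 1 : 0;
    return (!$cat && !$dog) ? 1 : 0;
}

for my $text ("cat", "cat dog", "cat bird", "bird") {
    print "$text -- ",
        (meta_not_cat_and_dog($text) ? "MATCHED" : "no match"), "\n";
}
```

Written this way, the expression fires only when neither term is
present, so it behaves like $value !~ /cat|dog/ and would not fire on
"cat bird".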