Re: Match First Sequence in Regular Expression?

Tim Chase Thu, 26 Jan 2006 11:26:41 -0800

>> "xyz123aaabbaaabab"
>> 
>> where you have "aaab" in there twice.
> 
> Good suggestion.


I assumed that this would be a valid case.  If not, the
expression would need tweaking.

>> ^([^b]|((?<!a)b))*aaab+[ab]*$
> 
> Looks good, although I've been unable to find a good
> explanation of the "negative lookbehind" construct "(?<".  How
> does it work?

The beginning part of the expression

        ([^b]|((?<!a)b))*

breaks down as

        [^b]        anything that isn't a "b"
        |           or
        (...)       this other thing

where "this other thing" is

        (?<!a)b     a "b" as long as it isn't immediately
                    preceded by an "a"

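The "(?<!...)" construct is a zero-width assertion: it succeeds 
only if the text immediately before the current position does not 
match the "..." portion, and it consumes no characters itself.  
In this case it sits in front of a "b", so that "b" only matches 
when the character just before it is not an "a".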

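As a rough sketch of the lookbehind on its own (the "ab bb cb" 
sample string is made up for illustration and is not from the 
original question), followed by the full expression run against 
the string quoted at the top of this message:

```python
import re

# "(?<!a)b": a "b" whose preceding character is not an "a".
# The lookbehind inspects the text but consumes none of it.
not_after_a = re.compile(r"(?<!a)b")

# Made-up sample: only the first "b" directly follows an "a".
sample = "ab bb cb"
lookbehind_hits = [m.start() for m in not_after_a.finditer(sample)]
print(lookbehind_hits)  # [3, 4, 7]

# The full expression from this thread.
lookbehind_re = re.compile(r"^([^b]|((?<!a)b))*aaab+[ab]*$")
print(bool(lookbehind_re.match("xyz123aaabbaaabab")))  # True
```

The "b" at offset 1 is skipped because an "a" sits right before 
it; the other three are found.
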
There's also a "negative lookahead" (rather than "lookbehind") 
which prevents items from following.  This should be usable in 
this scenario as well and works with the aforementioned tests, using

        "^([^a]|(a(?!b)))*aaab+[ab]*$"

which would be "anything that's not an 'a'; or an 'a' as long as 
it's not followed by a 'b'".
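
The same kind of sketch for the lookahead version (again, the 
"ab aa ac" sample string is made up for illustration):

```python
import re

# "a(?!b)": an "a" whose next character is not a "b".
# The lookahead peeks at the following text without consuming it.
not_before_b = re.compile(r"a(?!b)")

# Made-up sample: only the first "a" is directly followed by a "b".
sample = "ab aa ac"
lookahead_hits = [m.start() for m in not_before_b.finditer(sample)]
print(lookahead_hits)  # [3, 4, 6]

# The lookahead variant of the full expression from this thread.
lookahead_re = re.compile(r"^([^a]|(a(?!b)))*aaab+[ab]*$")
print(bool(lookahead_re.match("xyz123aaabbaaabab")))  # True
```

Here the "a" at offset 0 is skipped because a "b" comes right 
after it.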

The gospel is at:
http://docs.python.org/lib/re-syntax.html

but is a bit terse.  O'Reilly has a fairly good book on regexps if 
you want to dig a bit deeper.

-tkc



-- 
http://mail.python.org/mailman/listinfo/python-list
