Re: Match First Sequence in Regular Expression?

Alex Martelli Thu, 26 Jan 2006 08:12:11 -0800

Tim Chase <[EMAIL PROTECTED]> wrote:

> > Sorry for the confusion.  The correct pattern should reject
> > all strings except those in which the first sequence of the
> > letter 'a' that is followed by the letter 'b' has a length of
> > exactly three.
> 
> Ah...a little more clear.
> 
>       r = re.compile("[^a]*a{3}b+(a+b*)*")
>       matches = [s for s in listOfStringsToTest if r.match(s)]


Unfortunately, the OP's spec is even more complex than this, if we are
to take to the letter what you just quoted; e.g.
  aazaaab
SHOULD match, because the sequence 'aaz' (being 'a' NOT followed by the
letter 'b') should not invalidate the match that follows.  I don't think
he means the strings contain only a's and b's.
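
To make the counterexample concrete, here is a small check of the pattern quoted above against that string. The helper name `accepts` is made up for illustration; the pattern itself is copied unchanged from the quote.

```python
import re

# Pattern quoted above, copied unchanged.
r = re.compile("[^a]*a{3}b+(a+b*)*")

def accepts(s):
    # Hypothetical helper: True if r.match succeeds at the start of s.
    return r.match(s) is not None

# [^a]* cannot consume the leading 'aa', and a{3} then fails on 'aaz',
# so the string is rejected although, read literally, the spec accepts it.
print(accepts("aazaaab"))   # False
print(accepts("aaab"))      # True
```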

Locating 'the first sequence of a followed by b' is easy, and it is
reasonably easy to check that the sequence has length exactly 3 (e.g. with
a negative lookbehind) -- but I don't know how to tell a RE to *stop*
searching for more if the check fails.
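
A sketch of that problem, assuming the length check is written as `(?<!a)a{3}b` (my guess at one possible form of the lookbehind check, not a pattern anyone in the thread proposed). When the first run has the wrong length, the search simply moves on and finds a later run:

```python
import re

# Assumed form of the check: exactly three a's (no 'a' just before), then 'b'.
check = re.compile(r"(?<!a)a{3}b")

# The first run of a's followed by b here is 'aaaa' (length 4), so the
# string should be rejected.  The check fails on that run, but the search
# continues and succeeds on the second run, which starts at index 5.
m = check.search("aaaabaaab")
print(m is not None)   # True: wrongly accepted
print(m.start())       # 5
```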

If a little more than just REs and matching were allowed, it would be
reasonably easy, but I don't know how to fashion a RE r such that
r.match(s) will succeed if and only if s meets those very precise and
complicated specs.  That doesn't mean it can't be done, only that I
can't do it so far.  Perhaps the OP can tell us what constrains him to
use r.match ONLY, rather than a little bit of logic around it, so we can
see if we're trying to work in an artificially overconstrained domain?
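
For what it's worth, here is a minimal sketch of what "a little bit of logic around it" could look like, under my literal reading of the spec. The names `first_ab_run` and `first_run_is_three` are made up for illustration, and I am assuming that a string with no run of a's followed by b at all should be rejected.

```python
import re

# One or more a's immediately followed by 'b'.  search() returns the
# leftmost match and a+ is greedy, so group(1) is the whole first run of
# a's that is followed by 'b'; runs followed by anything else are skipped.
first_ab_run = re.compile(r"(a+)b")

def first_run_is_three(s):
    m = first_ab_run.search(s)
    # Assumption: no qualifying run at all means the string is rejected.
    return m is not None and len(m.group(1)) == 3

listOfStringsToTest = ["aazaaab", "aaaabaaab", "aabaaab", "aaab", "zzz"]
matches = [s for s in listOfStringsToTest if first_run_is_three(s)]
print(matches)   # ['aazaaab', 'aaab']
```

The length test lives in plain Python rather than in the RE, so a run of the wrong length ends the matter instead of letting the search continue.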


Alex

-- 
http://mail.python.org/mailman/listinfo/python-list
