Tim Chase <[EMAIL PROTECTED]> wrote:
> > Sorry for the confusion. The correct pattern should reject
> > all strings except those in which the first sequence of the
> > letter 'a' that is followed by the letter 'b' has a length of
> > exactly three.
>
> Ah...a little more clear.
>
>     r = re.compile("[^a]*a{3}b+(a+b*)*")
>     matches = [s for s in listOfStringsToTest if r.match(s)]

Unfortunately, the OP's spec is even more complex than this, if we are
to take to the letter what you just quoted; e.g. aazaaab SHOULD match,
because the sequence 'aaz' (being 'a' NOT followed by the letter 'b')
should not invalidate the match that follows. I don't think he means
that the strings contain only a's and b's.

Locating 'the first sequence of a followed by b' is easy, and it is
reasonably easy to check that the sequence is exactly of length 3 (e.g.
with a negative lookbehind), but I don't know how to tell a RE to
*stop* searching for more if the check fails.

If a little more than just REs and matching were allowed, it would be
reasonably easy, but I don't know how to fashion a RE r such that
r.match(s) will succeed if and only if s meets those very precise and
complicated specs. That doesn't mean it can't be done, just that I
can't do it so far.

Perhaps the OP can tell us what constrains him to use r.match ONLY,
rather than a little bit of logic around it, so we can see if we're
trying to work in an artificially overconstrained domain?

Alex
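
P.S. For concreteness, here is a minimal sketch of the "little bit of
logic around it" approach, as opposed to a single r.match. The names
first_a_run_before_b and meets_spec are made up for illustration, and
rejecting strings that contain no 'a' run followed by 'b' at all is my
reading of the spec, not something the OP stated.

```python
import re

# Leftmost run of one or more 'a' that is immediately followed by 'b'.
# search() scans left to right, so the first hit is the first such run.
first_a_run_before_b = re.compile(r"(a+)b")

def meets_spec(s):
    """True if the first run of 'a' followed by 'b' has length exactly 3."""
    m = first_a_run_before_b.search(s)
    # Assumption: a string with no such run at all is rejected.
    return m is not None and len(m.group(1)) == 3
```

With this, meets_spec("aazaaab") is true, because 'aa' followed by 'z'
is skipped, while meets_spec("aaaab") and meets_spec("zaabaaab") are
false, since the first qualifying run has length 4 and 2 respectively.
The length check lives in Python rather than in the RE, which sidesteps
the "stop searching" problem.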