Re: Match First Sequence in Regular Expression?
Alex Martelli wrote: > Christoph Conrad <[EMAIL PROTECTED]> wrote: > > >>Hello Roger, >> >> >>>since the length of the first sequence of the letter 'a' is 2. Yours >>>accepts it, right? >> >>Yes, i misunderstood your requirements. So it must be modified >>essentially to that what Tim Chase wrote: >> >>m = re.search('^[^a]*a{3}b', 'xyz123aabbaaab') > > > ...but that rejects 'aazaaab' which should apparently be accepted. ... and that is OK. That was the request: >I'm looking for a regular expression that matches the first, and only > the first, sequence of the letter 'a', and only if the length of the > sequence is exactly 3. --Armin > > > Alex -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
How about:

    pattern = re.compile('^([^a]|(a+[^ab]))*aaab')

Which basically says, "precede with arbitrarily many non-a's or a sequences ending in non-b, then must have 3 as followed by a b."

    cases = ["xyz123aaabbab", "xayz123aaabab", "xaaayz123aaabab",
             "xyz123babaaabab", "xyz123aabbaaab", "xaaayz123abab"]
    [re.search(pattern, case) is not None for case in cases]

    [True, True, True, False, False, False]

--Scott David Daniels [EMAIL PROTECTED]
-- http://mail.python.org/mailman/listinfo/python-list
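Editorial sketch: the pattern above can be checked against Roger's corrected five test cases from elsewhere in the thread. The names PATTERN, CASES and accepts are illustrative and not from the original post; the code assumes Python 3.

```python
import re

# Pattern as posted above; '^' anchors the scan at the start of the string.
PATTERN = re.compile(r"^([^a]|(a+[^ab]))*aaab")

# Roger's corrected test cases: (input, should_match).
CASES = [
    ("xyz123aaabbab", True),
    ("xyz123aabbaaab", False),
    ("xayz123aaabab", True),
    ("xaaayz123abab", False),
    ("xaaayz123aaabab", True),
]


def accepts(s):
    """Return True if the pattern matches s."""
    return PATTERN.search(s) is not None


for text, expected in CASES:
    # Print what the pattern decided next to what Roger expects.
    print(text, accepts(text), expected)
```

On these five inputs the pattern agrees with the expected column. It also accepts Alex's "aazaaab" example and rejects "aazab", which is the behaviour the clarified spec appears to ask for.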
Re: Match First Sequence in Regular Expression?
Christoph Conrad <[EMAIL PROTECTED]> wrote: > Hallo Alex, > > >> r = re.compile("[^a]*a{3}b+(a+b*)*") matches = [s for s in > >> listOfStringsToTest if r.match(s)] > > > Unfortunately, the OP's spec is even more complex than this, if we are > > to take to the letter what you just quoted; e.g. aazaaab SHOULD match, > > Then it's again "a{3}b", isn't it? Except that this one would also match aazab, which it shouldn't. Alex -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
>> "xyz123aaabbaaabab" >> >> where you have "aaab" in there twice. > > Good suggestion. I assumed that this would be a valid case. If not, the expression would need tweaking. >> ^([^b]|((? > Looks good, although I've been unable to find a good > explanation of the "negative lookbehind" construct "(?<". How > does it work? The beginning part of the expression ([^b]|((?http://docs.python.org/lib/re-syntax.html but is a bit terse. O'Reilly has a fairly good book on regexps if you want to dig a bit deeper. -tkc -- http://mail.python.org/mailman/listinfo/python-list
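Editorial sketch of how a negative lookbehind behaves in Python's re module: `(?<!a)b` matches a 'b' only when the character immediately before it is not 'a', and it does not consume that preceding character. The SKETCH pattern below is a hypothetical construction built on that idea. It is not the expression posted in the thread, which is truncated in this archive, and the extra `(?<!a)` in front of `aaab` is an assumption added here to reject runs of four or more a's.

```python
import re

# (?<!a)b matches a 'b' only if the previous character is not 'a'.
positions = [m.start() for m in re.finditer(r"(?<!a)b", "ab bb xb")]
print(positions)  # [3, 4, 7]: the 'b' at index 1 follows an 'a' and is skipped

# Hypothetical sketch (NOT the expression from the thread):
#   (?:[^b]|(?<!a)b)*  a prefix in which no 'b' directly follows an 'a'
#   (?<!a)aaab         three 'a's, not preceded by a fourth, then a 'b'
SKETCH = re.compile(r"^(?:[^b]|(?<!a)b)*(?<!a)aaab")


def first_ab_run_is_three(s):
    """True if the first run of a's directly followed by 'b' has length 3."""
    return SKETCH.match(s) is not None


for text in ("xyz123aaabbab", "xyz123aabbaaab", "xaaaab"):
    print(text, first_ab_run_is_three(text))
```

Because the prefix may never step over a 'b' that follows an 'a', the only "aaab" the pattern can reach is one that belongs to the first such run.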
Re: Match First Sequence in Regular Expression?
"Tim Chase" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > The below seems to pass all the tests you threw at it (taking the modified > 2nd test into consideration) > > One other test that occurs to me would be > > "xyz123aaabbaaabab" > > where you have "aaab" in there twice. Good suggestion. > ^([^b]|((?http://www.cauvin-inc.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
The below seems to pass all the tests you threw at it (taking the modified 2nd test into consideration) One other test that occurs to me would be "xyz123aaabbaaabab" where you have "aaab" in there twice. -tkc import re tests = [ ("xyz123aaabbab",True), ("xyz123aabbaaab", False), ("xayz123aaabab",True), ("xaaayz123abab", False), ("xaaayz123aaabab",True) ] exp = '^([^b]|((?http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
"Fredrik Lundh" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Roger L. Cauvin wrote: > >> > $ python test.py >> > gotexpected >> > --- >> > accept accept >> > reject reject >> > accept accept >> > reject reject >> > accept accept >> >> Thanks, but the second test case I listed contained a typo. It should >> have >> contained a sequence of three of the letter 'a'. The test cases should >> be: >> >> "xyz123aaabbab" accept >> "xyz123aabbaaab" reject >> "xayz123aaabab" accept >> "xaaayz123abab" reject >> "xaaayz123aaabab" accept >> >> Your pattern fails the second test. > > $ more test.py > > import re > > print "gotexpected" > print "-- " > > testsuite = ( >("xyz123aaabbab", "accept"), >("xyz123aabbaaab", "reject"), >("xayz123aaabab", "accept"), >("xaaayz123abab", "reject"), >("xaaayz123aaabab", "accept"), >) > > for string, result in testsuite: >m = re.search("a+b", string) >if m and len(m.group()) == 4: >print "accept", >else: >print "reject", >print result > > $ python test.py > > gotexpected > -- > accept accept > reject reject > accept accept > reject reject > accept accept Thanks, but I'm looking for a solution in terms of a regular expression only. In other words, "accept" means the regular expression matched, and "reject" means the regular expression did not match. I want to see if I can fulfill the requirements without additional code (such as checking "len(m.group())"). -- Roger L. Cauvin [EMAIL PROTECTED] (omit the "nospam_" part) Cauvin, Inc. Product Management / Market Research http://www.cauvin-inc.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
"Christos Georgiou" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > On Thu, 26 Jan 2006 17:09:18 GMT, rumours say that "Roger L. Cauvin" > <[EMAIL PROTECTED]> might have written: > >>Thanks, but the second test case I listed contained a typo. It should >>have >>contained a sequence of three of the letter 'a'. The test cases should >>be: >> >>"xyz123aaabbab" accept >>"xyz123aabbaaab" reject > > Here I object to either you or your need for a regular expression. You > see, > before the "aaa" in your second test case, you have an "arbitrary sequence > of characters", so your requirements are met. Well, thank you for your efforts so far, Christos. My purpose is to determine whether it's possible to do this using regular expressions, since my application is already architected around configuration files that use regular expressions. It may not be the best architecture, but I still don't know the answer to my question. Is it *possible* to fulfill my requirements with regular expressions, even if it's not the best way to do it? The requirements are not met by your regular expression, since by definition the "arbitrary sequence of characters" stops once the sequences of a's and b's starts. -- Roger L. Cauvin [EMAIL PROTECTED] (omit the "nospam_" part) Cauvin, Inc. Product Management / Market Research http://www.cauvin-inc.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
Roger L. Cauvin wrote: > > $ python test.py > > gotexpected > > --- > > accept accept > > reject reject > > accept accept > > reject reject > > accept accept > > Thanks, but the second test case I listed contained a typo. It should have > contained a sequence of three of the letter 'a'. The test cases should be: > > "xyz123aaabbab" accept > "xyz123aabbaaab" reject > "xayz123aaabab" accept > "xaaayz123abab" reject > "xaaayz123aaabab" accept > > Your pattern fails the second test. $ more test.py import re print "gotexpected" print "-- " testsuite = ( ("xyz123aaabbab", "accept"), ("xyz123aabbaaab", "reject"), ("xayz123aaabab", "accept"), ("xaaayz123abab", "reject"), ("xaaayz123aaabab", "accept"), ) for string, result in testsuite: m = re.search("a+b", string) if m and len(m.group()) == 4: print "accept", else: print "reject", print result $ python test.py gotexpected -- accept accept reject reject accept accept reject reject accept accept -- http://mail.python.org/mailman/listinfo/python-list
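Editorial sketch: the same check written for Python 3, for readers who want to run it today. The function name accepts and the CASES list are illustrative; the logic is the one in the post above: find the leftmost run of a's followed by 'b' and require the whole match to be four characters long.

```python
import re

# Roger's corrected test cases: (input, expected verdict).
CASES = [
    ("xyz123aaabbab", "accept"),
    ("xyz123aabbaaab", "reject"),
    ("xayz123aaabab", "accept"),
    ("xaaayz123abab", "reject"),
    ("xaaayz123aaabab", "accept"),
]


def accepts(s):
    """True if the leftmost match of a+b is exactly 'aaab'."""
    m = re.search("a+b", s)
    return m is not None and len(m.group()) == 4


for text, expected in CASES:
    got = "accept" if accepts(text) else "reject"
    print(got, expected)
```

This works because re.search reports the leftmost match and `a+` is greedy, so the match always starts at the beginning of the first run of a's that is followed by a 'b'. It does rely on the length test outside the regular expression.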
Re: Match First Sequence in Regular Expression?
"Christos Georgiou" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > On Thu, 26 Jan 2006 18:01:07 +0100, rumours say that "Fredrik Lundh" > <[EMAIL PROTECTED]> might have written: > >>Roger L. Cauvin wrote: >> >>> Good suggestion. Here are some "test cases": >>> >>> "xyz123aaabbab" accept >>> "xyz123aabbaab" reject >>> "xayz123aaabab" accept >>> "xaaayz123abab" reject >>> "xaaayz123aaabab" accept >> >>$ more test.py > > [snip of code] >>m = re.search("aaab", string) > [snip of more code] > >>$ python test.py >>gotexpected >>--- >>accept accept >>reject reject >>accept accept >>reject reject >>accept accept > > You're right, Fredrik, but we (graciously as a group :) take also notice > of > the other requirements that the OP has provided elsewhere and that are not > covered by the simple test that he specified. My fault, guys. The second test case should be "xyz123aabbaaab" reject instead of "xyz123aabbaab" reject Fredrik's pattern fails this test case. -- Roger L. Cauvin [EMAIL PROTECTED] (omit the "nospam_" part) Cauvin, Inc. Product Management / Market Research http://www.cauvin-inc.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
On Thu, 26 Jan 2006 17:09:18 GMT, rumours say that "Roger L. Cauvin" <[EMAIL PROTECTED]> might have written: >Thanks, but the second test case I listed contained a typo. It should have >contained a sequence of three of the letter 'a'. The test cases should be: > >"xyz123aaabbab" accept >"xyz123aabbaaab" reject Here I object to either you or your need for a regular expression. You see, before the "aaa" in your second test case, you have an "arbitrary sequence of characters", so your requirements are met. -- TZOTZIOY, I speak England very best. "Dear Paul, please stop spamming us." The Corinthians -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
"Christos Georgiou" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > On Thu, 26 Jan 2006 16:26:57 GMT, rumours say that "Roger L. Cauvin" > <[EMAIL PROTECTED]> might have written: > >>"Christos Georgiou" <[EMAIL PROTECTED]> wrote in message >>news:[EMAIL PROTECTED] > >>> On Thu, 26 Jan 2006 14:09:54 GMT, rumours say that "Roger L. Cauvin" >>> <[EMAIL PROTECTED]> might have written: > Say I have some string that begins with an arbitrary sequence of characters and then alternates repeating the letters 'a' and 'b' any number of times, e.g. "xyz123aaabbaaabaaaabb" I'm looking for a regular expression that matches the first, and only the first, sequence of the letter 'a', and only if the length of the sequence is exactly 3. Does such a regular expression exist? If so, any ideas as to what it could be? >>> >>> Is this what you mean? >>> >>> ^[^a]*(a{3})(?:[^a].*)?$ >> >>Close, but the pattern should allow "arbitrary sequence of characters" >>that >>precede the alternating a's and b's to contain the letter 'a'. In other >>words, the pattern should accept: >> >>"xayz123aaabbab" >> >>since the 'a' between the 'x' and 'y' is not directly followed by a 'b'. >> >>Your proposed pattern rejects this string. > > 1. > > (a{3})(?:b[ab]*)?$ > > This finds the first (leftmost) "aaa" either at the end of the string or > followed by 'b' and then arbitrary sequences of 'a' and 'b'. > > This will also match "" (from second position on). > > 2. > > If you insist in only three 'a's and you can add the constraint that: > > * let s be the "arbitrary sequence of characters" at the start of your > searched text > * len(s) >= 1 and not s.endswith('a') > > then you'll have this reg.ex. > > (?<=[^a])(a{3})(?:b[ab]*)?$ > > 3. 
> > If you want to allow for a possible empty "arbitrary sequence of > characters" > at the start and you don't mind search speed > > ^(?:.*?[^a])?(a{3})(?:b[ab]*)?$ > > This should cover you: > s="xayzbaaa123aaabbab" r=re.compile(r"^(?:.*?[^a])?(a{3})(?:b[ab]*)?$") m= r.match(s) m.group(1) > 'aaa' m.start(1) > 11 s[11:] > 'aaabbab' Thanks for continuing to follow up, Christos. Please see my reply to your other post (in which you applied the test cases). -- Roger L. Cauvin [EMAIL PROTECTED] (omit the "nospam_" part) Cauvin, Inc. Product Management / Market Research http://www.cauvin-inc.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
"Christos Georgiou" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > On Thu, 26 Jan 2006 16:41:08 GMT, rumours say that "Roger L. Cauvin" > <[EMAIL PROTECTED]> might have written: > >>Good suggestion. Here are some "test cases": >> >>"xyz123aaabbab" accept >>"xyz123aabbaab" reject >>"xayz123aaabab" accept >>"xaaayz123abab" reject >>"xaaayz123aaabab" accept > > Applying my last regex to your test cases: > r.match("xyz123aaabbab") > <_sre.SRE_Match object at 0x00B47F60> r.match("xyz123aabbaab") r.match("xayz123aaabab") > <_sre.SRE_Match object at 0x00B50020> r.match("xaaayz123abab") r.match("xaaayz123aaabab") > <_sre.SRE_Match object at 0x00B47F60> print r.pattern > ^(?:.*?[^a])?(a{3})(?:b[ab]*)?$ > > You should also remember to check the (match_object).start(1) to verify > that > it matches the "aaa" you want. Thanks, but the second test case I listed contained a typo. It should have contained a sequence of three of the letter 'a'. The test cases should be: "xyz123aaabbab" accept "xyz123aabbaaab" reject "xayz123aaabab" accept "xaaayz123abab" reject "xaaayz123aaabab" accept Your pattern fails the second test. -- Roger L. Cauvin [EMAIL PROTECTED] (omit the "nospam_" part) Cauvin, Inc. Product Management / Market Research http://www.cauvin-inc.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
On Thu, 26 Jan 2006 18:01:07 +0100, rumours say that "Fredrik Lundh" <[EMAIL PROTECTED]> might have written: >Roger L. Cauvin wrote: > >> Good suggestion. Here are some "test cases": >> >> "xyz123aaabbab" accept >> "xyz123aabbaab" reject >> "xayz123aaabab" accept >> "xaaayz123abab" reject >> "xaaayz123aaabab" accept > >$ more test.py [snip of code] >m = re.search("aaab", string) [snip of more code] >$ python test.py >gotexpected >--- >accept accept >reject reject >accept accept >reject reject >accept accept You're right, Fredrik, but we (graciously as a group :) also take notice of the other requirements that the OP has provided elsewhere and that are not covered by the simple test that he specified. The code above works for "aaaab" too, which the OP has already ruled out, and it doesn't work for "aaa". -- TZOTZIOY, I speak England very best. "Dear Paul, please stop spamming us." The Corinthians -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
"Fredrik Lundh" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Roger L. Cauvin wrote: > >> Good suggestion. Here are some "test cases": >> >> "xyz123aaabbab" accept >> "xyz123aabbaab" reject >> "xayz123aaabab" accept >> "xaaayz123abab" reject >> "xaaayz123aaabab" accept > > $ more test.py > > import re > > print "gotexpected" > print "-- " > > testsuite = ( >("xyz123aaabbab", "accept"), >("xyz123aabbaab", "reject"), >("xayz123aaabab", "accept"), >("xaaayz123abab", "reject"), >("xaaayz123aaabab", "accept"), >) > > for string, result in testsuite: >m = re.search("aaab", string) >if m: >print "accept", >else: >print "reject", >print result > > > $ python test.py > gotexpected > --- > accept accept > reject reject > accept accept > reject reject > accept accept Thanks, but the second test case I listed contained a typo. It should have contained a sequence of three of the letter 'a'. The test cases should be: "xyz123aaabbab" accept "xyz123aabbaaab" reject "xayz123aaabab" accept "xaaayz123abab" reject "xaaayz123aaabab" accept Your pattern fails the second test. -- Roger L. Cauvin [EMAIL PROTECTED] (omit the "nospam_" part) Cauvin, Inc. Product Management / Market Research http://www.cauvin-inc.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
On Thu, 26 Jan 2006 16:26:57 GMT in comp.lang.python, "Roger L. Cauvin" <[EMAIL PROTECTED]> wrote:

>"Christos Georgiou" <[EMAIL PROTECTED]> wrote in message
>news:[EMAIL PROTECTED]
[...]
>> Is this what you mean?
>>
>> ^[^a]*(a{3})(?:[^a].*)?$
>
>Close, but the pattern should allow "arbitrary sequence of characters" that
>precede the alternating a's and b's to contain the letter 'a'. In other
>words, the pattern should accept:
>
>"xayz123aaabbab"
>
>since the 'a' between the 'x' and 'y' is not directly followed by a 'b'.
>

I don't know that an RE is the best solution to this problem. If I understand the problem correctly, building a state machine to solve this is trivial. The following took about 5 minutes of coding:

---begin included file
# Define our states.
#   state[0] is next state if character is 'a'
#   state[1] is next state if character is 'b'
#   state[2] is next state for any other character

# Accept state means we've found a match
Accept = []
for i in range(3):
    Accept.append(Accept)

# Reject state means the string cannot match
Reject = []
for i in range(3):
    Reject.append(Reject)

# Remaining states: Start, 'a' found, 'aa', 'aaa', and 'aaaa'
Start = [0,1,2]
a1 = [0,1,2]
a2 = [0,1,2]
a3 = [0,1,2]
a4 = [0,1,2]

# Start: looking for first 'a'
Start[0] = a1
Start[1] = Start
Start[2] = Start

# a1: 1 'a' found so far
a1[0] = a2
a1[1] = Reject
a1[2] = Start

# a2: 'aa' found
a2[0] = a3
a2[1] = Reject
a2[2] = Start

# a3: 'aaa' found
a3[0] = a4
a3[1] = Accept
a3[2] = Start

# a4: four or more 'a' in a row
a4[0] = a4
a4[1] = Reject
a4[2] = Start

def detect(s):
    """
    Return 1 if first substring aa*b has exactly 3 a's
    Return 0 otherwise
    """
    state = Start
    for c in s:
        if c == 'a':
            state = state[0]
        elif c == 'b':
            state = state[1]
        else:
            state = state[2]
        if state is Accept:
            return 1
    return 0

print(detect("xyza123abc"))
print(detect("xyzaaa123aabc"))
print(detect("xyzaa123aaabc"))
print(detect("xyza123bc"))
--- end included file ---

And I'm pretty sure it does what you need, though it's pretty naive.

Note that if '3' isn't a magic number, states a1, a2, a3, and a4 could be re-implemented as a single state with a counter, but the logic inside detect gets a little hairier.

I haven't timed it, but it's not doing anything other than simple comparisons and assignments. It's a little (OK, a lot) more code than a simple RE, but I know it works.

HTH,
-=Dave
--
Change is inevitable, progress is not.
-- http://mail.python.org/mailman/listinfo/python-list
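Editorial sketch of the single-state-with-a-counter variant that the post mentions but does not show. The function name detect_with_counter and the parameter n are hypothetical; this is one possible reading of that remark, written for Python 3, not Dave's own code.

```python
def detect_with_counter(s, n=3):
    """Return True if the first run of a's directly followed by 'b' has length n."""
    run = 0  # length of the current run of consecutive 'a' characters
    for c in s:
        if c == 'a':
            run += 1
        elif c == 'b' and run > 0:
            # First run of a's that is directly followed by 'b': decide now.
            return run == n
        else:
            # Any other character (or a 'b' with no pending a's) resets the run.
            run = 0
    return False


for text in ("xyza123abc", "xyzaaa123aabc", "xyzaa123aaabc", "xyza123bc"):
    print(int(detect_with_counter(text)))
```

The counter replaces states a1 through a4, and the early return plays the role of both the Accept and Reject states, since nothing after the first "a...ab" run can change the answer.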
Re: Match First Sequence in Regular Expression?
Roger L. Cauvin wrote: > Good suggestion. Here are some "test cases": > > "xyz123aaabbab" accept > "xyz123aabbaab" reject > "xayz123aaabab" accept > "xaaayz123abab" reject > "xaaayz123aaabab" accept $ more test.py import re print "gotexpected" print "-- " testsuite = ( ("xyz123aaabbab", "accept"), ("xyz123aabbaab", "reject"), ("xayz123aaabab", "accept"), ("xaaayz123abab", "reject"), ("xaaayz123aaabab", "accept"), ) for string, result in testsuite: m = re.search("aaab", string) if m: print "accept", else: print "reject", print result $ python test.py gotexpected --- accept accept reject reject accept accept reject reject accept accept -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
On Thu, 26 Jan 2006 16:41:08 GMT, rumours say that "Roger L. Cauvin" <[EMAIL PROTECTED]> might have written: >Good suggestion. Here are some "test cases": > >"xyz123aaabbab" accept >"xyz123aabbaab" reject >"xayz123aaabab" accept >"xaaayz123abab" reject >"xaaayz123aaabab" accept Applying my last regex to your test cases: >>> r.match("xyz123aaabbab") <_sre.SRE_Match object at 0x00B47F60> >>> r.match("xyz123aabbaab") >>> r.match("xayz123aaabab") <_sre.SRE_Match object at 0x00B50020> >>> r.match("xaaayz123abab") >>> r.match("xaaayz123aaabab") <_sre.SRE_Match object at 0x00B47F60> >>> print r.pattern ^(?:.*?[^a])?(a{3})(?:b[ab]*)?$ You should also remember to check the (match_object).start(1) to verify that it matches the "aaa" you want. -- TZOTZIOY, I speak England very best. "Dear Paul, please stop spamming us." The Corinthians -- http://mail.python.org/mailman/listinfo/python-list
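Editorial sketch: the session above can be reproduced in Python 3 as follows. It also runs the pattern on the corrected second test case that Roger posted later ("xyz123aabbaaab"), where checking start(1) shows which "aaa" was captured. The names R and show are illustrative.

```python
import re

R = re.compile(r"^(?:.*?[^a])?(a{3})(?:b[ab]*)?$")


def show(s):
    """Print whether R matches s and, if so, where group 1 starts."""
    m = R.match(s)
    print(s, "no match" if m is None else "match, aaa at %d" % m.start(1))
    return m


# The five cases as first posted (second case still has the typo).
for text in ("xyz123aaabbab", "xyz123aabbaab", "xayz123aaabab",
             "xaaayz123abab", "xaaayz123aaabab"):
    show(text)

# Corrected second case: the pattern matches, but the captured "aaa" is the
# second run of a's, not the first run that is followed by a 'b'.
m = show("xyz123aabbaaab")
```

So the pattern by itself accepts the corrected second case, which is what Roger reports in his reply. Rejecting it needs the extra start(1) check or a different pattern.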
Re: Match First Sequence in Regular Expression?
"Alex Martelli" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Tim Chase <[EMAIL PROTECTED]> wrote: > >> > Sorry for the confusion. The correct pattern should reject >> > all strings except those in which the first sequence of the >> > letter 'a' that is followed by the letter 'b' has a length of >> > exactly three. ... ... > If a little more than just REs and matching was allowed, it would be > reasonably easy, but I don't know how to fashion a RE r such that > r.match(s) will succeed if and only if s meets those very precise and > complicated specs. That doesn't mean it just can't be done, just that I > can't do it so far. Perhaps the OP can tell us what constrains him to > use r.match ONLY, rather than a little bit of logic around it, so we can > see if we're trying to work in an artificially overconstrained domain? Alex, you seem to grasp exactly what the requirements are in this case. I of course don't *have* to use regular expressions only, but I'm working with an infrastructure that uses regexps in configuration files so that the code doesn't have to change to add or change patterns. Before throwing up my hands and re-architecting, I wanted to see if regexps would handle the job (they have in every case but one). -- Roger L. Cauvin [EMAIL PROTECTED] (omit the "nospam_" part) Cauvin, Inc. Product Management / Market Research http://www.cauvin-inc.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
On Thu, 26 Jan 2006 16:26:57 GMT, rumours say that "Roger L. Cauvin" <[EMAIL PROTECTED]> might have written:

>"Christos Georgiou" <[EMAIL PROTECTED]> wrote in message
>news:[EMAIL PROTECTED]
>> On Thu, 26 Jan 2006 14:09:54 GMT, rumours say that "Roger L. Cauvin"
>> <[EMAIL PROTECTED]> might have written:
>>>Say I have some string that begins with an arbitrary sequence of characters
>>>and then alternates repeating the letters 'a' and 'b' any number of times,
>>>e.g.
>>>
>>>"xyz123aaabbaaabaaaabb"
>>>
>>>I'm looking for a regular expression that matches the first, and only the
>>>first, sequence of the letter 'a', and only if the length of the sequence
>>>is exactly 3.
>>>
>>>Does such a regular expression exist? If so, any ideas as to what it
>>>could be?
>>
>> Is this what you mean?
>>
>> ^[^a]*(a{3})(?:[^a].*)?$
>
>Close, but the pattern should allow "arbitrary sequence of characters" that
>precede the alternating a's and b's to contain the letter 'a'. In other
>words, the pattern should accept:
>
>"xayz123aaabbab"
>
>since the 'a' between the 'x' and 'y' is not directly followed by a 'b'.
>
>Your proposed pattern rejects this string.

1.

(a{3})(?:b[ab]*)?$

This finds the first (leftmost) "aaa" either at the end of the string or followed by 'b' and then arbitrary sequences of 'a' and 'b'.

This will also match "aaaa" (from second position on).

2.

If you insist on only three 'a's and you can add the constraint that:

* let s be the "arbitrary sequence of characters" at the start of your searched text
* len(s) >= 1 and not s.endswith('a')

then you'll have this reg.ex.

(?<=[^a])(a{3})(?:b[ab]*)?$

3.

If you want to allow for a possible empty "arbitrary sequence of characters" at the start and you don't mind search speed

^(?:.*?[^a])?(a{3})(?:b[ab]*)?$

This should cover you:

>>> s="xayzbaaa123aaabbab"
>>> r=re.compile(r"^(?:.*?[^a])?(a{3})(?:b[ab]*)?$")
>>> m= r.match(s)
>>> m.group(1)
'aaa'
>>> m.start(1)
11
>>> s[11:]
'aaabbab'

--
TZOTZIOY, I speak England very best.
"Dear Paul, please stop spamming us."
The Corinthians
-- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
Christoph Conrad <[EMAIL PROTECTED]> wrote: > Hello Roger, > > > since the length of the first sequence of the letter 'a' is 2. Yours > > accepts it, right? > > Yes, i misunderstood your requirements. So it must be modified > essentially to that what Tim Chase wrote: > > m = re.search('^[^a]*a{3}b', 'xyz123aabbaaab') ...but that rejects 'aazaaab' which should apparently be accepted. Alex -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
"Peter Hansen" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Roger L. Cauvin wrote: >> Sorry for the confusion. The correct pattern should reject all strings >> except those in which the first sequence of the letter 'a' that is >> followed by the letter 'b' has a length of exactly three. >> >> Hope that's clearer . . . . > > Examples are a *really* good way to clarify ambiguous or complex > requirements. In fact, when made executable they're called "test cases" > :-), and supplying a few of those (showing input values and expected > output values) would help, not only to clarify your goals for the humans, > but also to let the proposed solutions easily be tested. Good suggestion. Here are some "test cases": "xyz123aaabbab" accept "xyz123aabbaab" reject "xayz123aaabab" accept "xaaayz123abab" reject "xaaayz123aaabab" accept -- Roger L. Cauvin [EMAIL PROTECTED] (omit the "nospam_" part) Cauvin, Inc. Product Management / Market Research http://www.cauvin-inc.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
"Tim Chase" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] >>>r = re.compile("[^a]*a{3}b+(a+b*)*") >>>matches = [s for s in listOfStringsToTest if r.match(s)] >> >> Wow, I like it, but it allows some strings it shouldn't. For example: >> >> "xyz123aabbaaab" >> >> (It skips over the two-letter sequence of 'a' and matches 'bbaaab'.) > > Anchoring it to the beginning/end might solve that: > > r = re.compile("^[^a]*a{3}b+(a+b*)*$") > > this ensures that no "a"s come before the first 3x"a" and nothing but "b" > and "a" follows it. Anchoring may be the key here, but this pattern rejects "xayz123aaabab" which it should accept, since the 'a' between the 'x' and the 'y' is not directly followed by the letter 'b'. -- Roger L. Cauvin [EMAIL PROTECTED] (omit the "nospam_" part) Cauvin, Inc. Product Management / Market Research http://www.cauvin-inc.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
Roger L. Cauvin wrote: > Sorry for the confusion. The correct pattern should reject all strings > except those in which the first sequence of the letter 'a' that is followed > by the letter 'b' has a length of exactly three. > > Hope that's clearer . . . . Examples are a *really* good way to clarify ambiguous or complex requirements. In fact, when made executable they're called "test cases" :-), and supplying a few of those (showing input values and expected output values) would help, not only to clarify your goals for the humans, but also to let the proposed solutions easily be tested. (After all, are you going to just trust that whatever you are handed here is correctly implemented, and based on a perfect understanding of your apparently unclear requirements?) -Peter -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
>>r = re.compile("[^a]*a{3}b+(a+b*)*") >>matches = [s for s in listOfStringsToTest if r.match(s)] > > Wow, I like it, but it allows some strings it shouldn't. For example: > > "xyz123aabbaaab" > > (It skips over the two-letter sequence of 'a' and matches 'bbaaab'.) Anchoring it to the beginning/end might solve that: r = re.compile("^[^a]*a{3}b+(a+b*)*$") this ensures that no "a"s come before the first 3x"a" and nothing but "b" and "a" follows it. -tkc (who's translating from vim regexps which are just diff. enough to throw a wrench in works...) -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
"Christos Georgiou" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > On Thu, 26 Jan 2006 14:09:54 GMT, rumours say that "Roger L. Cauvin" > <[EMAIL PROTECTED]> might have written: > >>Say I have some string that begins with an arbitrary sequence of >>characters >>and then alternates repeating the letters 'a' and 'b' any number of times, >>e.g. >> >>"xyz123aaabbaaabaaaabb" >> >>I'm looking for a regular expression that matches the first, and only the >>first, sequence of the letter 'a', and only if the length of the sequence >>is >>exactly 3. >> >>Does such a regular expression exist? If so, any ideas as to what it >>could >>be? > > Is this what you mean? > > ^[^a]*(a{3})(?:[^a].*)?$ Close, but the pattern should allow "arbitrary sequence of characters" that precede the alternating a's and b's to contain the letter 'a'. In other words, the pattern should accept: "xayz123aaabbab" since the 'a' between the 'x' and 'y' is not directly followed by a 'b'. Your proposed pattern rejects this string. -- Roger L. Cauvin [EMAIL PROTECTED] (omit the "nospam_" part) Cauvin, Inc. Product Management / Market Research http://www.cauvin-inc.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
"Tim Chase" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] >> Sorry for the confusion. The correct pattern should reject >> all strings except those in which the first sequence of the >> letter 'a' that is followed by the letter 'b' has a length of >> exactly three. > > Ah...a little more clear. > > r = re.compile("[^a]*a{3}b+(a+b*)*") > matches = [s for s in listOfStringsToTest if r.match(s)] Wow, I like it, but it allows some strings it shouldn't. For example: "xyz123aabbaaab" (It skips over the two-letter sequence of 'a' and matches 'bbaaab'.) -- Roger L. Cauvin [EMAIL PROTECTED] (omit the "nospam_" part) Cauvin, Inc. Product Management / Market Research http://www.cauvin-inc.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
Hallo Alex, >> r = re.compile("[^a]*a{3}b+(a+b*)*") matches = [s for s in >> listOfStringsToTest if r.match(s)] > Unfortunately, the OP's spec is even more complex than this, if we are > to take to the letter what you just quoted; e.g. aazaaab SHOULD match, Then it's again "a{3}b", isn't it? Freundliche Grüße, Christoph -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
Tim Chase <[EMAIL PROTECTED]> wrote: > > Sorry for the confusion. The correct pattern should reject > > all strings except those in which the first sequence of the > > letter 'a' that is followed by the letter 'b' has a length of > > exactly three. > > Ah...a little more clear. > > r = re.compile("[^a]*a{3}b+(a+b*)*") > matches = [s for s in listOfStringsToTest if r.match(s)] Unfortunately, the OP's spec is even more complex than this, if we are to take to the letter what you just quoted; e.g. aazaaab SHOULD match, because the sequence 'aaz' (being 'a' NOT followed by the letter 'b') should not invalidate the match that follows. I don't think he means the strings contain only a's and b's. Locating 'the first sequence of a followed by b' is easy, and reasonably easy to check the sequence is exactly of length 3 (e.g. with a negative lookbehind) -- but I don't know how to tell a RE to *stop* searching for more if the check fails. If a little more than just REs and matching was allowed, it would be reasonably easy, but I don't know how to fashion a RE r such that r.match(s) will succeed if and only if s meets those very precise and complicated specs. That doesn't mean it just can't be done, just that I can't do it so far. Perhaps the OP can tell us what constrains him to use r.match ONLY, rather than a little bit of logic around it, so we can see if we're trying to work in an artificially overconstrained domain? Alex -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
> Sorry for the confusion. The correct pattern should reject
> all strings except those in which the first sequence of the
> letter 'a' that is followed by the letter 'b' has a length of
> exactly three.

Ah...a little more clear.

r = re.compile("[^a]*a{3}b+(a+b*)*")
matches = [s for s in listOfStringsToTest if r.match(s)]

or (as you've only got 3 of 'em)

r = re.compile("[^a]*aaab+(a+b*)*")
matches = [s for s in listOfStringsToTest if r.match(s)]

should do the trick. To exposit:

[^a]*        a bunch of stuff that's not "a"
a{3} or aaa  three letter "a"s
b+           one or more "b"s
(a+b*)       any number of "a"s followed optionally by "b"s

Hope this helps,

-tkc
-- http://mail.python.org/mailman/listinfo/python-list
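Editorial sketch: the same pattern written with re.VERBOSE so that each piece carries its explanation, followed by the two inputs that the follow-up messages use to probe it. The name TIM is illustrative, and the code assumes Python 3.

```python
import re

TIM = re.compile(r"""
    [^a]*     # any run of characters other than 'a'
    a{3}      # exactly three 'a's at this point
    b+        # one or more 'b's
    (a+b*)*   # further runs of 'a's, each optionally followed by 'b's
    """, re.VERBOSE)

# match() only tries at position 0, so [^a]* cannot step over an earlier 'a'.
print(TIM.match("xyz123aaabbab") is not None)
print(TIM.match("xayz123aaabab") is not None)

# search() may start later in the string, which is how 'bbaaab' gets matched.
print(TIM.search("xyz123aabbaaab").group())
```

The difference between match() and search() seems to account for the two objections raised in the replies: with search() the pattern can start after the two-letter run of a's, and with match() it rejects any input whose arbitrary prefix contains an 'a'.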
Re: Match First Sequence in Regular Expression?
"Sybren Stuvel" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Roger L. Cauvin enlightened us with: >> I'm looking for a regular expression that matches the first, and >> only the first, sequence of the letter 'a', and only if the length >> of the sequence is exactly 3. > > Your request is ambiguous: > > 1) You're looking for the first, and only the first, sequence of the > letter 'a'. If the length of this first, and only the first, > sequence of the letter 'a' is not 3, no match is made at all. > > 2) You're looking for the first, and only the first, sequence of > length 3 of the letter 'a'. > > What is it? The first option describes what I want, with the additional restriction that the "first sequence of the letter 'a'" is defined as 1 or more consecutive occurrences of the letter 'a', followed directly by the letter 'b'. -- Roger L. Cauvin [EMAIL PROTECTED] (omit the "nospam_" part) Cauvin, Inc. Product Management / Market Research http://www.cauvin-inc.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
Hello Roger,

> since the length of the first sequence of the letter 'a' is 2. Yours
> accepts it, right?

Yes, I misunderstood your requirements. So it must be modified
essentially to what Tim Chase wrote:

    m = re.search('^[^a]*a{3}b', 'xyz123aabbaaab')

Best wishes from Germany,
Christoph
Re: Match First Sequence in Regular Expression?
On Thu, 26 Jan 2006 14:09:54 GMT, rumours say that "Roger L. Cauvin"
<[EMAIL PROTECTED]> might have written:

>Say I have some string that begins with an arbitrary sequence of characters
>and then alternates repeating the letters 'a' and 'b' any number of times,
>e.g.
>
>"xyz123aaabbaaabaaaabb"
>
>I'm looking for a regular expression that matches the first, and only the
>first, sequence of the letter 'a', and only if the length of the sequence is
>exactly 3.
>
>Does such a regular expression exist? If so, any ideas as to what it could
>be?

Is this what you mean?

    ^[^a]*(a{3})(?:[^a].*)?$

This fits your description.

--
TZOTZIOY, I speak England very best.
"Dear Paul,
please stop spamming us."
The Corinthians
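A small sketch of what the pattern above accepts, assuming it is used with match on single-line strings. The helper name first_run_span is hypothetical. The pattern reads the request as "the first run of 'a' anywhere in the string must be exactly three long", which is stricter than the later clarification about runs followed by 'b'.

```python
import re

pattern = re.compile(r"^[^a]*(a{3})(?:[^a].*)?$")

def first_run_span(s):
    """Return the (start, end) span of the captured 'aaa', or None."""
    m = pattern.match(s)
    return m.span(1) if m else None
```

For the OP's example string "xyz123aaabbaaabaaaabb" this returns the span of the first 'aaa'; a string whose first run of 'a' has length 2 or 4 yields None.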
Re: Match First Sequence in Regular Expression?
"Alex Martelli" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Tim Chase <[EMAIL PROTECTED]> wrote: > ... >> I'm not quite sure what your intent here is, as the >> resulting find would obviously be "aaa", of length 3. > > But that would also match ''; I think he wants negative loobehind > and lookahead assertions around the 'aaa' part. But then there's the > spec about matching only if the sequence is the first occurrence of > 'a's, so maybe he wants '$[^a]*' instead of the lookbehind (and maybe > parentheses around the 'aaa' to somehow 'match' is specially?). > > It's definitely not very clear what exactly the intent is, no... Sorry for the confusion. The correct pattern should reject all strings except those in which the first sequence of the letter 'a' that is followed by the letter 'b' has a length of exactly three. Hope that's clearer . . . . -- Roger L. Cauvin [EMAIL PROTECTED] (omit the "nospam_" part) Cauvin, Inc. Product Management / Market Research http://www.cauvin-inc.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
"Christoph Conrad" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Hello Roger, > >> I'm looking for a regular expression that matches the first, and only >> the first, sequence of the letter 'a', and only if the length of the >> sequence is exactly 3. > > import sys, re, os > > if __name__=='__main__': > >m = re.search('a{3}', 'xyz123aaabbaaaabaaabb') >print m.group(0) >print "Preceded by: \"" + m.string[0:m.start(0)] + "\"" The correct pattern should reject the string: 'xyz123aabbaaab' since the length of the first sequence of the letter 'a' is 2. Yours accepts it, right? -- Roger L. Cauvin [EMAIL PROTECTED] (omit the "nospam_" part) Cauvin, Inc. Product Management / Market Research http://www.cauvin-inc.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Match First Sequence in Regular Expression?
Tim Chase <[EMAIL PROTECTED]> wrote:
...
> I'm not quite sure what your intent here is, as the
> resulting find would obviously be "aaa", of length 3.

But that would also match ''; I think he wants negative lookbehind
and lookahead assertions around the 'aaa' part. But then there's the
spec about matching only if the sequence is the first occurrence of
'a's, so maybe he wants '^[^a]*' instead of the lookbehind (and maybe
parentheses around the 'aaa' to somehow 'match' it specially?).

It's definitely not very clear what exactly the intent is, no...

Alex
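A hedged sketch of the difference between a bare a{3}, the same pattern wrapped in negative lookbehind and lookahead assertions, and a version anchored at the start of the string with a [^a]* prefix. The variable names plain, exact and anchored are made up for this illustration.

```python
import re

# Bare pattern: finds 'aaa' anywhere, even inside a longer run such as 'aaaa'.
plain = re.compile(r"a{3}")

# Lookarounds: the 'aaa' must not be preceded or followed by another 'a',
# so only runs of exactly three match. It still skips earlier, shorter runs.
exact = re.compile(r"(?<!a)a{3}(?!a)")

# Anchored prefix: no 'a' may occur before the run, so only the first run
# of 'a' in the string is considered, and it must be exactly three long.
anchored = re.compile(r"^[^a]*a{3}(?!a)")
```

With these definitions, plain.search("xyz123aaaab") succeeds while exact.search on the same string fails; exact.search("xyz123aabbaaab") still finds the later 'aaa', whereas anchored.search rejects that string because the first run has length 2.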
Re: Match First Sequence in Regular Expression?
> Say I have some string that begins with an arbitrary
> sequence of characters and then alternates repeating the
> letters 'a' and 'b' any number of times, e.g.
> "xyz123aaabbaaabaaaabb"
>
> I'm looking for a regular expression that matches the
> first, and only the first, sequence of the letter 'a', and
> only if the length of the sequence is exactly 3.
>
> Does such a regular expression exist? If so, any ideas as
> to what it could be?

I'm not quite sure what your intent here is, as the
resulting find would obviously be "aaa", of length 3.

If you mean that you want to test against a number of
things, and only find items where "aaa" is the first "a" on
the line, you might try something like

    import re
    listOfStringsToTest = [
        'helloworld',
        'xyz123aaabbaabababbab',
        'cantalopeaaabababa',
        'baabbbaaab',
        'xyzaa123aaabbabbabababaa']
    r = re.compile("[^a]*(a{3})b+(a+b+)*")
    matches = [s for s in listOfStringsToTest if r.match(s)]
    print(repr(matches))

If you just want the *first* triad of "aaa", you can change
the regexp to

    r = re.compile(".*?(a{3})b+(a+b+)*")

With a little more detail as to the gist of the problem,
perhaps a better solution can be found. In particular, are
there items in the listOfStringsToTest that should be found
but aren't with either of the regexps?

-tkc
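To make the difference between the two regexps above concrete, here is a small comparison over the same test list. The names first_a_run, first_triad, strict and loose are hypothetical labels added for this sketch.

```python
import re

listOfStringsToTest = [
    'helloworld',
    'xyz123aaabbaabababbab',
    'cantalopeaaabababa',
    'baabbbaaab',
    'xyzaa123aaabbabbabababaa']

# "aaa" must be the first "a" in the string: [^a]* cannot skip over any 'a'.
first_a_run = re.compile("[^a]*(a{3})b+(a+b+)*")

# First "aaab" anywhere: the lazy .*? may skip over earlier 'a's.
first_triad = re.compile(".*?(a{3})b+(a+b+)*")

strict = [s for s in listOfStringsToTest if first_a_run.match(s)]
loose = [s for s in listOfStringsToTest if first_triad.match(s)]
```

Here strict keeps only 'xyz123aaabbaabababbab', while loose keeps every string except 'helloworld'. Note that the second regexp also accepts a string like 'aaaab', because .*? can consume the extra leading 'a'.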
Re: Match First Sequence in Regular Expression?
Roger L. Cauvin enlightened us with:
> I'm looking for a regular expression that matches the first, and
> only the first, sequence of the letter 'a', and only if the length
> of the sequence is exactly 3.

Your request is ambiguous:

1) You're looking for the first, and only the first, sequence of the
   letter 'a'. If the length of this first, and only the first,
   sequence of the letter 'a' is not 3, no match is made at all.

2) You're looking for the first, and only the first, sequence of
   length 3 of the letter 'a'.

What is it?

Sybren
--
The problem with the world is stupidity. Not saying there should be a
capital punishment for stupidity, but why don't we just take the
safety labels off of everything and let the problem solve itself?
                                                         Frank Zappa
Re: Match First Sequence in Regular Expression?
Hello Roger,

> I'm looking for a regular expression that matches the first, and only
> the first, sequence of the letter 'a', and only if the length of the
> sequence is exactly 3.

    import sys, re, os

    if __name__=='__main__':

        m = re.search('a{3}', 'xyz123aaabbaaaabaaabb')
        print(m.group(0))
        print("Preceded by: \"" + m.string[0:m.start(0)] + "\"")

Best wishes,
Christoph
Match First Sequence in Regular Expression?
Say I have some string that begins with an arbitrary sequence of characters
and then alternates repeating the letters 'a' and 'b' any number of times,
e.g.

"xyz123aaabbaaabaaaabb"

I'm looking for a regular expression that matches the first, and only the
first, sequence of the letter 'a', and only if the length of the sequence is
exactly 3.

Does such a regular expression exist? If so, any ideas as to what it could
be?

--
Roger L. Cauvin
[EMAIL PROTECTED] (omit the "nospam_" part)
Cauvin, Inc.
Product Management / Market Research
http://www.cauvin-inc.com