Re: Regular Expression bug?
On Thu, Mar 2, 2023 at 9:56 PM Alan Bawden wrote:
>
> jose isaias cabrera writes:
>
> >On Thu, Mar 2, 2023 at 2:38 PM Mats Wichmann wrote:
> >
> >This re is a bit different than the one I am used to. So, I am trying to match
> >everything after 'pn=':
> >
> >import re
> >s = "pm=jose pn=2017"
> >m0 = r"pn=(.+)"
> >r0 = re.compile(m0)
> >s0 = r0.match(s)
> >>>> print(s0)
> >None
>
> Assuming that you were expecting to match "pn=2017", then you probably
> don't want the 'match' method. Read its documentation. Then read the
> documentation for the _other_ methods that a Pattern supports. Then you
> will be enlightened.

Yes. I need search. Thanks.

--
What if eternity is real?
Where will you spend it? H...
--
https://mail.python.org/mailman/listinfo/python-list
Re: Regular Expression bug?
On Thu, Mar 2, 2023 at 8:35 PM wrote:
>
> It is a well-known fact, Jose, that GIGO.
>
> The letters "n" and "m" are not interchangeable. Your pattern fails because
> you have "pn" in one place and "pm" in the other.

It is not GIGO. pm=project manager. pn=project name. I needed search()
rather than match().

>
> >>> s = "pn=jose pn=2017"
> ...
> >>> s0 = r0.match(s)
> >>> s0
Re: Regular Expression bug?
On Thu, Mar 2, 2023 at 8:30 PM Cameron Simpson wrote:
>
> On 02Mar2023 20:06, jose isaias cabrera wrote:
> >This re is a bit different than the one I am used to. So, I am trying to
> >match everything after 'pn=':
> >
> >import re
> >s = "pm=jose pn=2017"
> >m0 = r"pn=(.+)"
> >r0 = re.compile(m0)
> >s0 = r0.match(s)
>
> `match()` matches at the start of the string. You want r0.search(s).
> - Cameron Simpson

Thanks. Darn it! I knew it was something simple.
Re: Regular Expression bug?
jose isaias cabrera writes:

>On Thu, Mar 2, 2023 at 2:38 PM Mats Wichmann wrote:
>
>This re is a bit different than the one I am used to. So, I am trying to match
>everything after 'pn=':
>
>import re
>s = "pm=jose pn=2017"
>m0 = r"pn=(.+)"
>r0 = re.compile(m0)
>s0 = r0.match(s)
>>>> print(s0)
>None

Assuming that you were expecting to match "pn=2017", then you probably
don't want the 'match' method. Read its documentation. Then read the
documentation for the _other_ methods that a Pattern supports. Then you
will be enlightened.

- Alan
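Following Alan's suggestion, here is a minimal sketch that compares the three Pattern methods differing only in where they anchor: match(), search() and fullmatch() (fullmatch() needs Python 3.4 or later). It uses Jose's input and pattern unchanged.

```python
import re

# Jose's input and pattern from the thread.
s = "pm=jose pn=2017"
r0 = re.compile(r"pn=(.+)")

# match() only tries position 0, where the text starts with "pm=", so it fails.
at_start = r0.match(s)

# search() scans forward and succeeds at the first "pn=" it finds.
anywhere = r0.search(s)

# fullmatch() needs the pattern to cover the entire string, so it also fails.
whole = r0.fullmatch(s)

print(at_start)           # None
print(anywhere.group(1))  # 2017
print(whole)              # None
```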
Re: Regular Expression bug?
On 02Mar2023 20:06, jose isaias cabrera wrote:
>This re is a bit different than the one I am used to. So, I am trying to
>match everything after 'pn=':
>
>import re
>s = "pm=jose pn=2017"
>m0 = r"pn=(.+)"
>r0 = re.compile(m0)
>s0 = r0.match(s)

`match()` matches at the start of the string. You want r0.search(s).
- Cameron Simpson
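Below is a minimal sketch of Jose's snippet with Cameron's fix applied. Two parts are illustrative additions that nobody in the thread suggested: the None check, and the lazy-prefix alternative that keeps match().

```python
import re

s = "pm=jose pn=2017"
r0 = re.compile(r"pn=(.+)")

# Cameron's fix: search() finds the pattern anywhere in the string.
s0 = r0.search(s)

# A failed search returns None, so check before calling .group().
value = s0.group(1) if s0 is not None else None
print(value)  # 2017

# Alternative (not from the thread): keep match() but let a lazy prefix
# skip ahead to "pn=". group(1) is the same; group(0) also includes the prefix.
s1 = re.match(r".*?pn=(.+)", s)
print(s1.group(1))  # 2017
```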
RE: Regular Expression bug?
It is a well-known fact, Jose, that GIGO.

The letters "n" and "m" are not interchangeable. Your pattern fails because
you have "pn" in one place and "pm" in the other.

>>> s = "pn=jose pn=2017"
...
>>> s0 = r0.match(s)
>>> s0


-Original Message-
From: Python-list On Behalf Of jose isaias cabrera
Sent: Thursday, March 2, 2023 8:07 PM
To: Mats Wichmann
Cc: python-list@python.org
Subject: Re: Regular Expression bug?

On Thu, Mar 2, 2023 at 2:38 PM Mats Wichmann wrote:

This re is a bit different than the one I am used to. So, I am trying to match
everything after 'pn=':

import re
s = "pm=jose pn=2017"
m0 = r"pn=(.+)"
r0 = re.compile(m0)
s0 = r0.match(s)
>>> print(s0)
None

Any help is appreciated.
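Changing the input only hides the real issue, as Jose points out in his reply: "pm" means project manager and is not a typo. The short sketch below contrasts avi's edited string with Jose's real data.

```python
import re

r0 = re.compile(r"pn=(.+)")

# avi's edited string starts with "pn=", so match() now succeeds, but the
# greedy .+ runs all the way to the end of the string.
edited = r0.match("pn=jose pn=2017")
print(edited.group(1))  # jose pn=2017

# Jose's real data starts with "pm=" (project manager), so search() is the fix.
real = r0.search("pm=jose pn=2017")
print(real.group(1))  # 2017
```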
Re: Regular Expression bug?
On Thu, Mar 2, 2023 at 2:38 PM Mats Wichmann wrote:
>
> On 3/2/23 12:28, Chris Angelico wrote:
> > On Fri, 3 Mar 2023 at 06:24, jose isaias cabrera wrote:
> >>
> >> Greetings.
> >>
> >> For the RegExp Gurus, consider the following python3 code:
> >>
> >> import re
> >> s = "pn=align upgrade sd=2023-02-"
> >> ro = re.compile(r"pn=(.+) ")
> >> r0=ro.match(s)
> >> >>> print(r0.group(1))
> >> align upgrade
> >>
> >>
> >> This is wrong. It should be 'align' because the group only goes up to
> >> the space. Thoughts? Thanks.
> >>
> >
> > Not a bug. Find the longest possible match that fits this; as long as
> > you can find a space immediately after it, everything in between goes
> > into the .+ part.
> >
> > If you want to exclude spaces, either use [^ ]+ or .+?.
>
> https://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy

This re is a bit different than the one I am used to. So, I am trying to match
everything after 'pn=':

import re
s = "pm=jose pn=2017"
m0 = r"pn=(.+)"
r0 = re.compile(m0)
s0 = r0.match(s)
>>> print(s0)
None

Any help is appreciated.
RE: Regular Expression bug?
José,

Matching can be greedy. Did it match to the last space?

What you want is a pattern that matches anything except a space (or
whitespace) followed by matching a space or something similar.

Or use a construct that makes matching non-greedy.

Avi

-Original Message-
From: Python-list On Behalf Of jose isaias cabrera
Sent: Thursday, March 2, 2023 2:23 PM
To: python-list@python.org
Subject: Regular Expression bug?

Greetings.

For the RegExp Gurus, consider the following python3 code:

import re
s = "pn=align upgrade sd=2023-02-"
ro = re.compile(r"pn=(.+) ")
r0=ro.match(s)
>>> print(r0.group(1))
align upgrade


This is wrong. It should be 'align' because the group only goes up to the
space. Thoughts? Thanks.

josé
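avi's "(or whitespace)" matters when the data can contain tabs. The sketch below uses a hypothetical tab-separated variant of Jose's string, which does not appear in the thread, to compare [^ ]+ (which excludes only the space character) with \S+ (which excludes any whitespace).

```python
import re

# Hypothetical variant of Jose's data with a tab instead of a space.
s = "pn=align\tupgrade sd=2023-02-"

# [^ ]+ excludes only the space character, so the tab is swallowed.
not_space = re.match(r"pn=([^ ]+)", s).group(1)

# \S+ excludes all whitespace (spaces, tabs, newlines, ...).
not_ws = re.match(r"pn=(\S+)", s).group(1)

print(repr(not_space))  # 'align\tupgrade'
print(repr(not_ws))     # 'align'
```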
Re: Regular Expression bug?
On Thu, Mar 2, 2023 at 2:32 PM <2qdxy4rzwzuui...@potatochowder.com> wrote:
>
> On 2023-03-02 at 14:22:41 -0500,
> jose isaias cabrera wrote:
>
> > For the RegExp Gurus, consider the following python3 code:
> >
> > import re
> > s = "pn=align upgrade sd=2023-02-"
> > ro = re.compile(r"pn=(.+) ")
> > r0=ro.match(s)
> > >>> print(r0.group(1))
> > align upgrade
> >
> >
> > This is wrong. It should be 'align' because the group only goes up to
> > the space. Thoughts? Thanks.
>
> The bug is in your regular expression; the plus modifier is greedy.
>
> If you want to match up to the first space, then you'll need something
> like [^ ] (i.e., everything that isn't a space) instead of that dot.

Thanks. I appreciate your wisdom.

josé
Re: Regular Expression bug?
On 3/2/23 12:28, Chris Angelico wrote:
> On Fri, 3 Mar 2023 at 06:24, jose isaias cabrera wrote:
>>
>> Greetings.
>>
>> For the RegExp Gurus, consider the following python3 code:
>>
>> import re
>> s = "pn=align upgrade sd=2023-02-"
>> ro = re.compile(r"pn=(.+) ")
>> r0=ro.match(s)
>> >>> print(r0.group(1))
>> align upgrade
>>
>>
>> This is wrong. It should be 'align' because the group only goes up to
>> the space. Thoughts? Thanks.
>>
>
> Not a bug. Find the longest possible match that fits this; as long as
> you can find a space immediately after it, everything in between goes
> into the .+ part.
>
> If you want to exclude spaces, either use [^ ]+ or .+?.

https://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy
Re: Regular Expression bug?
On 2023-03-02 at 14:22:41 -0500,
jose isaias cabrera wrote:

> For the RegExp Gurus, consider the following python3 code:
>
> import re
> s = "pn=align upgrade sd=2023-02-"
> ro = re.compile(r"pn=(.+) ")
> r0=ro.match(s)
> >>> print(r0.group(1))
> align upgrade
>
>
> This is wrong. It should be 'align' because the group only goes up to
> the space. Thoughts? Thanks.

The bug is in your regular expression; the plus modifier is greedy.

If you want to match up to the first space, then you'll need something
like [^ ] (i.e., everything that isn't a space) instead of that dot.
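The negated class and the lazy .+? with a trailing space behave differently in one practical case: only the negated class copes with a value at the very end of the string. The sketch below uses a hypothetical input, not taken from the thread, in which pn= is the last field.

```python
import re

# Hypothetical input where pn= is the last field, so no space follows its value.
s = "sd=2023-02- pn=align"

# The lazy version still needs a literal space after the group, so it finds nothing.
lazy = re.search(r"pn=(.+?) ", s)

# The negated class stops on its own at a space or at the end of the string.
negated = re.search(r"pn=([^ ]+)", s)

print(lazy)              # None
print(negated.group(1))  # align
```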
Re: Regular Expression bug?
On Fri, 3 Mar 2023 at 06:24, jose isaias cabrera wrote:
>
> Greetings.
>
> For the RegExp Gurus, consider the following python3 code:
>
> import re
> s = "pn=align upgrade sd=2023-02-"
> ro = re.compile(r"pn=(.+) ")
> r0=ro.match(s)
> >>> print(r0.group(1))
> align upgrade
>
>
> This is wrong. It should be 'align' because the group only goes up to
> the space. Thoughts? Thanks.
>

Not a bug. Find the longest possible match that fits this; as long as
you can find a space immediately after it, everything in between goes
into the .+ part.

If you want to exclude spaces, either use [^ ]+ or .+?.

ChrisA
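You can check Chris's explanation by running the greedy pattern and both of his suggested fixes side by side on Jose's original string. Here is a minimal sketch that does so:

```python
import re

s = "pn=align upgrade sd=2023-02-"

# Greedy: .+ grabs as much as possible, then backtracks only as far as the
# LAST position that is followed by a space.
greedy = re.match(r"pn=(.+) ", s).group(1)

# Chris's two suggestions:
negated = re.match(r"pn=([^ ]+) ", s).group(1)  # cannot cross a space
lazy = re.match(r"pn=(.+?) ", s).group(1)       # stops at the FIRST space

print(greedy)   # align upgrade
print(negated)  # align
print(lazy)     # align
```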
Regular Expression bug?
Greetings.

For the RegExp Gurus, consider the following python3 code:

import re
s = "pn=align upgrade sd=2023-02-"
ro = re.compile(r"pn=(.+) ")
r0=ro.match(s)
>>> print(r0.group(1))
align upgrade


This is wrong. It should be 'align' because the group only goes up to the
space. Thoughts? Thanks.

josé
Re: Regular expression bug?
More elegant way

>>> a = 'fooBarBaz'
>>> [x for x in re.split('([A-Z]+[a-z]+)', a) if x ]
['foo', 'Bar', 'Baz']

R.

On Feb 20, 2:03 pm, Lie Ryan wrote:
> This re.split() doesn't consume characters:
>
> >>> re.split('([A-Z][a-z]*)', 'fooBarBaz')
> ['foo', 'Bar', '', 'Baz', '']
>
> it does what the OP wants, albeit with extra blank strings.

--
http://mail.python.org/mailman/listinfo/python-list
Re: Regular expression bug?
On Thu, 19 Feb 2009 13:03:59 -0800, Ron Garret wrote:

> In article , Peter Otten <__pete...@web.de> wrote:
>
>> Ron Garret wrote:
>>
>> > I'm trying to split a CamelCase string into its constituent
>> > components.
>>
>> How about
>>
>> >>> re.compile("[A-Za-z][a-z]*").findall("fooBarBaz")
>> ['foo', 'Bar', 'Baz']
>
> That's very clever. Thanks!
>
>> > (BTW, I tried looking at the source code for the re module, but I
>> > could not find the relevant code. re.split calls
>> > sre_compile.compile().split, but the string 'split' does not appear
>> > in sre_compile.py. So where does this method come from?)
>>
>> It's coded in C. The source is Modules/sremodule.c.
>
> Ah. Thanks!
>
> rg

This re.split() doesn't consume characters:

>>> re.split('([A-Z][a-z]*)', 'fooBarBaz')
['foo', 'Bar', '', 'Baz', '']

it does what the OP wants, albeit with extra blank strings.
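The sketch below reproduces the extra blank strings Lie mentions, along with the filtering from R.'s reply, on the same 'fooBarBaz' input:

```python
import re

# A capturing group makes re.split() keep the matched text in the result.
parts = re.split(r"([A-Z][a-z]*)", "fooBarBaz")
print(parts)  # ['foo', 'Bar', '', 'Baz', '']

# The empty strings are the zero-length gaps between adjacent matches and
# after the last match; dropping them leaves just the words.
words = [p for p in parts if p]
print(words)  # ['foo', 'Bar', 'Baz']
```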
Re: Regular expression bug?
andrew cooke wrote:
>
> i wonder what fraction of people posting with "bug?" in their titles here
> actually find bugs?

About 99.99%.

Unfortunately, 99.98% have found bugs in their code, not in Python.

--
Steven
Re: Regular expression bug?
In article , Albert Hopkins wrote:

> On Thu, 2009-02-19 at 10:55 -0800, Ron Garret wrote:
> > I'm trying to split a CamelCase string into its constituent components.
> > This kind of works:
> >
> > >>> re.split('[a-z][A-Z]', 'fooBarBaz')
> > ['fo', 'a', 'az']
> >
> > but it consumes the boundary characters. To fix this I tried using
> > lookahead and lookbehind patterns instead, but it doesn't work:
>
> That's how re.split works, same as str.split...

I think one could make the argument that 'foo'.split('') ought to return
['f','o','o']

>
> > >>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
> > ['fooBarBaz']
> >
> > However, it does seem to work with findall:
> >
> > >>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
> > ['', '']
>
>
> Wow!
>
> To tell you the truth, I can't even read that...

It's a regexp. Of course you can't read it. ;-)

rg
Re: Regular expression bug?
In article , "andrew cooke" wrote:

> i wonder what fraction of people posting with "bug?" in their titles here
> actually find bugs?

IMHO it ought to be an invariant that len(r.split(s)) should always be one
more than len(r.findall(s)).

> anyway, how about:
>
> re.findall('[A-Z]?[a-z]*', 'fooBarBaz')
>
> or
>
> re.findall('([A-Z][a-z]*|[a-z]+)', 'fooBarBaz')

That will do it. Thanks!

rg
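A small sketch lets you check Ron's proposed invariant. It assumes Python 3.7 or later, where re.split() splits on empty matches. In this example the relation holds for the group-free boundary pattern. A capturing group, however, adds the captured text to split()'s output and breaks the count.

```python
import re

s = "fooBarBaz"

# Without capturing groups: one more piece than there are matches
# (Python 3.7+ splits on these zero-width matches).
plain = re.compile(r"(?<=[a-z])(?=[A-Z])")
print(len(plain.findall(s)), len(plain.split(s)))  # 2 3

# With a capturing group, split() also returns each captured group,
# so the "one more" relationship no longer holds.
grouped = re.compile(r"((?<=[a-z])(?=[A-Z]))")
print(len(grouped.findall(s)), len(grouped.split(s)))  # 2 5
```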
Re: Regular expression bug?
In article , Peter Otten <__pete...@web.de> wrote:

> Ron Garret wrote:
>
> > I'm trying to split a CamelCase string into its constituent components.
>
> How about
>
> >>> re.compile("[A-Za-z][a-z]*").findall("fooBarBaz")
> ['foo', 'Bar', 'Baz']

That's very clever. Thanks!

> > (BTW, I tried looking at the source code for the re module, but I could
> > not find the relevant code. re.split calls sre_compile.compile().split,
> > but the string 'split' does not appear in sre_compile.py. So where does
> > this method come from?)
>
> It's coded in C. The source is Modules/sremodule.c.

Ah. Thanks!

rg
Re: Regular expression bug?
In article , MRAB wrote:

> Ron Garret wrote:
> > I'm trying to split a CamelCase string into its constituent components.
> > This kind of works:
> >
> > >>> re.split('[a-z][A-Z]', 'fooBarBaz')
> > ['fo', 'a', 'az']
> >
> > but it consumes the boundary characters. To fix this I tried using
> > lookahead and lookbehind patterns instead, but it doesn't work:
> >
> > >>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
> > ['fooBarBaz']
> >
> > However, it does seem to work with findall:
> >
> > >>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
> > ['', '']
> >
> > So the regular expression seems to be doing the Right Thing. Is this a
> > bug in re.split, or am I missing something?
> >
> > (BTW, I tried looking at the source code for the re module, but I could
> > not find the relevant code. re.split calls sre_compile.compile().split,
> > but the string 'split' does not appear in sre_compile.py. So where does
> > this method come from?)
> >
> > I'm using Python2.5.
> >
> I, amongst others, think it's a bug (or 'misfeature'); Guido thinks it
> might be intentional, but changing it could break some existing code.

That seems unlikely. It would only break where people had code invoking
re.split on empty matches, which at the moment is essentially a no-op.
It's hard to imagine there's a lot of code like that around. What would
be the point?

> You could do this instead:
>
> >>> re.sub('(?<=[a-z])(?=[A-Z])', '@', 'fooBarBaz').split('@')
> ['foo', 'Bar', 'Baz']

Blech! ;-) But thanks for the suggestion.

rg
Re: Regular expression bug?
Ron Garret wrote:
> I'm trying to split a CamelCase string into its constituent components.
> This kind of works:
>
> >>> re.split('[a-z][A-Z]', 'fooBarBaz')
> ['fo', 'a', 'az']
>
> but it consumes the boundary characters. To fix this I tried using
> lookahead and lookbehind patterns instead, but it doesn't work:
>
> >>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
> ['fooBarBaz']
>
> However, it does seem to work with findall:
>
> >>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
> ['', '']
>
> So the regular expression seems to be doing the Right Thing. Is this a
> bug in re.split, or am I missing something?
>
> (BTW, I tried looking at the source code for the re module, but I could
> not find the relevant code. re.split calls sre_compile.compile().split,
> but the string 'split' does not appear in sre_compile.py. So where does
> this method come from?)
>
> I'm using Python2.5.
>
I, amongst others, think it's a bug (or 'misfeature'); Guido thinks it
might be intentional, but changing it could break some existing code.

You could do this instead:

>>> re.sub('(?<=[a-z])(?=[A-Z])', '@', 'fooBarBaz').split('@')
['foo', 'Bar', 'Baz']
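A note for readers on a current Python: this behaviour changed in Python 3.7, when re.split() began splitting on patterns that can match the empty string. Under the Python 2.5 used in this thread, Ron's pattern returned ['fooBarBaz'] unsplit. Here is a minimal sketch, assuming Python 3.7 or later:

```python
import re

s = "fooBarBaz"

# Python 3.7 and later: split() accepts a zero-width boundary pattern.
plain = re.split(r"(?<=[a-z])(?=[A-Z])", s)
print(plain)  # ['foo', 'Bar', 'Baz']

# Ron's version wraps the boundary in a capturing group, so the (empty)
# captured text is returned between the pieces.
grouped = re.split(r"((?<=[a-z])(?=[A-Z]))", s)
print(grouped)  # ['foo', '', 'Bar', '', 'Baz']
```

The group-free form is usually the one you want. With a capturing group, split() returns the captured text as documented, even when that text is empty.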
Re: Regular expression bug?
Ron Garret wrote:

> I'm trying to split a CamelCase string into its constituent components.

How about

>>> re.compile("[A-Za-z][a-z]*").findall("fooBarBaz")
['foo', 'Bar', 'Baz']

> This kind of works:
>
> >>> re.split('[a-z][A-Z]', 'fooBarBaz')
> ['fo', 'a', 'az']
>
> but it consumes the boundary characters. To fix this I tried using
> lookahead and lookbehind patterns instead, but it doesn't work:
>
> >>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
> ['fooBarBaz']
>
> However, it does seem to work with findall:
>
> >>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
> ['', '']
>
> So the regular expression seems to be doing the Right Thing. Is this a
> bug in re.split, or am I missing something?

IIRC the split pattern must consume at least one character, but I can't
find the reference.

> (BTW, I tried looking at the source code for the re module, but I could
> not find the relevant code. re.split calls sre_compile.compile().split,
> but the string 'split' does not appear in sre_compile.py. So where does
> this method come from?)

It's coded in C. The source is Modules/sremodule.c.

Peter
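Peter's findall() approach starts a new word at every capital letter. The short sketch below shows that behaviour on a hypothetical input, 'parseHTTPResponse', which does not appear in the thread. A run of capitals such as an acronym gets broken into single letters, which may or may not be what the caller wants.

```python
import re

camel = re.compile(r"[A-Za-z][a-z]*")

simple = camel.findall("fooBarBaz")
print(simple)   # ['foo', 'Bar', 'Baz']

# Hypothetical input with an acronym: each capital starts a new word,
# so the acronym is broken into single letters.
acronym = camel.findall("parseHTTPResponse")
print(acronym)  # ['parse', 'H', 'T', 'T', 'P', 'Response']
```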
Re: Regular expression bug?
i wonder what fraction of people posting with "bug?" in their titles here
actually find bugs?

anyway, how about:

re.findall('[A-Z]?[a-z]*', 'fooBarBaz')

or

re.findall('([A-Z][a-z]*|[a-z]+)', 'fooBarBaz')

(you have to specify what you're matching and lookahead/back doesn't do
that).

andrew

Ron Garret wrote:
> I'm trying to split a CamelCase string into its constituent components.
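You can compare andrew's two patterns directly. The sketch below assumes Python 3.7 or later. Every element of the first pattern is optional, so the pattern can match the empty string, and findall() also reports an empty match at the end of the input. The alternation in the second pattern always consumes at least one character.

```python
import re

s = "fooBarBaz"

# Every part of '[A-Z]?[a-z]*' is optional, so it can also match the empty
# string; findall() reports that empty match at the end of the input.
loose = re.findall(r"[A-Z]?[a-z]*", s)
print(loose)   # ['foo', 'Bar', 'Baz', '']

# The alternation always consumes at least one character: no empty matches.
strict = re.findall(r"([A-Z][a-z]*|[a-z]+)", s)
print(strict)  # ['foo', 'Bar', 'Baz']
```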
Re: Regular expression bug?
On Thu, Feb 19, 2009 at 12:55 PM, Ron Garret wrote:
> I'm trying to split a CamelCase string into its constituent components.
> This kind of works:
>
> >>> re.split('[a-z][A-Z]', 'fooBarBaz')
> ['fo', 'a', 'az']
>
> but it consumes the boundary characters. To fix this I tried using
> lookahead and lookbehind patterns instead, but it doesn't work:
>
> >>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
> ['fooBarBaz']
>
> However, it does seem to work with findall:
>
> >>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
> ['', '']
>
> So the regular expression seems to be doing the Right Thing. Is this a
> bug in re.split, or am I missing something?

From what I can tell, re.split can't split on zero-length boundaries. It
needs something to split on, like str.split. Is this a bug? Possibly. The
docs for re.split say:

Split the source string by the occurrences of the pattern, returning a
list containing the resulting substrings.

Note that it does not say that zero-length matches won't work.

I can work around the problem thusly:

re.sub(r'(?<=[a-z])(?=[A-Z])', '_', 'fooBarBaz').split('_')

Which is ugly.

I reckon you can use re.findall with a pattern that matches the
components and not the boundaries, but you have to take care of the
beginning and end as special cases.

Kurt
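Kurt's point about special-casing the beginning of the string also applies to boundary-based splitting on a modern Python. The sketch below assumes Python 3.7 or later and uses a hypothetical input, 'FooBar', that starts with a capital. With a lookahead alone, the pattern also matches the boundary at position 0 and yields a leading empty string. Ron's added lookbehind avoids that.

```python
import re

# Lookahead only: the boundary before the first capital also matches,
# so a leading empty string appears (Python 3.7+).
ahead_only = re.split(r"(?=[A-Z])", "FooBar")
print(ahead_only)  # ['', 'Foo', 'Bar']

# Requiring a lowercase letter before the boundary rules out position 0.
both = re.split(r"(?<=[a-z])(?=[A-Z])", "FooBar")
print(both)  # ['Foo', 'Bar']
```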
Re: Regular expression bug?
On Thu, 2009-02-19 at 10:55 -0800, Ron Garret wrote:
> I'm trying to split a CamelCase string into its constituent components.
> This kind of works:
>
> >>> re.split('[a-z][A-Z]', 'fooBarBaz')
> ['fo', 'a', 'az']
>
> but it consumes the boundary characters. To fix this I tried using
> lookahead and lookbehind patterns instead, but it doesn't work:

That's how re.split works, same as str.split...

> >>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
> ['fooBarBaz']
>
> However, it does seem to work with findall:
>
> >>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
> ['', '']

Wow!

To tell you the truth, I can't even read that... but one wonders why you
don't just do

def ccsplit(s):
    # Start a new word at each uppercase letter; keep the words in order.
    cclist = []
    current_word = ''
    for char in s:
        if char.isupper():
            if current_word:
                cclist.append(current_word)
            current_word = char
        else:
            current_word += char
    if current_word:
        cclist.append(current_word)
    return cclist

>>> ccsplit('fooBarBaz')
['foo', 'Bar', 'Baz']

This is arguably *much* easier to read than the re example, and it doesn't
require one to look ahead in the string.

-a
Regular expression bug?
I'm trying to split a CamelCase string into its constituent components.
This kind of works:

>>> re.split('[a-z][A-Z]', 'fooBarBaz')
['fo', 'a', 'az']

but it consumes the boundary characters. To fix this I tried using
lookahead and lookbehind patterns instead, but it doesn't work:

>>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
['fooBarBaz']

However, it does seem to work with findall:

>>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
['', '']

So the regular expression seems to be doing the Right Thing. Is this a
bug in re.split, or am I missing something?

(BTW, I tried looking at the source code for the re module, but I could
not find the relevant code. re.split calls sre_compile.compile().split,
but the string 'split' does not appear in sre_compile.py. So where does
this method come from?)

I'm using Python2.5.

Thanks,
rg