Ron Garret wrote:

> I'm trying to split a CamelCase string into its constituent components.

How about

>>> re.compile("[A-Za-z][a-z]*").findall("fooBarBaz")
['foo', 'Bar', 'Baz']

> This kind of works:
>
> >>> re.split('[a-z][A-Z]', 'fooBarBaz')
> ['fo', 'a', 'az']
>
> but it consumes the boundary characters. To fix this I tried using
> lookahead and lookbehind patterns instead, but it doesn't work:
>
> >>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
> ['fooBarBaz']
>
> However, it does seem to work with findall:
>
> >>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
> ['', '']
>
> So the regular expression seems to be doing the Right Thing. Is this a
> bug in re.split, or am I missing something?

IIRC the split pattern must consume at least one character, but I can't
find the reference.

> (BTW, I tried looking at the source code for the re module, but I could
> not find the relevant code. re.split calls sre_compile.compile().split,
> but the string 'split' does not appear in sre_compile.py. So where does
> this method come from?)

It's coded in C. The source is Modules/_sre.c.

Peter

-- 
http://mail.python.org/mailman/listinfo/python-list
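A note for later readers, as a minimal sketch: since Python 3.7, re.split does split on matches that are empty, so the lookaround pattern from the question now behaves as Ron expected. Earlier versions did not split on empty matches (3.5 and 3.6 reject patterns that can only match the empty string with a ValueError). The helper names split_camel_case and split_camel_case_findall are hypothetical, chosen for illustration; the expected outputs in the comments assume Python 3.7 or later.

```python
import re

# Zero-width boundary: the position between a lowercase letter and the
# uppercase letter that follows it. Nothing is consumed.
BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")

# Alternative: match the words themselves rather than the gaps between
# them (the findall approach suggested in the reply).
WORD = re.compile(r"[A-Za-z][a-z]*")


def split_camel_case(s):
    """Split s at each lowercase-to-uppercase boundary.

    Hypothetical helper. Relies on re.split accepting empty matches,
    which requires Python 3.7 or later.
    """
    return BOUNDARY.split(s)


def split_camel_case_findall(s):
    """Hypothetical helper: return each CamelCase word via findall."""
    return WORD.findall(s)


if __name__ == "__main__":
    print(split_camel_case("fooBarBaz"))          # ['foo', 'Bar', 'Baz']
    print(split_camel_case_findall("fooBarBaz"))  # ['foo', 'Bar', 'Baz']
    # With a capturing group, the (empty) separators are kept in the result.
    print(re.split(r"((?<=[a-z])(?=[A-Z]))", "fooBarBaz"))
    # ['foo', '', 'Bar', '', 'Baz']
```

The findall version works on any Python version; the split version is closer to what the question asked for but depends on the 3.7 change.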