I have been trying to write a regular expression that identifies a block of text enclosed by (potentially nested) parentheses. I've found solutions using other regular expression engines (for example, my text editor, BBEdit, which uses the PCRE library), but have not been able to replicate it using python's re module.
Here's a version that works using the PCRE syntax, along with the python error message. I'm hoping for this to identify the string '(foo (bar) (baz))' % python -V Python 2.5.1 % python py> import re py> text = 'buh (foo (bar) (baz)) blee' py> no_ws = lambda s: ''.join(s.split()) py> rexp = r"""(?P<parens> ... \( ... (?> ... (?> [^()]+ ) | ... (?P>parens) ... )* ... \) ... )""" py> print re.findall(no_ws(rexp), text) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Library/Frameworks/Python.framework/Versions/2.5/lib/ python2.5/re.py", line 167, in findall return _compile(pattern, flags).findall(string) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/ python2.5/re.py", line 233, in _compile raise error, v # invalid expression sre_constants.error: unexpected end of pattern >From what I understand of the PCRE syntax, the (?>) construct is a non- capturing subpattern, and (?P>parens) is a recursive call to the enclosing (named) pattern. So my best guess at a python equivalent is this: py> rexp2 = r"""(?P<parens> ... \( ... (?= ... (?= [^()]+ ) | ... (?P=parens) ... )* ... \) ... )""" py> print re.findall(no_ws(rexp2), text) [] ...which results in no match. I've played around quite a bit with variations on this theme, but haven't been able to come up with one that works. Can anyone help me understand how to construct a regular expression that does the job in python? Thanks - -- http://mail.python.org/mailman/listinfo/python-list