I'm trying to find the contents of an XML tag. Nothing fancy. I don't care about parsing child tags or anything. I just want to get the raw text. Here's my script:
import re

data = """
<?xml version='1.0'?>
<body>
<div class='default'>
here's some text!
</div>
<div class='default'>
here's some text!
</div>
<div class='default'>
here's some text!
</div>
</body>
"""

tagName = 'div'
pattern = re.compile(r'<%(tagName)s\s[^>]*>[.\n\r\w\s\d\D\S\W]*[^(%(tagName)s)]*' % dict(tagName=tagName))
matches = pattern.finditer(data)
for m in matches:
    contents = data[m.start():m.end()]
    print(repr(contents))
    # each match should stop before the closing tag
    assert '</%s>' % tagName not in contents

The problem I'm running into is that the [^(%(tagName)s)]* portion of my regex seems to be ignored. Only one match is returned: it starts at the first <div> and runs to the end of the text, when it should end at the first </div>. For this example it should return three matches, one for each div.

Is what I'm trying to do possible with Python's regex library? Is there an error in my regex?

Thanks,
Chris

--
http://mail.python.org/mailman/listinfo/python-list
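A minimal sketch of one way to get the three matches, assuming the <div> elements are never nested inside each other. Two things go wrong in the pattern above. First, [^(div)]* is a character class: it matches a run of characters other than (, d, i, v and ), not "anything except the string div". Second, [.\n\r\w\s\d\D\S\W]* already matches every character (\d together with \D covers everything), so the greedy * runs to the end of the string and the class after it matches nothing. The sketch instead uses a non-greedy .*? with re.DOTALL and stops at the first literal closing tag. The names and sample data are illustrative only.

```python
import re

data = """
<?xml version='1.0'?>
<body>
<div class='default'>
here's some text!
</div>
<div class='default'>
here's some text!
</div>
<div class='default'>
here's some text!
</div>
</body>
"""

tagName = 'div'

# <div ...>, then the shortest possible run of any characters (.*? with
# re.DOTALL so '.' also matches newlines), up to the first literal </div>.
# \b stops <div> from also matching tags like <divider>; re.escape guards
# against tag names that contain regex metacharacters.
pattern = re.compile(
    r'<{0}\b[^>]*>(.*?)</{0}\s*>'.format(re.escape(tagName)),
    re.DOTALL,
)

# group(1) is only the text between the opening and closing tags
matches = [m.group(1) for m in pattern.finditer(data)]
for contents in matches:
    print(repr(contents))
```

With nested <div> elements, the non-greedy match would stop at the first inner </div>. For anything beyond flat markup like this, an XML parser such as xml.etree.ElementTree is the safer tool.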