Hi group,

I'm writing a Python wrapper around a command-line utility that returns
XML.  The utility is flaky and gives me back poorly formed XML with
different problems in different cases.  Anyway, I'm making progress.  I'm
not very good at regular expressions, though, and was wondering if someone
could help with the first step: splitting the tags out of the stdout
returned by the utility.

I have the following example string, and am simply trying to split it
into two xml tags...

simplified = """2007-12-13 <tag1 attr1="text1" attr2="text2" /tag1>
\n2007-12-13 <tag2 attr1="text1" attr2="text2" attr3="text3\n" /tag2>
\n"""

Basically I want the two tags, and to discard everything outside them,
using a regular expression.  Like this:

tags = ['<tag1 attr1="text1" attr2="text2" /tag1>',
        '<tag2 attr1="text1" attr2="text2" attr3="text3\n" /tag2>']

I've tried several approaches, some of which got close, but the
newline in the middle of one of the tags screwed it up.  The closest
I've been is something like this:

import re
retag = re.compile(r'<.+>*') # tried here with re.DOTALL as well
tags = retag.findall(simplified)
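
For comparison, here is a minimal sketch of a pattern that may handle the
sample string.  It assumes that a '>' never appears inside an attribute
value, which may not hold for the real output of the utility.  In '<.+>*'
the '.+' is greedy and '>*' also matches zero '>' characters, so without
re.DOTALL the match stops at the newline inside tag2, and with re.DOTALL it
runs from the first '<' to the end of the string.  A negated character
class avoids both problems, because [^>] matches newlines as well:

```python
import re

# Sample output copied from the example string in this post; "\n" inside
# the triple-quoted string becomes a real newline character.
simplified = """2007-12-13 <tag1 attr1="text1" attr2="text2" /tag1>
\n2007-12-13 <tag2 attr1="text1" attr2="text2" attr3="text3\n" /tag2>
\n"""

# "<", then one or more characters that are not ">", then the closing ">".
# Assumption: no attribute value contains a literal ">".
retag = re.compile(r'<[^>]+>')

# findall returns every non-overlapping match; the dates and blank lines
# between the tags are simply skipped.
tags = retag.findall(simplified)
print(tags)
```

The newline embedded in attr3 stays in the second match, so it would still
need to be cleaned up afterwards if it is unwanted.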

Can anyone help me?

~Sean

-- 
http://mail.python.org/mailman/listinfo/python-list
