On Dec 14, 12:04 am, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> On Thu, 13 Dec 2007 17:49:20 -0800, Sean DiZazzo wrote:
> > I'm wrapping up a command line util that returns xml in Python. The
> > util is flaky, and gives me back poorly formed xml with different
> > problems in different cases. Anyway I'm making progress. I'm not
> > very good at regular expressions though and was wondering if someone
> > could help with initially splitting the tags from the stdout returned
> > from the util.
> >
> > [...]
> >
> > Can anyone help me?
>
> Flaky XML is often produced by programs that treat XML as ordinary text
> files. If you are starting to parse XML with regular expressions you are
> making the very same mistake. XML may look somewhat simple but
> producing correct XML and parsing it isn't. Sooner or later you stumble
> across something that breaks producing or parsing the "naive" way.
>
> Ciao,
> Marc 'BlackJack' Rintsch

It's not really complicated XML so far, just tags with attributes.
Still, different queries against the program sometimes return
differently malformed tags. A few examples:

<id 123456 />
<tag name="foo" />
<tag2 name="foo" moreattrs="..." /tag2>
<tag3 name="foo" moreattrs="..." tag3/>

It is at least consistent in that the same query always returns the
same tag style.

The output goes to stdout along with some extra useless information, so
the original question was how to get down to just the tags. After
extracting the tags, I run them through some functions to fix them, and
then use ElementTree to parse them and get the rest of the info.

There is no API, so this is what I have to work with. Is there a better
solution?

Thanks for your ideas.

~Sean
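
For reference, the extract-then-fix pipeline described above could look roughly like the sketch below. It is only an illustration under several assumptions: every tag sits between one '<' and the next '>', attribute values are always double-quoted and never contain '>', and a bare value such as the 123456 in the first example can be stored under an attribute called "value". The function names, the "value" attribute, the "results" root element and the noise lines in the sample are all made up for the example. Instead of patching the strings and re-parsing them, it builds ElementTree elements directly, which avoids having to produce well-formed text by hand.

```python
import re
import xml.etree.ElementTree as ET

# Anything between '<' and the next '>' counts as one tag candidate.
TAG_RE = re.compile(r"<[^<>]+>")
# name="value" pairs inside a tag (double-quoted values only).
ATTR_RE = re.compile(r'([\w:.-]+)\s*=\s*"([^"]*)"')
# A simple tag name at the start of the tag body.
NAME_RE = re.compile(r"[A-Za-z_][\w.-]*")


def extract_tags(text):
    """Return every '<...>' chunk found in the raw stdout text."""
    return TAG_RE.findall(text)


def repair_tag(raw):
    """Build an Element from one possibly malformed tag string."""
    inner = raw.strip()[1:-1].strip()      # drop the '<' and '>'
    match = NAME_RE.match(inner)
    if match is None:
        raise ValueError("no tag name in %r" % raw)
    name = match.group(0)
    rest = inner[match.end():]

    attrib = dict(ATTR_RE.findall(rest))

    # Whatever is left after removing the attributes is either slash
    # debris ('/', '/tag2', 'tag3/') or a bare value ('123456').
    leftovers = ATTR_RE.sub(" ", rest).split()
    bare = [token.strip("/") for token in leftovers]
    bare = [token for token in bare if token and token != name]
    if bare:
        # Assumption: keep bare values under a made-up "value" attribute.
        attrib["value"] = " ".join(bare)

    return ET.Element(name, attrib)


def parse_output(text, root_name="results"):
    """Collect all repaired tags under one (hypothetical) root element."""
    root = ET.Element(root_name)
    for raw in extract_tags(text):
        root.append(repair_tag(raw))
    return root


# The four tag shapes from the post, surrounded by invented noise lines.
sample = '''querying...
<id 123456 />
<tag name="foo" />
<tag2 name="foo" moreattrs="..." /tag2>
done
<tag3 name="foo" moreattrs="..." tag3/>
'''

root = parse_output(sample)
for elem in root:
    print(elem.tag, elem.attrib)
```

If the util ever emits a '<' or '>' inside its extra output or inside an attribute value, the first regex will split in the wrong place, so this only holds up as long as the output stays as simple as the examples.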