Re: convert xhtml back to html

Gary Herron Thu, 24 Apr 2008 09:13:40 -0700

Tim Arnold wrote:

hi, I've got lots of xhtml pages that need to be fed to MS HTML Workshop to create CHM files. That application really hates xhtml, so I need to convert self-ending tags (e.g. <br />) to plain html (e.g. <br>).
Seems simple enough, but I'm having some trouble with it. regexps trip up because I also have to take into account 'img', 'meta', 'link' tags, not just the simple 'br' and 'hr' tags. Well, maybe there's a simple way to do that with regexps, but my simpleminded <img[^(/>)]+/> doesn't work. I'm not enough of a regexp pro to figure out that lookahead stuff.
I'm not sure where to start now; I looked at BeautifulSoup and BeautifulStoneSoup, but I can't see how to modify the actual tag.
thanks,
--Tim Arnold


--
http://mail.python.org/mailman/listinfo/python-list

Whether or not you can find an application that does what you want, I don't know, but at the very least I can say this much.

You should not be reading and parsing the text yourself! XHTML is valid XML, and there are lots of ways to read and parse XML with Python. (ElementTree is what I use, but other choices exist.) Once you use an existing package to read your files into an internal tree structure representation, it should be a relatively easy job to traverse the tree to emit the tags and text you want.
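
As a rough sketch of that approach (not a tested solution for your actual pages), something like the following might do. It assumes a reasonably recent Python 3, where the standard library's xml.etree.ElementTree can serialize with method="html", and it assumes the pages parse as well-formed XML. The function name xhtml_to_html and the sample markup are made up for illustration. Pages that use HTML entities such as &nbsp; will probably need extra handling, since the XML parser does not know them by default.

```python
import xml.etree.ElementTree as ET

# Namespace that XHTML documents normally declare on the root element.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def xhtml_to_html(xhtml_text):
    """Parse XHTML as XML and re-serialize it with HTML rules.

    Hypothetical helper: the name and behaviour are illustrative only.
    """
    root = ET.fromstring(xhtml_text)

    # Strip the XHTML namespace so tags are written as 'br', not 'html:br'.
    # The HTML serializer only recognises un-prefixed names as empty elements.
    for elem in root.iter():
        if isinstance(elem.tag, str) and elem.tag.startswith(XHTML_NS):
            elem.tag = elem.tag[len(XHTML_NS):]

    # method="html" writes br, hr, img, meta, link etc. without '/>'
    # and without a closing tag.
    return ET.tostring(root, encoding="unicode", method="html")


# Made-up sample input, standing in for one of the real pages.
sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
    '<meta content="b" name="a" /></head>'
    '<body><p>one<br />two</p><img alt="x" src="x.png" /><hr /></body></html>'
)

html_out = xhtml_to_html(sample)
print(html_out)
```

The point of going through the tree is that the serializer decides how each tag is closed, so 'img', 'meta' and 'link' need no special-casing of their own.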



Gary Herron
