Hi, I need to parse a fairly complex HTML page that has XML embedded in it. I've done parsing before with the xml.dom.minidom module on just plain XML, but I cannot get it to work with this HTML page.
The XML looks like this:

    <Row status="o">
      <Relationship>Owner</Relationship>
      <Priority>1</Priority>
      <StartDate>07/16/2007</StartDate>
      <StopsExist>No</StopsExist>
      <Name>Doe, John</Name>
      <Address>1905 S 3rd Ave , Hicksville IA 99999</Address>
    </Row>
    <Row status="o">
      <Relationship>Owner</Relationship>
      <Priority>2</Priority>
      <StartDate>07/16/2007</StartDate>
      <StopsExist>No</StopsExist>
      <Name>Doe, Jane</Name>
      <Address>1905 S 3rd Ave , Hicksville IA 99999</Address>
    </Row>

It appears to be enclosed in <XML id="grdRegistrationInquiryCustomers"><BoundData>. The rest of the document is HTML, JavaScript, div tags, etc.

I need the information only from the row where the Relationship tag is Owner and the Priority tag is 1. The rest I can ignore. When I tried parsing the page with minidom, I got

    ExpatError: mismatched tag: line 1, column 357

so I think the HTML is probably malformed.

I looked at BeautifulSoup, but it seems to separate its HTML processing from its XML processing. Can someone give me some pointers?

I am currently using Python 2.5 on Windows XP. I will be using Internet Explorer 6 since the document will not display correctly in Firefox.

Thank you very much!

Mike
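
One possible approach to the task described above, as a minimal sketch rather than a definitive answer: cut the XML island out of the page with a regular expression first, then hand only that fragment to minidom, so the malformed HTML around it never reaches the parser. This assumes the island is closed by </BoundData></XML> and is itself well-formed, neither of which the post confirms. SAMPLE_PAGE, ISLAND_RE, child_text and find_primary_owner are hypothetical names made up for illustration, and the HTML around the island in SAMPLE_PAGE is invented.

```python
import re
from xml.dom import minidom

# Hypothetical stand-in for the real page. Only the <Row> markup is taken
# from the post; the surrounding HTML and the closing tags are assumptions.
SAMPLE_PAGE = """<html><body><p>deliberately unclosed paragraph<br>
<XML id="grdRegistrationInquiryCustomers"><BoundData>
<Row status="o"> <Relationship>Owner</Relationship> <Priority>1</Priority>
<StartDate>07/16/2007</StartDate> <StopsExist>No</StopsExist>
<Name>Doe, John</Name>
<Address>1905 S 3rd Ave , Hicksville IA 99999</Address> </Row>
<Row status="o"> <Relationship>Owner</Relationship> <Priority>2</Priority>
<StartDate>07/16/2007</StartDate> <StopsExist>No</StopsExist>
<Name>Doe, Jane</Name>
<Address>1905 S 3rd Ave , Hicksville IA 99999</Address> </Row>
</BoundData></XML>
<div>more html</body></html>"""

# Capture everything between the opening <XML id=...> tag and </XML>.
ISLAND_RE = re.compile(
    r'<XML\s+id="grdRegistrationInquiryCustomers"\s*>(.*?)</XML>',
    re.DOTALL | re.IGNORECASE,
)

FIELDS = ("Relationship", "Priority", "StartDate", "StopsExist",
          "Name", "Address")


def child_text(row, tag):
    """Return the stripped text of the first <tag> under row, or ''."""
    nodes = row.getElementsByTagName(tag)
    if not nodes:
        return ""
    return "".join(n.data for n in nodes[0].childNodes
                   if n.nodeType == n.TEXT_NODE).strip()


def find_primary_owner(page):
    """Return a dict for the Owner row with Priority 1, or None."""
    match = ISLAND_RE.search(page)
    if match is None:
        return None
    # Parse only the <BoundData>...</BoundData> fragment, not the HTML.
    dom = minidom.parseString(match.group(1).strip())
    try:
        for row in dom.getElementsByTagName("Row"):
            if (child_text(row, "Relationship") == "Owner"
                    and child_text(row, "Priority") == "1"):
                return dict((tag, child_text(row, tag)) for tag in FIELDS)
        return None
    finally:
        dom.unlink()  # release the DOM tree


if __name__ == "__main__":
    print(find_primary_owner(SAMPLE_PAGE))
```

If the island itself turns out not to be well-formed, minidom will still raise ExpatError on the fragment, and a more forgiving parser would be needed for that part.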