Re: How extract data from XHTML Transitional web pages? got xml.dom.minidom troubles..

Bruno Desthuilliers Fri, 02 Mar 2007 17:18:40 -0800

[EMAIL PROTECTED] wrote:
> I'm trying to extract some data from an XHTML Transitional web page.
> 
> What is best way to do this?
> 
> xml.dom.minidom.


As a side note, cElementTree is probably a better choice. Or even a 
simple SAX parser.
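
For illustration, here is a minimal sketch of the ElementTree route. It assumes Python 3, where cElementTree's accelerated implementation is what you get from xml.etree.ElementTree, and it assumes the page really is well-formed. The sample markup and the extract_links helper are made up for this example; they are not from the original question.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace, so tag names must be qualified.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical, well-formed sample page (not the poster's actual data).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Demo page</title></head>
  <body>
    <p><a href="http://example.com/a">First</a></p>
    <p><a href="http://example.com/b">Second</a></p>
  </body>
</html>"""


def extract_links(xhtml_text):
    """Return (href, text) pairs for every <a> element in the document."""
    root = ET.fromstring(xhtml_text)
    return [(a.get("href"), a.text) for a in root.iter("{%s}a" % XHTML_NS)]


links = extract_links(SAMPLE)
for href, text in links:
    print(href, text)
```

Forgetting the namespace is the usual stumbling block here: root.iter("a") finds nothing in a document that declares the XHTML namespace.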

> parseString("text of web page") gives errors about it
> not being well formed XML.

If it's not well-formed XML, most - if not all - XML parsers will choke 
on it.
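
To make that concrete, here is a small sketch (Python 3 assumed) showing how minidom reacts to a typical HTML-ism such as an unclosed <br>. The check_well_formed helper and both sample strings are invented for the example.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def check_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, error message)."""
    try:
        minidom.parseString(text)
    except ExpatError as exc:
        # minidom uses expat underneath, so malformed input surfaces as ExpatError.
        return False, str(exc)
    return True, None


good = "<p>line one<br /></p>"  # <br /> is closed, as XHTML requires
bad = "<p>line one<br></p>"     # plain HTML-style <br>, never closed

good_ok, good_err = check_well_formed(good)
bad_ok, bad_err = check_well_formed(bad)
print(good_ok, good_err)
print(bad_ok, bad_err)
```

The error message includes a line and column, which is usually the quickest way to find what is actually wrong with the page.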

> Do I just need to add something like <?xml ...?> or what?

How could we tell without looking at the XML?

But anyway, even if the XHTML is crappy, BeautifulSoup may do the job...
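
BeautifulSoup is a third-party package, so as a stand-in here is a sketch of the same idea (a parser that tolerates broken markup) using only the standard library's html.parser from Python 3. This is not BeautifulSoup's API; the LinkCollector class, the collect_links helper and the messy sample are all made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags without requiring well-formed input."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


def collect_links(markup):
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


# Unclosed tags and an unquoted attribute: an XML parser would reject this.
messy = ('<p>unclosed paragraph<br>'
         '<a href="http://example.com/a">First<p>'
         '<a href=http://example.com/b>Second')

found = collect_links(messy)
print(found)
```

The trade-off is that a lenient parser never tells you the page is broken; it just does its best with whatever it gets.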

HTH
-- 
http://mail.python.org/mailman/listinfo/python-list
