Re: Looking for a decent HTML parser for Python...

hubritic Wed, 06 Dec 2006 08:55:59 -0800

Agreed that the web sites are probably broken.  Try running the HTML
through HTML Tidy (http://tidy.sourceforge.net/). Doing that has allowed
me to parse pages where I had problems such as yours.
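
For what it's worth, here is a rough sketch of the Tidy step, assuming the
tidy command-line tool is installed and on your PATH. The function name
tidy_html and its tidy_cmd parameter are made up for illustration; the
flags are standard Tidy options. If tidy cannot be found, the sketch just
hands back the input untouched.

```python
import shutil
import subprocess


def tidy_html(html, tidy_cmd="tidy"):
    """Return html cleaned up by the HTML Tidy command-line tool.

    tidy_html and tidy_cmd are illustrative names, not part of any library.
    If the tidy executable is not on PATH, the input is returned unchanged.
    """
    exe = shutil.which(tidy_cmd)
    if exe is None:
        return html  # Tidy is not installed; nothing we can do here.

    # Tidy exits with 1 for warnings and 2 for errors, so a non-zero exit
    # status is expected on broken pages; do not treat it as a failure.
    result = subprocess.run(
        [
            exe,
            "-q",                      # suppress the summary chatter
            "-asxhtml",                # emit well-formed XHTML
            "-utf8",                   # read and write UTF-8
            "--show-warnings", "no",
            "--force-output", "yes",   # still write output if errors remain
        ],
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    cleaned = result.stdout.decode("utf-8", errors="replace")
    # Fall back to the original text if Tidy produced nothing at all.
    return cleaned if cleaned.strip() else html


if __name__ == "__main__":
    print(tidy_html("<p>unclosed paragraph<b>bold"))
```

The cleaned string can then be fed to whatever parser was choking on the
original page.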


I have also had luck with BeautifulSoup, which includes a tidy
function of its own.
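
A small sketch of the BeautifulSoup route, under some assumptions: it uses
the current bs4 package (pip install beautifulsoup4), and pulling out link
targets is only a stand-in task, since I don't know which feature you
actually need. The names extract_links and _LinkCollector are invented for
the example. If bs4 is not installed, it falls back to the standard
library's html.parser so the snippet still runs.

```python
from html.parser import HTMLParser

try:
    from bs4 import BeautifulSoup  # third-party package: beautifulsoup4
except ImportError:
    BeautifulSoup = None


class _LinkCollector(HTMLParser):
    """Standard-library fallback: record the href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


def extract_links(html):
    """Return the href values of the <a> tags found in html, in order."""
    if BeautifulSoup is not None:
        # Name the parser explicitly so results do not depend on which
        # optional parsers (lxml, html5lib) happen to be installed.
        soup = BeautifulSoup(html, "html.parser")
        return [a["href"] for a in soup.find_all("a", href=True)]

    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    # Deliberately sloppy markup: unclosed <p> and <a>, mixed quoting.
    broken = "<html><body><p>First<a href=\"/a\">A<p>Second <a href='/b'>B</body>"
    print(extract_links(broken))
```

With BeautifulSoup available, soup.prettify() will also write the repaired
tree back out as indented markup, which is probably the closest thing to a
tidy step there.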



Just Another Victim of the Ambient Morality wrote:
> "Just Another Victim of the Ambient Morality" <[EMAIL PROTECTED]> wrote
> in message news:[EMAIL PROTECTED]
> >
> >    Okay, I think I found what I'm looking for in HTMLParser in the
> > HTMLParser module.
>
>     Except it appears to be buggy or, at least, not very robust.  There are
> websites for which it falsely terminates early in the parsing.  I have a
> sneaking feeling the sgml parser will be more robust, if only it had that
> one feature I am looking for.
>     Can someone help me out here?
>     Thank you...

-- 
http://mail.python.org/mailman/listinfo/python-list
