Agreed that the web sites are probably broken.  Try running the HTML
through HTML Tidy (http://tidy.sourceforge.net/) first. Doing that has let
me parse pages where I had problems such as yours.

I have also had luck with BeautifulSoup, which has its own tidying
function built in.
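For comparison, here is a minimal stdlib-only sketch. It assumes Python 3, whose html.parser module is more lenient than the one discussed in this thread: it accepts unclosed tags, unquoted attributes and stray end tags without raising. A small handler can therefore often pull links out of tag soup before you reach for Tidy or BeautifulSoup. LinkCollector and extract_links are hypothetical names used only for illustration. Unlike Tidy, this does not repair the document; it only tolerates the breakage.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect href values from <a> tags in messy HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(markup):
    """Feed the whole document, then close() to flush any buffered input."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


# Deliberately broken markup: unclosed <p> and <a>, an unquoted attribute,
# upper-case tags, and a stray </div>.
broken = """
<html><body>
<p>First <a href=one.html>link
<p>Second <A HREF="two.html">link</a>
</div>
<a href='three.html'>third
</body>
"""

print(extract_links(broken))  # ['one.html', 'two.html', 'three.html']
```

If the pages are too broken even for this, running them through Tidy or BeautifulSoup first, as suggested above, is the safer route.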

Just Another Victim of the Ambient Morality wrote:
> "Just Another Victim of the Ambient Morality" <[EMAIL PROTECTED]> wrote
> in message news:[EMAIL PROTECTED]
> >
> >    Okay, I think I found what I'm looking for in HTMLParser in the
> > HTMLParser module.
>
>     Except it appears to be buggy or, at least, not very robust.  There are
> websites for which it falsely terminates early in the parsing.  I have a
> sneaking feeling the sgml parser will be more robust, if only it had that
> one feature I am looking for.
>     Can someone help me out here?
>     Thank you...

-- 
http://mail.python.org/mailman/listinfo/python-list
