On Thu, May 3, 2012 at 1:59 PM, John Nagle <na...@animats.com> wrote:
>  An HTML page for a major site (http://www.chase.com) has
> some incorrect HTML.  It contains
>
>        <![CDATA[]]
>
> which is not valid HTML, XML, or SGML.  However, most browsers
> ignore it.  BeautifulSoup treats it as the start of a CDATA section,
> and consumes the rest of the document in CDATA format.
>
>  Bug?

Seems like a bug to me.  BeautifulSoup is supposed to parse like a
browser would, so if most browsers just ignore an unterminated CDATA
section, then BeautifulSoup probably should too.
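
In the meantime, one possible workaround is to strip the malformed marker
from the markup before handing it to the parser.  A minimal sketch,
assuming the broken token is literally "<![CDATA[]]" with no closing ">"
(the function name strip_broken_cdata is made up for illustration and is
not part of BeautifulSoup):

```python
import re

# Matches the malformed marker "<![CDATA[]]" only when it is NOT followed
# by ">", so a well-formed empty section "<![CDATA[]]>" is left alone.
BROKEN_CDATA = re.compile(r"<!\[CDATA\[\]\](?!>)")


def strip_broken_cdata(markup):
    """Return markup with the malformed empty-CDATA marker removed.

    Hypothetical helper: it only handles the exact token quoted above,
    not unterminated CDATA sections in general.
    """
    return BROKEN_CDATA.sub("", markup)


cleaned = strip_broken_cdata("<p>a</p><![CDATA[]]<p>b</p>")
```

The cleaned string can then be passed to BeautifulSoup as usual.  This only
papers over the one pattern seen on that page; it does not change how the
parser treats other unterminated CDATA sections.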