On Thu, May 3, 2012 at 1:59 PM, John Nagle <na...@animats.com> wrote:
> An HTML page for a major site (http://www.chase.com) has
> some incorrect HTML. It contains
>
>     <![CDATA[]]
>
> which is not valid HTML, XML, or SGML. However, most browsers
> ignore it. BeautifulSoup treats it as the start of a CDATA section,
> and consumes the rest of the document in CDATA format.
>
> Bug?

Seems like a bug to me. BeautifulSoup is supposed to parse like a browser would, so if most browsers just ignore an unterminated CDATA section, then BeautifulSoup probably should too.
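
Until that is settled, one possible stopgap is to clean the markup before handing it to the parser. The following is only a rough sketch using the standard library: the function name `strip_unterminated_cdata` is made up for illustration and is not part of BeautifulSoup. It assumes a CDATA opener with no `]]>` anywhere after it can simply be dropped, along with a `]]` that directly follows it. It does not reproduce what browsers actually do, and it will be fooled if a later, well-formed CDATA section supplies the missing terminator.

```python
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def strip_unterminated_cdata(html):
    """Drop CDATA openers that have no ']]>' terminator after them.

    Hypothetical helper, not a BeautifulSoup API. Properly terminated
    CDATA sections are passed through unchanged.
    """
    out = []
    pos = 0
    while True:
        start = html.find(CDATA_OPEN, pos)
        if start == -1:
            # No more openers: keep the remainder as-is.
            out.append(html[pos:])
            break
        end = html.find(CDATA_CLOSE, start + len(CDATA_OPEN))
        if end == -1:
            # Unterminated: keep the text before the opener, skip the opener.
            out.append(html[pos:start])
            pos = start + len(CDATA_OPEN)
            # Also skip a dangling ']]' right after it, as in '<![CDATA[]]'.
            if html.startswith("]]", pos):
                pos += 2
        else:
            # Terminated: copy the whole section through untouched.
            out.append(html[pos:end + len(CDATA_CLOSE)])
            pos = end + len(CDATA_CLOSE)
    return "".join(out)


if __name__ == "__main__":
    broken = "<p>before</p><![CDATA[]]<p>after</p>"
    print(strip_unterminated_cdata(broken))
```

The cleaned string could then be passed to BeautifulSoup in place of the raw page, so that the rest of the document is parsed as ordinary markup.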