On Fri, May 4, 2012 at 12:57 AM, Stefan Behnel <stefan...@behnel.de> wrote:
> Ian Kelly, 04.05.2012 01:02:
>> BeautifulSoup is supposed to parse like a browser would
>
> Not at all, that would be html5lib.

Well, I guess that depends on whether we're talking about BeautifulSoup 3 (a regex-based screen scraper with methods for navigating and searching the resulting tree) or BeautifulSoup 4 (purely a parse-tree navigation library that relies on external libraries to do the actual parsing). According to the BS3 documentation, "The BeautifulSoup class is full of web-browser-like heuristics for divining the intent of HTML authors." If we're talking about BS4, though, then the problem in this instance would have nothing to do with BS4 itself; it would instead be an issue with whichever underlying parser the OP is using.

--
http://mail.python.org/mailman/listinfo/python-list
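To make the BS4 point concrete: with BS4 the backend is picked by the second argument to the constructor, e.g. BeautifulSoup(markup, "html.parser"), BeautifulSoup(markup, "lxml") or BeautifulSoup(markup, "html5lib"), and how malformed markup comes out depends on that choice. The sketch below is only an illustration under that assumption. It uses nothing but the standard library's html.parser (the backend behind "html.parser") to show that this tokenizer just reports the raw tag events for broken input rather than repairing it the way a browser would. The class and function names (EventLogger, tag_events) are hypothetical.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical helper: record the raw start/end tag events html.parser emits."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called once for each opening tag seen in the input.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        # Called for each closing tag, whether or not it matches an open tag.
        self.events.append(("end", tag))


def tag_events(markup):
    """Hypothetical helper: feed markup to html.parser and return its tag events."""
    logger = EventLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


# Malformed input: an <a> that is never closed, plus a stray </p>.
print(tag_events("<a></p>"))
# [('start', 'a'), ('end', 'p')]
```

The stray </p> is passed straight through as an end-tag event. Deciding what that "means" is left to whatever builds the tree on top, whereas a browser-style parser such as html5lib is designed to resolve it the way browsers do.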