Steven D'Aprano <steve+pyt...@pearwood.info> added the comment:

The stdlib HTML parser requires correct HTML.

To parse broken HTML, as you find in the real world, you need a third-party 
library like BeautifulSoup. BeautifulSoup is much more complex (about 7-8 times 
as many LOC) but can handle nearly anything a browser can.

I doubt the stdlib will ever compete with BeautifulSoup.

----------
nosy: +steven.daprano

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32876>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to