Steven D'Aprano <steve+pyt...@pearwood.info> added the comment: The stdlib HTML parser requires correct HTML.
To parse broken HTML, as you find in the real world, you need a third-party library like BeautifulSoup. BeautifulSoup is much more complex (about 7-8 times as many LOC) but can handle nearly anything a browser can. I doubt the stdlib will ever compete with BeautifulSoup. ---------- nosy: +steven.daprano _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue32876> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com