New submission from Hanno Boeck <ha...@hboeck.de>:

I noticed that HTMLParser raises an exception on some inputs. I'm not sure what the expectations are here, but given that real-world HTML often contains all kinds of broken content, I would expect an HTMLParser to always try to parse a document and not be interrupted by an exception when an error occurs.

Here's a minified example:

#!/usr/bin/env python3

import html.parser

html.parser.HTMLParser().feed("<![\n")

However, I actually stumbled upon HTMLParser failing on a real webpage:
https://kafanews.com/

Exception of minified example:

Traceback (most recent call last):
  File "./foo.py", line 5, in <module>
    html.parser.HTMLParser().feed("<![\n")
  File "/usr/lib64/python3.6/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/lib64/python3.6/html/parser.py", line 179, in goahead
    k = self.parse_html_declaration(i)
  File "/usr/lib64/python3.6/html/parser.py", line 264, in parse_html_declaration
    return self.parse_marked_section(i)
  File "/usr/lib64/python3.6/_markupbase.py", line 149, in parse_marked_section
    sectName, j = self._scan_name( i+3, i )
  File "/usr/lib64/python3.6/_markupbase.py", line 391, in _scan_name
    % rawdata[declstartpos:declstartpos+20])
  File "/usr/lib64/python3.6/_markupbase.py", line 34, in error
    "subclasses of ParserBase must override error()")
NotImplementedError: subclasses of ParserBase must override error()

----------
components: Library (Lib)
messages: 312363
nosy: hanno
priority: normal
severity: normal
status: open
title: HTMLParser raises exception on some inputs
type: behavior
versions: Python 3.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32876>
_______________________________________
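
Until the parser itself tolerates such input, a caller can guard feed() on their side. The following is only a minimal sketch of that idea, not a proposed patch: the class name TolerantHTMLParser and its failures attribute are made up for illustration, and it assumes that dropping the unprocessed buffer via reset() is acceptable for the caller. The exception type is caught broadly on purpose, since the report shows NotImplementedError on Python 3.6 and other versions may raise something else or nothing at all.

```python
import html.parser


class TolerantHTMLParser(html.parser.HTMLParser):
    """Hypothetical wrapper that records parse failures instead of propagating them."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Descriptions of the exceptions swallowed so far (illustrative attribute).
        self.failures = []

    def feed(self, data):
        try:
            super().feed(data)
        except Exception as exc:
            # Deliberately broad: the report shows NotImplementedError on
            # Python 3.6; the type raised by other versions may differ.
            self.failures.append(repr(exc))
            # reset() discards the unprocessed buffer, so the next feed()
            # does not retry the same broken input.
            self.reset()


if __name__ == "__main__":
    parser = TolerantHTMLParser()
    parser.feed("<![\n")  # the minified input from this report
    # Whether a failure is recorded here depends on the Python version.
    print(parser.failures)
```

Note that this also hides exceptions raised by a subclass's own handle_* methods, and that reset() loses any partially parsed data, so it is a stopgap for scraping broken pages rather than a substitute for fixing the parser.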