Cheryl Sabella <cheryl.sabe...@gmail.com> added the comment:
Thank you for the report. Looking at the BeautifulSoup source, there is a comment about this scenario:

# Unlike other parsers, html.parser doesn't send separate end tag
# events for empty-element tags. (It's handled in
# handle_startendtag, but only if the original markup looked like
# <tag/>.)
#
# So we need to call handle_endtag() ourselves. Since we
# know the start event is identical to the end event, we
# don't want handle_endtag() to cross off any previous end
# events for tags of this name.

HTMLParser itself produces output such as:

>>> from html.parser import HTMLParser
>>> class MyParser(HTMLParser):
...     def handle_starttag(self, tag, attrs):
...         print(f'start: {tag}')
...     def handle_endtag(self, tag):
...         print(f'end: {tag}')
...     def handle_data(self, data):
...         print(f'data: {data}')
...
>>> parser = MyParser()
>>> parser.feed('<p><test></p>')
start: p
start: test
end: p

My suggestion would be to try a different parser in BeautifulSoup [1] to handle this. Even if we wanted to modify HTMLParser, any such change would probably be backwards incompatible.

[1] https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser

----------
nosy: +cheryl.sabella
resolution:  -> third party
stage:  -> resolved
status: open -> closed

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37071>
_______________________________________
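
The behaviour described in the BeautifulSoup comment can be checked with the standard library alone. The following is a minimal sketch, not code from BeautifulSoup or from the original report: the names EventRecorder and record_events are hypothetical helpers introduced only for illustration. It compares the events html.parser emits for an unclosed <test> tag with those for the self-closing form <test/>, which goes through handle_startendtag.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: records start/end tag events in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def record_events(markup):
    """Feed markup to a fresh parser and return the recorded events."""
    parser = EventRecorder()
    parser.feed(markup)
    parser.close()
    return parser.events


# Unclosed tag: only a start event is reported for 'test'.
unclosed = record_events("<p><test></p>")

# Self-closing markup: the default handle_startendtag() calls
# handle_starttag() and then handle_endtag() for 'test'.
self_closed = record_events("<p><test/></p>")

print(unclosed)
print(self_closed)
```

If the markup is under your control, writing the tag as <test/> is one way to get a matching end event from html.parser; otherwise the suggestion above of switching to a different parser still applies.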