[issue23144] html.parser.HTMLParser: setting 'convert_charrefs = True' leads to dropped text

Ross Thu, 01 Jan 2015 10:47:44 -0800

New submission from Ross:

If convert_charrefs is set to True, the final data section is not returned by 
feed(). It is held until the next tag is encountered.


---
from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self, convert_charrefs=True)
        self.fed = []
    def handle_starttag(self, tag, attrs):
        print("Encountered a start tag:", tag)
    def handle_endtag(self, tag):
        print("Encountered an end tag :", tag)
    def handle_data(self, data):
        print("Encountered some data  :", data)

parser = MyHTMLParser()

parser.feed("foo <a>link</a> bar")
print("")
parser.feed("spam <a>link</a> eggs")

---

gives

Encountered some data  : foo 
Encountered a start tag: a
Encountered some data  : link
Encountered an end tag : a

Encountered some data  :  barspam 
Encountered a start tag: a
Encountered some data  : link
Encountered an end tag : a


With 'convert_charrefs = False' it works as expected.
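
A possible workaround, offered only as a sketch and not as a fix for the 
behaviour above: close() is documented to force processing of all buffered 
data, so calling it once the whole document has been fed should deliver the 
trailing text. The class name CollectingParser and the chunks attribute are 
made up for this illustration.

```python
from html.parser import HTMLParser


class CollectingParser(HTMLParser):
    # Hypothetical helper for illustration: records text instead of printing it.
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


parser = CollectingParser()
parser.feed("foo <a>link</a> bar")
# close() treats the input as complete, so any text that feed() was still
# holding back is passed to handle_data() here.
parser.close()

text = "".join(parser.chunks)
print(repr(text))
```

This only helps when no further input is expected; to parse another document 
with the same instance afterwards, call reset() first.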

----------
components: Library (Lib)
messages: 233291
nosy: xkjq
priority: normal
severity: normal
status: open
title: html.parser.HTMLParser: setting 'convert_charrefs = True' leads to 
dropped text
type: behavior
versions: Python 3.4

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue23144>
_______________________________________