New submission from yanne <[EMAIL PROTECTED]>:

It seems that HTMLParser.feed throws an exception whenever an attribute name contains both an ampersand ('&') and non-ASCII characters.
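For illustration only, here is a minimal sketch of the kind of input involved. It is not the attached test.py, whose contents are not shown in this message. The markup, the AttrCollector class, and the UTF-8 encoding are all assumptions. The sketch puts the characters in an attribute value, because the reported traceback fails inside unescape(attrvalue). It is written for Python 3, where the module is html.parser and feed takes text, so it shows the decode-before-feed approach rather than reproducing the 2.6 failure. On Python 2.6 the error message suggests a byte string being implicitly decoded as ASCII, and passing a unicode object to feed is a commonly suggested way to avoid that, though it has not been verified against this report.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records the attributes of every start tag."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; entity references in
        # the values have already been unescaped by the parser.
        self.attrs.extend(attrs)


# Hypothetical input (an assumption, not the reporter's data): UTF-8
# bytes with a non-ASCII character and an entity reference in one
# attribute value. 0xc3 0xa4 is the UTF-8 encoding of U+00E4.
raw = b'<a title="\xc3\xa4 &amp; b">link</a>'

parser = AttrCollector()
parser.feed(raw.decode("utf-8"))  # decode to text first, then parse
parser.close()

print(parser.attrs)
```

Decoding once at the boundary keeps the parser working on text throughout, so the unescaped entity and the non-ASCII character end up in the same string type.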
Running the attached test file with Python 2.5 succeeds, but with Python 2.6, the result is:

    C:\Python26>python.exe test.py
    Without & in attribute OK
    With & in attribute
    Traceback (most recent call last):
      File "test.py", line 18, in <module>
        HP().feed(s)
      File "C:\Python26\lib\HTMLParser.py", line 108, in feed
        self.goahead(0)
      File "C:\Python26\lib\HTMLParser.py", line 148, in goahead
        k = self.parse_starttag(i)
      File "C:\Python26\lib\HTMLParser.py", line 249, in parse_starttag
        attrvalue = self.unescape(attrvalue)
      File "C:\Python26\lib\HTMLParser.py", line 386, in unescape
        return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
      File "C:\Python26\lib\re.py", line 150, in sub
        return _compile(pattern, 0).sub(repl, string, count)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

I am running:

    Python 2.6rc2 (r26rc2:66507, Sep 18 2008, 14:27:33) [MSC v.1500 32 bit (Intel)] on win32

----------
components: Library (Lib)
files: test.py
messages: 73571
nosy: yanne
severity: normal
status: open
title: HTMLParser cannot handle '&' and non-ascii characters in attribute names
versions: Python 2.6
Added file: http://bugs.python.org/file11557/test.py

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3932>
_______________________________________