Bugs item #1459279, was opened at 2006-03-27 14:51
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1459279&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Francesco Ricciardi (nerby)
Assigned to: Nobody/Anonymous (nobody)
Summary: sgmllib.SGMLParser and hexadecimal numeric character refs

Initial Comment:
According to the HTML 4.0 specification, numeric character references may be hexadecimal as well as decimal (see http://www.w3.org/TR/REC-html40/charset.html#h-5.3.1). However, sgmllib.SGMLParser does not recognize the hexadecimal form. More and more HTML pages now use character references with high code points that are not in the official HTML entity list, so proper handling of these references should be implemented.

A possible solution could be:
- improving the "charref" regular expression so that it also matches hexadecimal values;
- considering all numeric references valid: those with n < 255 should be converted to the corresponding characters, those above 255 should be left as numeric charrefs.
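
The two changes proposed in the initial comment could be sketched roughly as follows. This is a standalone illustration, not a patch to sgmllib: the names CHARREF and convert_charrefs are hypothetical, the pattern requires the terminating semicolon, and 255 itself is treated as convertible because the report does not say which side of the boundary it falls on.

```python
import re

# Hypothetical pattern (not the one in sgmllib): matches decimal references
# such as &#233; and hexadecimal ones such as &#xE9; or &#XE9;.
# Group 1 holds hex digits, group 2 holds decimal digits.
CHARREF = re.compile(r'&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));')


def convert_charrefs(text, limit=255):
    """Replace numeric character references whose value is <= limit.

    References above the limit are left in the text unchanged, as the
    report suggests. Treating the limit itself as convertible is an
    assumption made for this sketch.
    """
    def replace(match):
        hex_digits, dec_digits = match.groups()
        if hex_digits:
            value = int(hex_digits, 16)
        else:
            value = int(dec_digits)
        if value <= limit:
            return chr(value)
        # Too large for a single 8-bit character: keep the reference as-is.
        return match.group(0)

    return CHARREF.sub(replace, text)


if __name__ == "__main__":
    sample = "caf&#xE9; costs 3 &#x20AC; (&#8364;)"
    print(repr(convert_charrefs(sample)))
```

In this sketch both spellings of the same low code point (&#xE9; and &#233;) yield the same character, while references such as &#x20AC; pass through untouched so that a later stage can decide how to handle them.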