Bug in htmlentitydefs.py with Python 3.0?

André Wed, 26 Dec 2007 14:41:24 -0800

While trying to parse HTML files using ElementTree under Python
3.0a1, with htmlentitydefs.py supplying the "character entities" for
the parser, I found that I needed a customized version of
htmlentitydefs.py to make things work properly.


The change needed was to replace (at the bottom of the file)
====
for (name, codepoint) in name2codepoint.items():
    codepoint2name[codepoint] = name
    if codepoint <= 0xff:
        entitydefs[name] = chr(codepoint)
    else:
        entitydefs[name] = '&#%d;' % codepoint
====
with
----
for (name, codepoint) in name2codepoint.items():
    codepoint2name[codepoint] = name
    entitydefs[name] = chr(codepoint)
----
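
To make the difference between the two loops concrete, here is a small
sketch that builds both tables side by side.  It assumes a released
Python 3, where the module is importable as html.entities rather than
htmlentitydefs; the helper names build_original and build_patched are
made up for this illustration and are not part of the standard library.

```python
# Sketch only: compares the two ways of filling entitydefs.
# Assumes a released Python 3, where htmlentitydefs is html.entities.
from html.entities import name2codepoint


def build_original(table):
    """Hypothetical helper mirroring the loop quoted from the file."""
    defs = {}
    for name, codepoint in table.items():
        if codepoint <= 0xff:
            # Latin-1 range: store the character itself.
            defs[name] = chr(codepoint)
        else:
            # Above Latin-1: store a numeric character reference string.
            defs[name] = '&#%d;' % codepoint
    return defs


def build_patched(table):
    """Hypothetical helper mirroring the proposed replacement loop."""
    # Every entity maps straight to its one-character string.
    return {name: chr(codepoint) for name, codepoint in table.items()}


original = build_original(name2codepoint)
patched = build_patched(name2codepoint)

# 'eacute' (U+00E9) is treated the same way by both loops;
# 'alpha' (U+03B1) and 'euro' (U+20AC) are where they differ.
for name in ('eacute', 'alpha', 'euro'):
    print(name, repr(original[name]), repr(patched[name]))
```

With the original loop, a consumer that treats the values as final
replacement text would presumably end up with the literal string
'&#945;' for alpha instead of the character, which may be the behaviour
described above.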

It does work for me, but I don't know enough about Unicode to be
sure that it is a genuine bug and not a quirk of the way I wrote my
app.  So I thought it would be appropriate to post it here so that
Unicode experts could determine whether it is indeed a bug, and file a
bug report or write a patch.  The same code is present in Python 3.0a2,
but I have not tested it under that version.

André
-- 
http://mail.python.org/mailman/listinfo/python-list
