While trying to parse HTML files using ElementTree under Python 3.0a1, with htmlentitydefs.py supplying the "character entities" for the parser, I found that I needed a customized version of htmlentitydefs.py to make things work properly.
The change needed was to replace (at the bottom of the file)

====
for (name, codepoint) in name2codepoint.items():
    codepoint2name[codepoint] = name
    if codepoint <= 0xff:
        entitydefs[name] = chr(codepoint)
    else:
        entitydefs[name] = '&#%d;' % codepoint
====

by

----
for (name, codepoint) in name2codepoint.items():
    codepoint2name[codepoint] = name
    entitydefs[name] = chr(codepoint)
----

It does work for me ... but I don't know enough about Unicode to be sure that it is a proper bug, and not a quirk due to the way I wrote my app. So, I thought it would be appropriate to post it here so that Unicode experts could determine if it was indeed a bug - and file a bug report/write a patch.

The same code is present in Python 3.0a2 - but I have not tested it under this new version.

André
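For anyone wanting to reproduce this, here is a minimal sketch of the kind of setup described above. It is not the actual application code. It assumes a current Python 3, where htmlentitydefs is available as html.entities, and it assumes the entities are handed to the parser through the `entity` dictionary of `xml.etree.ElementTree.XMLParser`. The helpers `build_entitydefs` and `parse_text` and the `SAMPLE` document are hypothetical names made up for this illustration. The sample carries a DOCTYPE because, as far as I can tell, expat only passes undefined entities on to ElementTree when the document declares one.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint  # htmlentitydefs in the post


def build_entitydefs(latin1_only):
    """Rebuild the entitydefs table (hypothetical helper).

    latin1_only=True mimics the original loop: code points above 0xff
    become '&#NNN;' strings. latin1_only=False mimics the changed loop:
    every entity maps to its character.
    """
    entitydefs = {}
    for name, codepoint in name2codepoint.items():
        if latin1_only and codepoint > 0xff:
            entitydefs[name] = '&#%d;' % codepoint
        else:
            entitydefs[name] = chr(codepoint)
    return entitydefs


def parse_text(source, entitydefs):
    """Parse source with the given entity table; return the root's text."""
    parser = ET.XMLParser()
    parser.entity.update(entitydefs)  # register the named entities
    parser.feed(source)
    return parser.close().text


# Assumed sample input: a DOCTYPE plus one entity above 0xff.
SAMPLE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html>&alpha;</html>'
)

original_text = parse_text(SAMPLE, build_entitydefs(latin1_only=True))
patched_text = parse_text(SAMPLE, build_entitydefs(latin1_only=False))

print(repr(original_text))
print(repr(patched_text))
```

With the original table the parser should hand back the replacement string as literal text (the characters `&#945;`), since the value in the `entity` dictionary is inserted as character data and not parsed again. With the changed table the element text should be the single Greek letter alpha.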