> Without an additional parser, I was getting the following error
> message: [...]
> xml.parsers.expat.ExpatError: undefined entity é: line 401, column 11

To understand that problem better, it would have been helpful to see
what line 401, column 11 of the input file actually says. AFAICT, it
must have been something like "&é;", which would be really puzzling to
have in an XML file (usually, people restrict themselves to ASCII for
entity names).

> for entity in ent:
>     if entity not in parser.entity:
>         parser.entity[entity] = ent[entity]

This looks fine to me.

> The output was "wrong". For example, one of the test I used was to
> process a copy of the main dict of htmlentitydefs.py inside an html page. A
> few of the characters came ok, but I got things like:
>
> 'Α': 0x0391, # greek capital letter alpha, U+0391

Why do you think this is wrong?

> When using my modified version, I got the following (which may not be
> transmitted properly by email...)
> 'Α': 0x0391, # greek capital letter alpha, U+0391
>
> It does look like a Greek capital letter alpha here.

Sure, however, your first version ALSO has the Greek capital letter
alpha there; it is just spelled as Α (which *is* a valid spelling for
that letter in XML).

> I hope the above is of some help.

Thanks; I now think that htmlentitydefs is just as fine as it always
was - I don't see any problem here.

Regards,
Martin
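
To illustrate the point about spellings, here is a minimal sketch. It assumes the two outputs differed only in how U+0391 was written (literal character versus numeric character reference); the exact markup of the original test page is not known. It uses Python 3, where htmlentitydefs is named html.entities, and the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint  # Python 3 name of htmlentitydefs

# Assumption: the same letter written three ways in an XML document.
as_literal = ET.fromstring("<p>\u0391</p>").text   # the character itself
as_decimal = ET.fromstring("<p>&#913;</p>").text   # decimal character reference
as_hex = ET.fromstring("<p>&#x391;</p>").text      # hexadecimal character reference

# After parsing, all three spellings are the same one-character string.
same_character = as_literal == as_decimal == as_hex
code_point = ord(as_decimal)
matches_table = code_point == name2codepoint["Alpha"]

# A *named* entity is different: without a definition the parser rejects it,
# which is the kind of "undefined entity" error quoted at the top.
try:
    ET.fromstring("<p>&Alpha;</p>")
    named_entity_parsed = True
except ET.ParseError:
    named_entity_parsed = False
```

So a numeric reference in the output is not a wrong result; it only looks different until something parses it again.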