Diez B. Roggisch wrote:
> Hellmut Weber wrote:
>
>> Hi,
>> i'm new here in this list.
>>
>> i'm developing a little program using an xml document. So far it's easy
>> going, but when parsing an xml document which contains the EURO symbol
>> ('€') then I get an error:
>>
>> UnicodeEncodeError: 'charmap' codec can't encode character u'\xa4' in
>> position 11834: character maps to <undefined>
>>
>> the relevant piece of code is:
>>
>> from xml.dom.minidom import Document, parse, parseString
>> ...
>> doc = parse(inFIleName)
>>
>
> The contents of the file must be encoded with the proper encoding which is
> given in the XML-header, or has to be utf-8 if no header is given.
>
> From the above I think you have a latin1-based document. Does the encoding
> header match?

If the file is declared as latin-1 and contains a euro symbol, then the file is actually invalid, since the euro sign is not defined in ISO-8859-1. If there is no encoding declaration, then, as Diez already said, the file should be encoded as UTF-8.
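To illustrate why the declared encoding matters, here is a small sketch. It assumes Python 3 (the thread predates it), and the `<price>` document and the `parse_price` helper are hypothetical. It shows that byte 0xA4 decodes to U+00A4 (the generic currency sign) under ISO-8859-1 but to U+20AC (the euro sign) under ISO-8859-15, that latin-1 has no byte for the euro sign at all, and that minidom decodes the same bytes differently depending only on the XML declaration. The u'\xa4' in the reported error is consistent with euro bytes being read as latin-1, though that is a guess.

```python
from xml.dom.minidom import parseString

EURO_BYTE = b"\xa4"  # 0xA4: the euro sign in ISO-8859-15

# The same byte means different characters in the two charsets.
as_latin1 = EURO_BYTE.decode("iso-8859-1")   # U+00A4 CURRENCY SIGN
as_latin9 = EURO_BYTE.decode("iso-8859-15")  # U+20AC EURO SIGN

# Latin-1 has no byte for the euro sign, so encoding it fails.
try:
    "\u20ac".encode("iso-8859-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False


def parse_price(declared_encoding):
    """Parse a hypothetical one-element document: the body bytes stay
    the same, only the encoding declaration in the header changes."""
    header = '<?xml version="1.0" encoding="%s"?>' % declared_encoding
    data = header.encode("ascii") + b"<price>" + EURO_BYTE + b"10</price>"
    doc = parseString(data)
    return doc.documentElement.firstChild.data


# ascii() escapes non-ASCII characters, so printing cannot itself
# raise a UnicodeEncodeError on a console with a limited charmap.
print(ascii(as_latin1), ascii(as_latin9), latin1_has_euro)
print(ascii(parse_price("iso-8859-1")), ascii(parse_price("iso-8859-15")))
```

Only the second parse yields the euro sign. Writing the same text as UTF-8 would instead store the euro sign as the three bytes E2 82 AC.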
Try adding the encoding declaration (or replacing the existing one) with ISO-8859-15 (also known as Latin-9, sometimes loosely called "latin-15"), which is the same as latin-1 with a few changes, including the euro symbol:

<?xml version="1.0" encoding="iso-8859-15" ?>

If your file has a lot of unusual diacritics, you might take a look at the small differences between latin-1 and ISO-8859-15 to make sure that your file won't be broken:

http://en.wikipedia.org/wiki/ISO_8859-15

Cheers,
RB