Re: SAXParseException: not well-formed (invalid token)

Carsten Haese Thu, 30 Aug 2007 06:48:45 -0700

On Thu, 2007-08-30 at 15:20 +0200, Pablo Rey wrote:
>       Hi Stefan,
> 
>       The xml has specified an encoding (<?xml version="1.0" encoding="UTF-8"?>).


It's possible that the encoding declaration is incorrect. The same character is encoded differently in latin-1 and in UTF-8:

>>> u = u"\N{LATIN SMALL LETTER E WITH ACUTE}"
>>> print repr(u.encode("latin-1"))
'\xe9'
>>> print repr(u.encode("utf-8"))
'\xc3\xa9'

If your input contains the byte 0xe9 where the accented e should be,
the file is actually latin-1 encoded, despite its declaration. If it
contains the byte sequence 0xc3, 0xa9, it is UTF-8 encoded.
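
A rough way to automate that check, as a Python 3 sketch (the session above is Python 2). The helper name `guess_encoding` is hypothetical. It tries a strict UTF-8 decode and falls back to latin-1. Since latin-1 accepts any byte sequence, a "latin-1" answer only means "not valid UTF-8", not proof that the file is latin-1:

```python
def guess_encoding(data: bytes) -> str:
    """Return "utf-8" if data decodes cleanly as UTF-8, else "latin-1".

    Hypothetical helper: latin-1 can decode any byte string, so it is
    only a fallback guess, not a verified encoding.
    """
    try:
        data.decode("utf-8")  # strict by default: raises on invalid sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"


print(guess_encoding(b"caf\xc3\xa9"))  # -> utf-8 (two-byte UTF-8 e-acute)
print(guess_encoding(b"caf\xe9"))      # -> latin-1 (lone 0xe9 is not UTF-8)
```

Pure ASCII input also reports "utf-8", which is harmless because ASCII is a subset of UTF-8.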

If the string is encoded in latin-1, you can transcode it to utf-8 like
this:

contents = contents.decode("latin-1").encode("utf-8")
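
In Python 3 the same transcoding works on a `bytes` object. The sketch below assumes the document is held as bytes in `contents`; the sample XML and the helper `parse_ok` are made up for illustration. It shows the stray latin-1 byte making expat reject the document and the transcoded bytes parsing cleanly:

```python
import xml.sax


def parse_ok(data: bytes) -> bool:
    """Hypothetical helper: True if xml.sax parses data without error."""
    try:
        # A bare ContentHandler ignores all events; we only check for errors.
        xml.sax.parseString(data, xml.sax.ContentHandler())
        return True
    except xml.sax.SAXParseException as exc:
        print("parse error:", exc)
        return False


# Declares UTF-8 but actually contains the latin-1 byte 0xe9 for e-acute.
contents = b'<?xml version="1.0" encoding="UTF-8"?><word>caf\xe9</word>'
print(parse_ok(contents))  # reports the parse error, then prints False

# Decode with the real encoding, then re-encode as UTF-8 to match the declaration.
contents = contents.decode("latin-1").encode("utf-8")
print(parse_ok(contents))  # True
```

Alternatively, if the file really is latin-1, changing the declaration to encoding="ISO-8859-1" leaves the bytes alone and still lets the parser decode them correctly.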

HTH,

-- 
Carsten Haese
http://informixdb.sourceforge.net


-- 
http://mail.python.org/mailman/listinfo/python-list