Thomas Armstrong wrote:

> I'm trying to parse a UTF-8 document with special characters like
> acute-accent vowels:
> --------
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> ...
> -------
>
> But I get this error message:
> -------
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in
> position 122: ordinal not in range(128)
> -------
> It works, but I don't want to substitute each special character, because there
> are always forgotten ones which can crack the program.

if you really want to use latin-1 in the database, and you don't mind
losing unsupported characters, you can use

    text_extrated = text_extrated.encode('iso-8859-1', 'replace')

(which substitutes "?" for each unsupported character) or

    text_extrated = text_extrated.encode('iso-8859-1', 'ignore')

(which silently drops them).

a better approach is of course to convert your database to use UTF-8 and use

    text_extrated = text_extrated.encode('utf-8')

it's also a good idea to switch to parameter substitution in your SQL queries,
passing the values as a single tuple:

    cursor.execute("update ... set text = %s where id = %s", (text_extrated, id))

it's possible that your database layer can automatically encode unicode
strings if you pass them in as parameters; see the database API documentation
for details.

</F>
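for reference, here is a minimal sketch of the encoding options and the parameter substitution discussed above. it is written for Python 3 (where encode() returns bytes), not the Python 2 used in the thread. it uses the standard library's sqlite3 module as a stand-in for the real database, so the placeholder is "?" rather than "%s". the names sample, docs and body are made up for illustration; they are not from the original code.

```python
import sqlite3

# Hypothetical sample text: e-acute and n-tilde exist in Latin-1,
# but the en dash (U+2013) from the error message does not.
sample = "caf\u00e9 \u2013 ni\u00f1o"

# Strict encoding (the default) raises on the en dash.
try:
    sample.encode("iso-8859-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# 'replace' substitutes "?" for each unsupported character.
replaced = sample.encode("iso-8859-1", "replace")

# 'ignore' silently drops each unsupported character.
ignored = sample.encode("iso-8859-1", "ignore")

# UTF-8 can represent every character, so nothing is lost.
utf8_bytes = sample.encode("utf-8")

# Parameter substitution: values are passed separately from the SQL text,
# as one tuple. sqlite3 uses the "?" placeholder style; other drivers
# use "%s" or named styles.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table docs (id integer primary key, body text)")
cur.execute("insert into docs (id, body) values (?, ?)", (1, ""))
cur.execute("update docs set body = ? where id = ?", (sample, 1))
stored = cur.execute("select body from docs where id = ?", (1,)).fetchone()[0]
conn.close()
```

with sqlite3 the string is passed as-is and the driver handles the encoding, which is the "database layer encodes for you" case; whether your own driver does the same is something to check in its documentation.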