Re: encoding problem

Bruno Desthuilliers Fri, 19 Dec 2008 04:25:46 -0800

digisat...@gmail.com wrote:

The code snippet below raises UnicodeDecodeError.
#!/usr/bin/env python
#--*-- coding: utf-8 --*--
s = 'äöü'
u = unicode(s)



It seems that the system uses the default encoding (ASCII) to decode the
UTF-8 encoded string literal, and thus raises the error.


Indeed. You want:

u = unicode(s, 'utf-8') # or : u = s.decode('utf-8')
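
A minimal sketch of the difference, assuming the source file really is saved as UTF-8. The variable names are illustrative only, and the bytes are written as escapes so the example does not depend on how the file is stored. It uses only .decode(), which behaves like unicode(s, codec) on Python 2 and also runs on Python 3:

```python
# UTF-8 bytes for the three characters a-umlaut, o-umlaut, u-umlaut,
# i.e. what the byte string literal holds when the file is saved as UTF-8.
raw = b'\xc3\xa4\xc3\xb6\xc3\xbc'

# Explicit codec: the six bytes decode to three characters.
text = raw.decode('utf-8')
print("%d bytes -> %d characters" % (len(raw), len(text)))

# Implicit conversion in Python 2 falls back to the default codec, which
# is 'ascii' unless the installation changed it. Bytes above 0x7F are
# not valid ASCII, so the decode fails.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```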

The question is why the Python interpreter uses the default encoding
instead of "utf-8", which I explicitly declared in the source.

Because there's no reliable way for the interpreter to guess how what's passed to unicode has been encoded?


s = s.decode("utf-8").encode("latin1")  # s now holds latin1 bytes, not utf-8
# should unicode try to use utf-8 here ?
try:
    u = unicode(s)
except UnicodeDecodeError:
    print "would have worked better with u = unicode(s, 'latin1')"
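
To make the point concrete: a Python 2 byte string carries no record of its encoding, and the coding declaration only tells the interpreter how to read the source file. The sketch below (illustrative names, runs on Python 2 and 3) shows the two ways a wrong guess can go. Either the decode fails outright, or it silently produces the wrong text:

```python
original = u'\xe4\xf6\xfc'  # a-umlaut, o-umlaut, u-umlaut

utf8_bytes = original.encode('utf-8')      # 6 bytes
latin1_bytes = original.encode('latin-1')  # 3 bytes

# Wrong guess, case 1: latin-1 bytes decoded as utf-8 raise an error,
# because 0xE4 announces a multi-byte sequence that never follows.
try:
    latin1_bytes.decode('utf-8')
    utf8_guess_ok = True
except UnicodeDecodeError:
    utf8_guess_ok = False

# Wrong guess, case 2: latin-1 maps every byte to a character, so utf-8
# bytes decoded as latin-1 never fail. They just come out as six wrong
# characters instead of three right ones.
mojibake = utf8_bytes.decode('latin-1')

# Only the matching codec round-trips.
round_trip_ok = (utf8_bytes.decode('utf-8') == original
                 and latin1_bytes.decode('latin-1') == original)
```

The second case is the nastier one, since nothing signals that the guess was wrong.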

NB: IIRC, the ASCII subset is safe whatever the encoding, so I'd say it's a sensible default...
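
A quick check of that claim, as a sketch with an arbitrary choice of codecs. It holds for ASCII-compatible encodings such as these. It does not hold for something like UTF-16, where the same bytes mean something else:

```python
ascii_bytes = b'plain ASCII text'

# Decode the same bytes with several ASCII-compatible codecs.
codecs_to_try = ('ascii', 'utf-8', 'latin-1', 'cp1252')
decoded = [ascii_bytes.decode(name) for name in codecs_to_try]

# Every codec yields the same text, so pure-ASCII data is unambiguous.
all_equal = all(d == decoded[0] for d in decoded)
```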

--
http://mail.python.org/mailman/listinfo/python-list
