digisat...@gmail.com wrote:
The code snippet below raises UnicodeDecodeError.
#!/usr/bin/env python
#--*-- coding: utf-8 --*--
s = 'äöü'
u = unicode(s)


It seems that the system uses the default encoding (ASCII) to decode the
UTF-8 encoded string literal, and thus raises the error.

Indeed. You want:

u = unicode(s, 'utf-8') # or : u = s.decode('utf-8')
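
To make the difference concrete, here is a minimal sketch written so that it runs on both Python 2 and Python 3 (it uses a u'' literal and .encode()/.decode() rather than unicode()). The names raw, ascii_ok and decoded are only illustrative. On Python 2, unicode(raw) without an encoding argument should behave like raw.decode('ascii') as long as the default encoding has not been changed.

```python
# -*- coding: utf-8 -*-
# Sketch only: variable names are illustrative, not from the original post.

# The bytes a UTF-8 source file stores for the literal 'äöü' (2 bytes per letter).
raw = u'äöü'.encode('utf-8')

# Decoding with ASCII fails, because every byte here is >= 0x80.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Naming the actual encoding works.
decoded = raw.decode('utf-8')
```
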

The question is why the Python interpreter uses the default encoding
instead of "utf-8", which I explicitly declared in the source.

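Because the coding declaration only tells the parser how to read the source file. At runtime, s is a plain byte string that carries no record of its encoding, so unicode() has no reliable way to guess how what it is passed has been encoded, and it falls back on the default encoding (ASCII). Consider: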

s = s.decode("utf-8").encode("latin1")
# should unicode() try to use utf-8 here?
try:
    u = unicode(s)
except UnicodeDecodeError:
    print "would have worked better with u = unicode(s, 'latin1')"
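
The same point can be sketched from the other direction: a given run of bytes may decode without error under more than one encoding, or fail under the one you guessed, so the bytes alone do not tell you which codec is right. This is a rough illustration that runs on Python 2 and 3; text, utf8_bytes, latin1_bytes, mojibake and utf8_ok are made-up names.

```python
# -*- coding: utf-8 -*-
# Sketch only: variable names are illustrative, not from the original post.

text = u'äöü'
utf8_bytes = text.encode('utf-8')       # 6 bytes
latin1_bytes = text.encode('latin-1')   # 3 bytes

# Latin-1 maps every byte to some character, so decoding the UTF-8 bytes
# as Latin-1 raises nothing. It silently produces the wrong text instead.
mojibake = utf8_bytes.decode('latin-1')

# The Latin-1 bytes, on the other hand, are not valid UTF-8 at all.
try:
    latin1_bytes.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

Guessing UTF-8 would fail loudly on the Latin-1 bytes, and guessing Latin-1 would quietly corrupt the UTF-8 bytes, which is why passing the encoding explicitly is the safer habit.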


NB: IIRC, the ASCII subset decodes the same way in most commonly used encodings (UTF-8, Latin-1 and other ASCII supersets), so I'd say ASCII is a sensible default.
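
A quick way to check that claim, as a sketch rather than a proof: pure-ASCII bytes decode to the same text under several ASCII-compatible codecs, while an encoding such as UTF-16 stores the same text as different bytes. The list of encodings below is just a sample I picked, and the variable names are illustrative.

```python
# Sketch only: the sample of encodings and all names are illustrative.

ascii_bytes = b'hello'
encodings = ['ascii', 'utf-8', 'latin-1', 'cp1252']

# Each of these codecs is a superset of ASCII, so the result is identical.
results = [ascii_bytes.decode(enc) for enc in encodings]
all_same = all(r == u'hello' for r in results)

# UTF-16 is not ASCII-compatible: the same text is stored as different bytes.
utf16_bytes = u'hello'.encode('utf-16')
utf16_differs = utf16_bytes != ascii_bytes
```
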
--
http://mail.python.org/mailman/listinfo/python-list
