First of all, if you are running this at the console, find out which
encoding your console uses. Mine is an English Windows XP box, and it uses 'cp437'.

C:\>chcp
Active code page: 437

Then

>>> s = "José"
>>> u = u"Jos\u00e9"         # same thing as a unicode escape
>>> s.decode('cp437') == u   # use the encoding that matches your console
True
>>>
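
If you would rather not hard-code the code page, something along these lines might do. It is only a sketch: the helper name to_unicode and the 'cp437' fallback are my own assumptions, not anything standard, and the bytes are spelled out explicitly so the result does not depend on what your terminal sends.

```python
import sys

def to_unicode(raw, encoding=None):
    """Decode a byte string to unicode (hypothetical helper, not a library call)."""
    if encoding is None:
        # sys.stdin.encoding can be None when input is redirected.
        # Falling back to cp437 is an assumption for an English Windows console.
        encoding = getattr(sys.stdin, "encoding", None) or "cp437"
    return raw.decode(encoding)

raw = b"Jos\x82"                      # the name as cp437 bytes; 0x82 is e-acute there
decoded = to_unicode(raw, "cp437")
assert decoded == u"Jos\u00e9"
```

Passing the encoding explicitly is the safer habit whenever you actually know where the bytes came from; the console lookup only helps for text typed at the prompt.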

wy




> This is probably stupid and/or misguided but supposing I'm passed a  
> byte-string value that I want to be unicode, this is what I do. I'm sure  
> I'm missing something very important.
>
> Short version:
>
> >>> s = "José"   # Start with a non-unicode string
> >>> unicoded = eval("u'%s'" % "José")
>
> Long version:
>
> >>> s = "José"   # Start with a non-unicode string
> >>> s            # Let's look at it
> 'Jos\xe9'
> >>> escaped = s.encode('string_escape')
> >>> escaped
> 'Jos\\xe9'
> >>> unicoded = eval("u'%s'" % escaped)
> >>> unicoded
> u'Jos\xe9'
>
> >>> test = u"José"     # What they should have passed me
> >>> test == unicoded   # Am I really getting the same thing?
> True                   # Yay!
>
>
>
>
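
For what it is worth, the escape-and-eval round trip in the long version turns each byte N into code point N, which is what decoding as Latin-1 does. The sketch below tries to show why that matched in the quoted session (where s was 'Jos\xe9') but would not match for bytes typed at a cp437 console. The variable names are mine, chosen only for illustration.

```python
latin1_bytes = b"Jos\xe9"   # what the quoted session shows for s
cp437_bytes = b"Jos\x82"    # the same name as cp437 bytes

# Byte N -> code point N is exactly a Latin-1 decode.
from_latin1 = latin1_bytes.decode("latin-1")
assert from_latin1 == u"Jos\u00e9"

# Applying the same mapping to cp437 input gives U+0082, not e-acute.
wrong = cp437_bytes.decode("latin-1")
assert wrong == u"Jos\u0082"
assert wrong != u"Jos\u00e9"

# Decoding with the encoding the bytes were written in gives the right answer.
right = cp437_bytes.decode("cp437")
assert right == u"Jos\u00e9"
```

Calling decode also avoids running eval on a string that came from somewhere else, which matters if the value can ever contain a quote character.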

-- 
http://mail.python.org/mailman/listinfo/python-list
