Re: unicode wrap unicode object?
ygao [EMAIL PROTECTED] wrote:

> import sys
> sys.setdefaultencoding("utf-8")

hmm. what kind of bootleg python is that?

>>> import sys
>>> sys.setdefaultencoding("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute 'setdefaultencoding'

(you're not supposed to change the default encoding. don't do that; it'll only cause problems in the long run.)

> >>> s='\xe9\xab\x98' # this is a utf-8 string
> >>> ss=U'\xe9\xab\x98'
> >>> s
> '\xe9\xab\x98'
> >>> ss
> u'\xe9\xab\x98'
>
> how do I get ss from s? Is there a way to do this?

you have UTF-8 *bytes* in a Unicode text string? sounds like someone's made a mistake earlier on...

anyway, iso-8859-1 is, in practice, a null transform that simply converts unicode characters to bytes:

>>> s = ss.encode("iso-8859-1")
>>> s
'\xe9\xab\x98'
>>> s.decode("utf-8")
u'\u9ad8'
>>> import unicodedata
>>> unicodedata.name(s.decode("utf-8"))
'CJK UNIFIED IDEOGRAPH-9AD8'

but it's probably better to fix the code that puts UTF-8 data in your Unicode strings (look for bogus iso-8859-1 conversions).

/F

--
http://mail.python.org/mailman/listinfo/python-list
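The recovery step in this message is written for Python 2. As a minimal sketch, assuming Python 3 (where `str` holds text and `bytes` holds raw data), the same round trip might look like the following. The names `mojibake`, `raw`, and `text` are illustrative and do not come from the thread.

```python
import unicodedata

# A text string that wrongly holds UTF-8 *bytes* as three separate code points
mojibake = "\xe9\xab\x98"

# ISO-8859-1 maps code points 0-255 one-to-one onto byte values,
# so encoding with it recovers the original bytes unchanged.
raw = mojibake.encode("iso-8859-1")

# Decode those bytes with the encoding they were really written in.
text = raw.decode("utf-8")

print(raw)                       # b'\xe9\xab\x98'
print(len(mojibake), len(text))  # 3 1
print(unicodedata.name(text))    # CJK UNIFIED IDEOGRAPH-9AD8
```

This only repairs the symptom; as the message says, the better fix is to find the earlier conversion that put undecoded bytes into a text string.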
Re: unicode wrap unicode object?
Sorry for my poor English. I got a solution from others. I must use UTF-8 for Chinese.

    import sys
    reload(sys)
    sys.setdefaultencoding("utf-8")

    s='\xe9\xab\x98' # this is a utf-8 string
    ss=U'\xe9\xab\x98'
    ss1=ss.encode('unicode_escape').decode('string_escape')
    s1=s.decode('unicode_escape')

    >>> s1==ss
    True
    >>> ss1==s
    True
Re: unicode wrap unicode object?
ygao wrote:

> I must use utf-8 for chinese.

yeah, but you shouldn't store it in a *Unicode* string. Unicode strings are designed to hold things that you've already decoded (that is, your chinese text), not the raw UTF-8 bytes.

if you store the UTF-8 in an ordinary 8-bit string instead, you can use the unicode constructor to convert things properly:

    b = ... some utf-8 data ...

    # turn it into a unicode string
    u = unicode(b, "utf-8")

    # ... do something with it ...

    # turn it back into a utf-8 string
    s = u.encode("utf-8")

    # or use some other encoding
    s = u.encode("big5")

e.g.

>>> b = '\xe9\xab\x98'
>>> u = unicode(b, "utf-8")
>>> u.encode("utf-8")
'\xe9\xab\x98'
>>> u.encode("big5")
'\xb0\xaa'

/F
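The decode-then-encode pattern described here carries over to Python 3, where the `unicode` constructor is gone and `bytes.decode` plays its role. A minimal sketch under that assumption, reusing the same sample bytes (the names `as_utf8` and `as_big5` are illustrative):

```python
# Raw UTF-8 data, kept in a bytes object rather than a text string
b = b"\xe9\xab\x98"

# Decode once, at the point where the data enters the program
u = b.decode("utf-8")

# ... work with u as ordinary text ...

# Encode again only when the text leaves the program
as_utf8 = u.encode("utf-8")
as_big5 = u.encode("big5")

print(as_utf8)  # b'\xe9\xab\x98'
print(as_big5)  # b'\xb0\xaa'
```

Keeping raw data in `bytes` and text in `str` means the encoding is named explicitly at each boundary, so no default encoding has to be changed.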
Re: unicode wrap unicode object?
Thanks for your advice.
Re: unicode wrap unicode object?
ygao wrote:

> I must use utf-8 for chinese.

Sure. But please don't do that:

> import sys
> reload(sys)
> sys.setdefaultencoding("utf-8")

As Fredrik says, you should really avoid changing the default encoding.

> s='\xe9\xab\x98' # this is a utf-8 string
> ss=U'\xe9\xab\x98'
> ss1=ss.encode('unicode_escape').decode('string_escape')
> s1=s.decode('unicode_escape')
> >>> s1==ss
> True
> >>> ss1==s
> True

Ok. But how about this:

py> s='\xe9\xab\x98'
py> ss=u'\u9ad8'
py> s1=s.decode('utf-8')
py> s1==ss
True

Here, ss is a single character, which uses 3 bytes in UTF-8. In your example, ss has three characters, which are not Chinese, but European.

Regards,
Martin
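The difference between the two values of ss can be inspected directly. The following is a rough sketch, assuming Python 3; the names `decoded` and `mistaken` are illustrative and not from the thread.

```python
import unicodedata

decoded = b"\xe9\xab\x98".decode("utf-8")  # one character, as in Martin's example
mistaken = "\xe9\xab\x98"                  # three characters, as in the original ss

for label, value in (("decoded", decoded), ("mistaken", mistaken)):
    # unicodedata.name() raises ValueError for code points without a name
    # (for example control characters), so a fallback string is supplied.
    names = [unicodedata.name(ch, "<no name>") for ch in value]
    print(label, len(value), names)
```

The first string has length 1 and names a CJK ideograph; the second has length 3, and its first character is LATIN SMALL LETTER E WITH ACUTE, which is the "European" text Martin refers to.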