Re: unicode wrap unicode object?

2006-04-08 Thread Fredrik Lundh
ygao [EMAIL PROTECTED] wrote:

> >>> import sys
> >>> sys.setdefaultencoding("utf-8")

hmm.  what kind of bootleg python is that ?

>>> import sys
>>> sys.setdefaultencoding("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute 'setdefaultencoding'

(you're not supposed to change the default encoding. don't
do that; it'll only cause problems in the long run).
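For comparison, a minimal Python 3 sketch (the thread itself uses Python 2, so this assumes a modern interpreter): there the function is gone from `sys` altogether and the default is fixed at UTF-8, so the supported approach is to name the encoding explicitly wherever bytes turn into text.

```python
import sys

# Python 3 dropped sys.setdefaultencoding(); the default encoding is
# always UTF-8 and cannot be changed at run time.
print(hasattr(sys, "setdefaultencoding"))  # False
print(sys.getdefaultencoding())            # utf-8

# Rather than relying on a default, name the encoding at each conversion.
data = b"\xe9\xab\x98"          # raw UTF-8 bytes
text = data.decode("utf-8")     # one character, U+9AD8
print(len(text), text == "\u9ad8")  # 1 True
```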

> >>> s='\xe9\xab\x98' # this utf-8 string
> >>> ss=U'\xe9\xab\x98'
> >>> s
> '\xe9\xab\x98'
> >>> ss
> u'\xe9\xab\x98'
>
> how do I get ss from s?
> Is there a way to do this?

you have UTF-8 *bytes* in a Unicode text string?  sounds like
someone's made a mistake earlier on...

anyway, iso-8859-1 is, in practice, a null transform that simply
converts unicode characters U+0000 to U+00FF to bytes with the same values:

>>> s = ss.encode("iso-8859-1")
>>> s
'\xe9\xab\x98'
>>> s.decode("utf-8")
u'\u9ad8'
>>> import unicodedata
>>> unicodedata.name(s.decode("utf-8"))
'CJK UNIFIED IDEOGRAPH-9AD8'

but it's probably better to fix the code that puts UTF-8 data in your
Unicode strings in the first place (look for bogus iso-8859-1 conversions).
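The same repair in Python 3, as a minimal sketch assuming the mis-decoded text is a `str` named `ss`: encoding with Latin-1 maps each code point below 256 straight back to the byte of the same value, after which the bytes can be decoded as the UTF-8 they really are.

```python
import unicodedata

# Text that wrongly holds three UTF-8 bytes as three Latin-1 characters.
ss = "\xe9\xab\x98"

# Latin-1 is a byte-for-byte transform for code points 0-255 ...
raw = ss.encode("iso-8859-1")
print(raw)                       # b'\xe9\xab\x98'

# ... so decoding the recovered bytes as UTF-8 gives the intended text.
fixed = raw.decode("utf-8")
print(fixed == "\u9ad8")         # True
print(unicodedata.name(fixed))   # CJK UNIFIED IDEOGRAPH-9AD8

# The null-transform property holds for every code point below 256.
assert all(chr(i).encode("latin-1") == bytes([i]) for i in range(256))
```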

/F



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unicode wrap unicode object?

2006-04-08 Thread ygao
sorry for my poor English.
I got a solution from others.
I must use utf-8 for Chinese.


>>> import sys
>>> reload(sys)
>>> sys.setdefaultencoding("utf-8")
>>> s='\xe9\xab\x98' # this utf-8 string
>>> ss=U'\xe9\xab\x98'
>>> ss1=ss.encode('unicode_escape').decode('string_escape')
>>> s1=s.decode('unicode_escape')
>>> s1==ss
True
>>> ss1==s
True
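Why the `unicode_escape` round trip appears to work: for bytes that contain no backslash escapes, that codec decodes each byte as Latin-1, so it amounts to the same iso-8859-1 null transform and never yields Chinese text. A minimal Python 3 sketch (Python 3 has no `string_escape` codec, so only the decoding half is shown):

```python
s = b"\xe9\xab\x98"  # the UTF-8 encoding of U+9AD8

# unicode_escape decodes bytes that are not escape sequences as Latin-1 ...
via_escape = s.decode("unicode_escape")
via_latin1 = s.decode("latin-1")
print(via_escape == via_latin1)   # True
print(len(via_escape))            # 3 characters, none of them Chinese

# ... whereas UTF-8 decoding yields the single intended character.
print(s.decode("utf-8") == "\u9ad8")  # True
```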






Re: unicode wrap unicode object?

2006-04-08 Thread Fredrik Lundh
ygao wrote:

> I must use utf-8 for Chinese.

yeah, but you shouldn't store it in a *Unicode* string.  Unicode strings
are designed to hold things that you've already decoded (that is, your
Chinese text), not the raw UTF-8 bytes.

if you store the UTF-8 in an ordinary 8-bit string instead, you can use
the unicode constructor to convert things properly:

b = ... some utf-8 data ...

# turn it into a unicode string
u = unicode(b, "utf-8")

# ... do something with it ...

# turn it back into a utf-8 string
s = u.encode("utf-8")

# or use some other encoding
s = u.encode("big5")

e.g.

>>> b = '\xe9\xab\x98'
>>> u = unicode(b, "utf-8")
>>> u.encode("utf-8")
'\xe9\xab\x98'
>>> u.encode("big5")
'\xb0\xaa'
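In Python 3 the `unicode()` constructor no longer exists; a minimal sketch of the same pattern uses `str(b, "utf-8")` (or `b.decode("utf-8")`) on the way in and `str.encode()` on the way out, with the UTF-8 data kept in a `bytes` object:

```python
b = b"\xe9\xab\x98"           # some UTF-8 data, kept as bytes

u = str(b, "utf-8")           # turn it into text (one character, U+9AD8)
assert len(u) == 1

back = u.encode("utf-8")      # turn it back into UTF-8 bytes
big5 = u.encode("big5")       # or use some other encoding

print(back == b)              # True
print(big5)                   # b'\xb0\xaa'
```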

/F





Re: unicode wrap unicode object?

2006-04-08 Thread ygao
thanks for your advice.



Re: unicode wrap unicode object?

2006-04-08 Thread Martin v. Löwis
ygao wrote:
> I must use utf-8 for Chinese.

Sure. But please don't do that:

> >>> import sys
> >>> reload(sys)
> >>> sys.setdefaultencoding("utf-8")

As Fredrik says, you should really avoid changing the
default encoding.

> >>> s='\xe9\xab\x98' # this utf-8 string
> >>> ss=U'\xe9\xab\x98'
> >>> ss1=ss.encode('unicode_escape').decode('string_escape')
> >>> s1=s.decode('unicode_escape')
> >>> s1==ss
> True
> >>> ss1==s
> True

Ok. But how about that:

py> s='\xe9\xab\x98'
py> ss=u'\u9ad8'
py> s1=s.decode('utf-8')
py> s1==ss
True

Here, ss is a single character, which takes 3 bytes in UTF-8.
In your example, ss has three characters (U+00E9, U+00AB and U+0098),
which are not Chinese, but Latin-1.
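A minimal Python 3 sketch of the character-versus-byte distinction above, comparing the single CJK character with the three code points that ygao's literal actually contains:

```python
import unicodedata

ss = "\u9ad8"                  # one Chinese character
wrong = "\xe9\xab\x98"         # three separate Latin-1 code points

print(len(ss), len(ss.encode("utf-8")))   # 1 3
print(len(wrong))                         # 3
print([hex(ord(c)) for c in wrong])       # ['0xe9', '0xab', '0x98']
print(unicodedata.name(wrong[0]))         # LATIN SMALL LETTER E WITH ACUTE
print(unicodedata.category(wrong[2]))     # Cc (a control character)
```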

Regards,
Martin