Unicode equality from raw_input

2008-10-11 Thread Damian Johnson
Hi, when getting text via the raw_input function it's always a string (even
if it contains non-ASCII characters). The problem is that whenever I try to
check equality against a Unicode string it fails. I've tried using the
unicode function to 'cast' the string to the Unicode type but this throws an
exception:

>>> a = raw_input("text: ")
text: おはよう
>>> b = u"おはよう"
>>> a == b
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both
arguments to Unicode - interpreting them as being unequal
False
>>> type(a)
<type 'str'>
>>> type(b)
<type 'unicode'>
>>> unicode(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0:
ordinal not in range(128)
>>> str(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3:
ordinal not in range(128)


After a couple hours of hair pulling I think it's about time to admit
defeat. Any help would be appreciated! -Damian
--
http://mail.python.org/mailman/listinfo/python-list


Re: Unicode equality from raw_input

2008-10-11 Thread Chris Rebert
In order to convert a byte sequence to Unicode, Python needs to know
the encoding being used. When you don't specify an encoding, it tries
ASCII, which raises an error if your byte sequence isn't ASCII, as in
your case.

Figure out what encoding your terminal/system is set to, then use the
.decode() method to change the bytes to a unicode object. E.g.:

bytestring = raw_input("text: ")
as_unicode = bytestring.decode('utf8')  # assuming the encoding is UTF-8
print as_unicode == u"おはよう"  # prints True
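
For readers who want to try this without a terminal, here is a minimal
sketch of the same idea. It assumes the terminal sent UTF-8 and hard-codes
the bytes that raw_input would have returned for the text in the question;
the variable names are illustrative only. It is written so that it should
run unchanged on Python 2.7 and Python 3.3+.

```python
# The UTF-8 bytes a terminal would send for the text in the question,
# written as escapes so the snippet does not depend on the file encoding.
raw = b"\xe3\x81\x8a\xe3\x81\xaf\xe3\x82\x88\xe3\x81\x86"
expected = u"\u304a\u306f\u3088\u3046"  # the same four characters as text

# Decoding as ASCII fails, as in the traceback from the original session.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the matching encoding gives a value that compares equal.
decoded = raw.decode("utf-8")
print(ascii_ok)
print(decoded == expected)
```

The first byte, 0xe3, is the one named in the UnicodeDecodeError above.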

Cheers,
Chris
-- 
Follow the path of the Iguana...
http://rebertia.com




Re: Unicode equality from raw_input

2008-10-11 Thread Karen Tracey


Python needs to know the encoding of the bytestring in order to convert it
to Unicode. If you don't specify an encoding, ASCII is assumed, which fails
for any bytestring that actually contains non-ASCII data. Since you are
reading the string from standard input, try using the encoding associated
with stdin:

>>> a = raw_input("text: ")
text: おはよう
>>> b = u"おはよう"
>>> import sys
>>> unicode(a, sys.stdin.encoding) == b
True
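
One caveat with this approach: sys.stdin.encoding can be None when stdin is
not a terminal (for example, when input is piped in). A rough sketch of a
helper that copes with that is below. The name to_text and the UTF-8
fallback are assumptions made for illustration, not anything from the
standard library, and the right fallback depends on your environment.

```python
import sys

def to_text(data, encoding=None, fallback="utf-8"):
    """Decode a byte string to text; `fallback` is used if `encoding` is unset.

    `to_text` is a hypothetical helper and the UTF-8 default is an assumption.
    """
    if not isinstance(data, bytes):
        return data  # already text, nothing to decode
    return data.decode(encoding or fallback)

# May be None when stdin is not attached to a terminal.
stdin_encoding = getattr(sys.stdin, "encoding", None)

# Interactive use in Python 2 would look like:
#     a = to_text(raw_input("text: "), stdin_encoding)
# Here the bytes are hard-coded so the sketch runs without a terminal.
raw = b"\xe3\x81\x8a\xe3\x81\xaf\xe3\x82\x88\xe3\x81\x86"
expected = u"\u304a\u306f\u3088\u3046"

print(to_text(raw, "utf-8") == expected)  # explicit encoding
print(to_text(raw, None) == expected)     # falls back to UTF-8
```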

Karen