Unicode equality from raw_input
Hi, when getting text via the raw_input method it's always a string (even if it contains non-ASCII characters). The problem lies in that whenever I try to check equality against a Unicode string it fails. I've tried using the unicode method to 'cast' the string to the Unicode type but this throws an exception:

>>> a = raw_input("text: ")
text: おはよう
>>> b = u"おはよう"
>>> a == b
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
>>> type(a)
<type 'str'>
>>> type(b)
<type 'unicode'>
>>> unicode(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
>>> str(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

After a couple hours of hair pulling I think it's about time to admit defeat. Any help would be appreciated! -Damian

--
http://mail.python.org/mailman/listinfo/python-list
Re: Unicode equality from raw_input
In order to convert a byte sequence to Unicode, Python needs to know the encoding being used. When you don't specify an encoding, it tries ASCII, which raises an error if your byte sequence isn't ASCII, as in your case.

Figure out what encoding your terminal/system is set to, then use the .decode() method to change the bytes to a unicode object. E.g.:

    bytestring = raw_input("text: ")
    as_unicode = bytestring.decode('utf8')  # assuming the encoding is UTF-8
    print as_unicode == u"おはよう"  # == True

Cheers,
Chris
--
Follow the path of the Iguana...
http://rebertia.com
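To make the decode step concrete, here is a minimal sketch that simulates what raw_input hands back instead of reading from a terminal. It assumes the terminal encoding is UTF-8, and the helper name `to_text` is hypothetical, not part of any library. The literal is written with escapes so the snippet should run unchanged on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch: bytes compare equal to text only after decoding with the right codec.

EXPECTED = u"\u304a\u306f\u3088\u3046"  # the text from the thread, as escapes


def to_text(raw, encoding="utf-8"):
    """Decode a byte string to text; pass text through unchanged."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


# Stand-in for what raw_input() returns on a UTF-8 terminal (an assumption).
raw = EXPECTED.encode("utf-8")

decoded = to_text(raw)
matches = decoded == EXPECTED

# Decoding the same bytes as ASCII fails, which is the error in the original post.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(matches, ascii_failed)
```

The first byte of `raw` is 0xe3, the same byte the UnicodeDecodeError in the original post complains about.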
Re: Unicode equality from raw_input
2008/10/11 Damian Johnson [EMAIL PROTECTED]

> Hi, when getting text via the raw_input method it's always a string (even
> if it contains non-ASCII characters). The problem lies in that whenever I
> try to check equality against a Unicode string it fails. I've tried using
> the unicode method to 'cast' the string to the Unicode type but this throws
> an exception:

Python needs to know the encoding of the bytestring in order to convert it to unicode. If you don't specify an encoding, ASCII is assumed, which doesn't work for any bytestrings that actually contain non-ASCII data.

Since you are reading the string from standard input, try using the encoding associated with stdin:

>>> a = raw_input("text: ")
text: おはよう
>>> b = u"おはよう"
>>> import sys
>>> unicode(a, sys.stdin.encoding) == b
True

Karen
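One caveat with the stdin approach: under Python 2, sys.stdin.encoding can be None when input is piped or redirected rather than typed at a terminal. The sketch below adds a fallback for that case. The helper names `stdin_encoding` and `decode_input` and the UTF-8 default are assumptions made for illustration, not something from the replies above.

```python
import sys


def stdin_encoding(default="utf-8"):
    """Return the encoding reported for stdin, or `default` if none is set."""
    # getattr guards against a replaced or missing stdin object.
    return getattr(sys.stdin, "encoding", None) or default


def decode_input(raw, encoding=None):
    """Decode bytes read from stdin; leave text untouched."""
    if isinstance(raw, bytes):
        return raw.decode(encoding or stdin_encoding())
    return raw


# Simulated input: the UTF-8 bytes for the text used in the thread.
expected = u"\u304a\u306f\u3088\u3046"
same = decode_input(expected.encode("utf-8"), "utf-8") == expected
print(same)
```

With real input under Python 2, the call would look like `decode_input(raw_input("text: "))`, letting the helper pick the encoding.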