Gerhard Fiedler wrote:
> Well, ASCII can represent the Unicode numerically -- if that is what the OP
> wants.

No. The ASCII character range is 0..127, while Unicode code points range from 0 to 0x10FFFF (0..65535 is only the Basic Multilingual Plane).

> For example, "U+81EC" (all ASCII) is one possible -- not very
> readable though <g> -- representation of a Hanzi character (see
> http://www.cojak.org/index.php?function=code_lookup&term=81EC).

U+81EC means the Unicode character whose code point is the number 0x81EC. Several encodings are defined that map Unicode strings to byte sequences: UTF-8 maps Unicode strings to sequences of bytes in the range 0..255, and UTF-7 maps them to sequences of bytes in the range 0..127. You *could* read the latter as ASCII, but that would not be correct.

How to do it in Python? Let chinesePhrase be a Unicode string with Chinese content. Then

    chinesePhrase_7bit = chinesePhrase.encode('utf-7')

will produce a sequence of bytes in the range 0..127 that represents chinesePhrase and *looks like* a (meaningless) ASCII sequence.

    chinesePhrase_16bit = chinesePhrase.encode('utf-16be')

will produce a byte string with the Unicode numbers packed two bytes per character in big-endian order (for characters up to U+FFFF; characters above that are encoded as surrogate pairs). This is probably closest to what the OP wants.

Peter Maas, Aachen
-- 
http://mail.python.org/mailman/listinfo/python-list
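A minimal sketch of the two encode() calls from the post, written for Python 3 (where str is already a Unicode string and encode() returns bytes). The sample phrase and the names chinese_phrase, phrase_7bit, phrase_16bit and notation are hypothetical, chosen only for illustration; the last step rebuilds the "U+XXXX" notation from the quoted message out of the code points.

```python
# Python 3 sketch: str is Unicode, .encode() returns bytes.
# Hypothetical sample phrase: U+4E2D, U+6587, and U+81EC from the quoted example.
chinese_phrase = "\u4e2d\u6587\u81ec"

# UTF-7: every output byte is in 0..127, so the result *looks like* ASCII,
# but it is UTF-7 (base64-style) data, not readable ASCII text.
phrase_7bit = chinese_phrase.encode("utf-7")

# UTF-16BE: each character up to U+FFFF becomes two bytes, high byte first
# (characters above U+FFFF would be encoded as surrogate pairs).
phrase_16bit = chinese_phrase.encode("utf-16be")

# The "U+XXXX" notation, built from each character's code point.
notation = " ".join("U+%04X" % ord(ch) for ch in chinese_phrase)

print(phrase_7bit)
print(phrase_16bit.hex())
print(notation)
```

With this sample, the last two lines print 4e2d658781ec and U+4E2D U+6587 U+81EC. The UTF-7 bytes are pure 7-bit, but only a UTF-7 decoder gives them meaning.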