On Mon, Mar 07, 2005 at 11:56:57PM +0100, Francis Girard wrote: > BTW, the python "unicode" built-in function documentation says it returns a > "unicode" string which scarcely means something. What is the python > "internal" unicode encoding ?
The language reference says farily little about unicode objects. Here's what it does say: [http://docs.python.org/ref/types.html#l2h-48] Unicode The items of a Unicode object are Unicode code units. A Unicode code unit is represented by a Unicode object of one item and can hold either a 16-bit or 32-bit value representing a Unicode ordinal (the maximum value for the ordinal is given in sys.maxunicode, and depends on how Python is configured at compile time). Surrogate pairs may be present in the Unicode object, and will be reported as two separate items. The built-in functions unichr() and ord() convert between code units and nonnegative integers representing the Unicode ordinals as defined in the Unicode Standard 3.0. Conversion from and to other encodings are possible through the Unicode method encode and the built-in function unicode(). In terms of the CPython implementation, the PyUnicodeObject is laid out as follows: typedef struct { PyObject_HEAD int length; /* Length of raw Unicode data in buffer */ Py_UNICODE *str; /* Raw Unicode buffer */ long hash; /* Hash value; -1 if not set */ PyObject *defenc; /* (Default) Encoded version as Python string, or NULL; this is used for implementing the buffer protocol */ } PyUnicodeObject; Py_UNICODE is some "C" integral type that can hold values up to sys.maxunicode (probably one of unsigned short, unsigned int, unsigned long, wchar_t). Jeff
pgpP4skAuElRz.pgp
Description: PGP signature
-- http://mail.python.org/mailman/listinfo/python-list