STINNER Victor <victor.stin...@haypocalc.com> added the comment:

Support for characters outside the Unicode BMP (code points > 0xffff) is
incomplete in narrow builds (sizeof(Py_UNICODE) == 2) of Python 2:

$ ./python
Python 2.7b2+ (trunk:81139M, May 13 2010, 18:45:37) 
>>> x=u'\U00010000'
>>> x[0], x[1]
(u'\ud800', u'\udc00')
>>> len(x)
2
>>> ord(x)
Traceback (most recent call last):
  ...
TypeError: ord() expected a character, but string of length 2 found
>>> unichr(0x10000)
Traceback (most recent call last):
  ...
ValueError: unichr() arg not in range(0x10000) (narrow Python build)

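Python 3 does better: ord() and chr() handle non-BMP characters, although the
string is still stored as a surrogate pair (len(x) is 2):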

$ ./python 
Python 3.2a0 (py3k:81137:81138, May 13 2010, 18:50:51) 
>>> x='\U00010000'
>>> x[0], x[1]
('\ud800', '\udc00')
>>> len(x)
2
>>> ord(x)
65536
>>> chr(0x10000)
'\U00010000'
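
For reference, the pair (0xd800, 0xdc00) shown in both sessions follows from
the standard UTF-16 surrogate arithmetic. Here is a minimal sketch of it; the
function name to_surrogate_pair() is made up for illustration and is not part
of CPython:

```python
def to_surrogate_pair(code_point):
    """Split a non-BMP code point into (high, low) UTF-16 surrogates.

    Hypothetical helper for illustration only, not a CPython API.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x10000)
print(hex(high), hex(low))  # 0xd800 0xdc00
```

This is why a narrow build reports len(x) == 2 for u'\U00010000': the string
holds two 16-bit code units, not one character.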

As for this issue, the problem is in the u_set() function. It should use
PyUnicode_AsWideChar(), but PyUnicode_AsWideChar() doesn't support surrogates,
whereas PyUnicode_FromWideChar() does.
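
To make "support surrogates" concrete, here is a rough sketch of what a
surrogate-aware conversion from 16-bit code units to 32-bit wide characters
would need to do. It is written in Python for readability and is not the C
code of PyUnicode_AsWideChar(); the name combine_surrogates() is hypothetical,
and passing a lone surrogate through unchanged is an assumption made here:

```python
def combine_surrogates(units):
    """Turn a sequence of UTF-16 code units into a list of code points.

    Hypothetical helper for illustration only. A high surrogate followed by
    a low surrogate is merged into one code point; anything else, including
    a lone surrogate, is copied through unchanged (an assumption).
    """
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_pair = (
            0xD800 <= unit <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            low = units[i + 1]
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            code_points.append(unit)
            i += 1
    return code_points


print([hex(cp) for cp in combine_surrogates([0xD800, 0xDC00])])  # ['0x10000']
```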

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8670>
_______________________________________