[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-10-02 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

r85172 changes PyUnicode_AsWideCharString() (don't count the trailing nul 
character in the output size) and add unit tests.

r85173 patches unicode_aswidechar() to supports non-BMP characters for all 
known wchar_t/Py_UNICODE size combinaisons (2/2, 2/4 and 4/2).

I noticed that PyUnicode_AsWideChar() and PyUnicode_AsWideCharString() accept 
embeded nul characters. I don't know if it is a bug or an expected behaviour. 
Anyway, there is now a test for this case.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-10-02 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

r85174+r85177: ctypes.c_wchar supports non-BMP characters with 32 bits wchar_t 
= fix this issue

(I commited also an unwanted change on _testcapi to fix r85172 in r85174: 
r85175 reverts this change, and r85176 fixes the _testcapi bug again)

--
resolution:  - fixed
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-10-02 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

 r85173 patches unicode_aswidechar() to supports non-BMP characters 
 for all known wchar_t/Py_UNICODE size combinaisons (2/2, 2/4 and 4/2).

Oh, and 4/4 ;-)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-10-02 Thread Daniel Stutzbach

Daniel Stutzbach dan...@stutzbachenterprises.com added the comment:

Thanks for working on this!

Since this was a bugfix, it should be merged back into 2.7, yes?

--
stage: unit test needed - committed/rejected

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-10-02 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

 Since this was a bugfix, it should be merged back into 2.7, yes?

Mmmh, the fix requires to change PyUnicode_AsWideChar() function (support 
non-BMP characters and surrogate pairs) (and maybe also to create 
PyUnicode_AsWideCharString()). I don't really want to change such important 
function in a stable branch (Python2).

Is it really important to support non-BMP characters for ctypes.c_wchar in 
Python2? I would like to say: if you want better unicode support, use Python 3. 
And Python 3.2 if it's possible :-)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-10-02 Thread Daniel Stutzbach

Daniel Stutzbach dan...@stutzbachenterprises.com added the comment:

Since I noticed the bug through source code inspection and no one has reported 
it occurring in practice, that sounds reasonable to me.

--
versions:  -Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-10-01 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Update the patch for the new PyUnicode_AsWideCharString() function:
 - use Py_UNICODE_SIZE and SIZEOF_WCHAR_T in the preprocessor tests
 - faster loop: don't use a counter + pointer, but only use pointers (for the 
stop condition)

The patch is not finished: I have to implement #elif Py_UNICODE_SIZE == 4  
SIZEOF_WCHAR_T == 2 case.

--
Added file: http://bugs.python.org/file19082/aswidechar_nonbmp-2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-10-01 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


Removed file: 
http://bugs.python.org/file17322/pyunicode_aswidechar_surrogates-py3k.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-10-01 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Patch version 3:
 - fix unicode_aswidechar if Py_UNICODE_SIZE == SIZEOF_WCHAR_T and w == NULL 
(return the number of characters, don't write into w!)
 - improve unicode_aswidechar() comment

--
Added file: http://bugs.python.org/file19083/aswidechar_nonbmp-3.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-10-01 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

I don't know how to test if Py_UNICODE_SIZE == 4  SIZEOF_WCHAR_T == 2. On 
Windows, sizeof(wchar_t) is 2, but it looks like Python is not prepared to have 
Py_UNICODE != wchar_t for is Windows implementation.

wchar_t is 32 bits long on Linux and Mac OS X. So how can I test it? Or should 
we just drop support of Py_UNICODE_SIZE == 4  SIZEOF_WCHAR_T == 2?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-10-01 Thread Daniel Stutzbach

Daniel Stutzbach dan...@stutzbachenterprises.com added the comment:

I, too, can't think of any platforms where Py_UNICODE_SIZE == 4  
SIZEOF_WCHAR_T == 2 and I'm not sure what the previous policy has been.  Have 
you noticed any other code that would set a precedent?

If no one else chimes in, perhaps ask on IRC or python-dev?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-10-01 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

STINNER Victor wrote:
 
 STINNER Victor victor.stin...@haypocalc.com added the comment:
 
 I don't know how to test if Py_UNICODE_SIZE == 4  SIZEOF_WCHAR_T == 2. On 
 Windows, sizeof(wchar_t) is 2, but it looks like Python is not prepared to 
 have Py_UNICODE != wchar_t for is Windows implementation.

 wchar_t is 32 bits long on Linux and Mac OS X. So how can I test it? Or 
 should we just drop support of Py_UNICODE_SIZE == 4  SIZEOF_WCHAR_T == 2?

You can tweak the Windows pyconfig.h to use UCS4, AFAIK, if you want to
test drive this case.

But it's probably easier to configure with gcc -fshort-wchar on
Linux :-)

Dropping support for this is not a good idea.

--
title: c_types.c_wchar should not assume that sizeof(wchar_t) == 
sizeof(Py_UNICODE) - c_types.c_wchar should not assume that sizeof(wchar_t) == 
   sizeof(Py_UNICODE)

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-10-01 Thread Daniel Stutzbach

Daniel Stutzbach dan...@stutzbachenterprises.com added the comment:

 You can tweak the Windows pyconfig.h to use UCS4, AFAIK, if you want to
 test drive this case.

I seem to recall seeing some other code that assumed Windows implied UCS2.  
Proceed with caution. ;-)

 But it's probably easier to configure with gcc -fshort-wchar on
 Linux :-)

libc will still be using sizeof(wchar_t) == 4, though.  Won't that cause Bad 
Things to happen when calling libc wide-character functions?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-10-01 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

Daniel Stutzbach wrote:
 
 Daniel Stutzbach dan...@stutzbachenterprises.com added the comment:
 
 You can tweak the Windows pyconfig.h to use UCS4, AFAIK, if you want to
 test drive this case.
 
 I seem to recall seeing some other code that assumed Windows implied UCS2.  
 Proceed with caution. ;-)

Probably, yes. I've never tried it myself.

 But it's probably easier to configure with gcc -fshort-wchar on
 Linux :-)
 
 libc will still be using sizeof(wchar_t) == 4, though.  Won't that cause Bad 
 Things to happen when calling libc wide-character functions?

Sure, but this is just about testing an interface, not running
applications :-)

Here's what the GCC man-page has to say:

   -fshort-wchar
   Override the underlying type for wchar_t to be short unsigned int 
instead
   of the default for the target.  This option is useful for building
   programs to run under WINE.

   Warning: the -fshort-wchar switch causes GCC to generate code that 
is not
   binary compatible with code generated without that switch.  Use it to
   conform to a non-default application binary interface.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-10-01 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Patch version 4:
 - implement unicode_aswidechar() for 16 bits wchar_t and 32 bits Py_UNICODE
 - PyUnicode_AsWideWcharString() returns the number of wide characters 
excluding the nul character as does PyUnicode_AsWideChar()

For 16 bits wchar_t and 32 bits Py_UNICODE, I extracted the as wide char 
unicode functions into a small C file and compiled it with -fshort-wchar on 
Linux.

--
Added file: http://bugs.python.org/file19099/aswidechar_nonbmp-4.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-10-01 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


Removed file: http://bugs.python.org/file19082/aswidechar_nonbmp-2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-10-01 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


Removed file: http://bugs.python.org/file19083/aswidechar_nonbmp-3.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-10-01 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Ooops, I lost my patch to fix the initial (ctypes) issue. Here is an updated 
patch: ctypes_nonbmp.patch (which needs aswidechar_nonbmp-4.patch).

--
Added file: http://bugs.python.org/file19101/ctypes_nonbmp.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-09-28 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

#9979 proposes to create a new PyUnicode_AsWideCharString() function.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-09-28 Thread Daniel Stutzbach

Daniel Stutzbach dan...@stutzbachenterprises.com added the comment:

I know enough about Unicode to have reported this bug, but I don't feel 
knowledgeable enough about Python's Unicode implementation to comment on your 
suggested solution.

I'm adding the other people listed in Misc/maintainers.rst as interested in 
Unicode to the nosy list on this issue and the one you just linked to.

--
nosy: +ezio.melotti, lemburg

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-05-13 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Support of characters outside the Unicode BMP (code  0x) is not complete 
in narrow build (sizeof(Py_UNICODE) == 2) for Python2:

$ ./python
Python 2.7b2+ (trunk:81139M, May 13 2010, 18:45:37) 
 x=u'\U0001'
 x[0], x[1]
(u'\ud800', u'\udc00')
 len(x)
2
 ord(x)
Traceback (most recent call last):
  ...
TypeError: ord() expected a character, but string of length 2 found
 unichr(0x1)
Traceback (most recent call last):
  ...
ValueError: unichr() arg not in range(0x1) (narrow Python build)

It looks better in Python3:

$ ./python 
Python 3.2a0 (py3k:81137:81138, May 13 2010, 18:50:51) 
 x='\U0001'
 x[0], x[1]
('\ud800', '\udc00')
 len(x)
2
 ord(x)
65536
 chr(0x1)
'\U0001'

About the issue, the problem is in function u_set(). This function should use 
PyUnicode_AsWideChar() but PyUnicode_AsWideChar() doesn't support surrogates... 
whereas PyUnicode_FromWideChar() does support surrogates.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-05-13 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Patch for Python3:
 - Fix PyUnicode_AsWideChar() to support surrogates (Py_UNICODE: 2 bytes, 
wchar_t: 4 bytes)
 - u_set() of _ctypes uses PyUnicode_AsWideChar()
 - add a test (skipped if sizeof(wchar_t) is smaller than 4 bytes)

It's too late to fix Python2: 2.7 beta 2 was released (it doesn't support non 
BMP characters for chr()/ord()).

TODO:
 - I'm not sure that my patch on PyUnicode_AsWideChar() works if 
sizeof(wchar_t)==2
 - Test the patch on Windows
 - Check that it doesn't break other functions calling PyUnicode_AsWideChar()

ctypes, _locale.strcoll() and time.strftime() use PyUnicode_AsWideChar(). time 
has interesting comments:

/* This assumes that PyUnicode_AsWideChar doesn't do any UTF-16
   expansion. */
if (PyUnicode_AsWideChar((PyUnicodeObject*)format,
 (wchar_t*)PyBytes_AS_STRING(tmpfmt),
 PyUnicode_GetSize(format)+1) == (size_t)-1)
/* This shouldn't fail. */
Py_FatalError(PyUnicode_AsWideChar failed);

--
keywords: +patch
Added file: 
http://bugs.python.org/file17322/pyunicode_aswidechar_surrogates-py3k.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-05-12 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


--
components: +Unicode

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-05-12 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


--
nosy: +haypo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8670] c_types.c_wchar should not assume that sizeof(wchar_t) == sizeof(Py_UNICODE)

2010-05-09 Thread Daniel Stutzbach

New submission from Daniel Stutzbach dan...@stutzbachenterprises.com:

Using a UCS2 Python on a platform with a 32-bit wchar_t, the following code 
throws an exception (but should not):

 ctypes.c_wchar('\u1')
Traceback (most recent call last):
  File stdin, line 1, in module
TypeError: one character unicode string expected

The trouble is in the u_set() function in Modules/_ctypes/cfield.c.  The 
corresponding u_get() function looks correct.

On a UCS4 Python running on a system with a 16-bit wchar_t, u_set() will 
corrupt the data by silently truncating the character to 16-bits.

For reference, Linux and Mac OS use a 32-bit wchar_t while Windows uses a 
16-bit wchar_t.

--
assignee: theller
components: ctypes
messages: 105374
nosy: stutzbach, theller
priority: normal
severity: normal
stage: unit test needed
status: open
title: c_types.c_wchar should not assume that sizeof(wchar_t) == 
sizeof(Py_UNICODE)
type: behavior
versions: Python 2.7, Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com