Re: [C++-sig] boost::python::str and Python's str and unicode types

Robert Smallshire Tue, 04 Aug 2009 01:38:47 -0700


On Tue, Jul 28, 2009 at 10:11 PM, Robert Smallshire
<robert.smallsh...@roxar.com> wrote:


> I have modified my local build of boost.python to include a
> boost::python::unicode class, together with appropriate conversions from
> wchar_t, const wchar_t* and std::wstring...

During testing we have encountered issues with the difference in size of 
wchar_t and Py_UNICODE.

Windows : sizeof(wchar_t) == sizeof(Py_UNICODE) == 2
Linux   : sizeof(wchar_t) == 4 != sizeof(Py_UNICODE) == 2

This assumes a UCS-2 build of Python, which is the default. If Python is built 
with UCS-4 support then I believe Py_UNICODE and wchar_t will become compatible 
on Linux, but I'm not sure what the implications are for compatibility between 
UCS-2 and UCS-4 builds of Python, for example for Unicode string pickles.
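
To make the mismatch concrete, here is a minimal sketch that does not use the 
Python headers. The type py_unicode16 is a hypothetical stand-in for Py_UNICODE 
on a UCS-2 build, and both function names are made up for illustration. It 
shows the check that decides whether Python's buffer could be handed out 
directly, and the element-by-element widening that is needed otherwise (roughly 
the job PyUnicode_AsWideChar does in the real API).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for Py_UNICODE on a UCS-2 build of Python
// (assumption: unsigned short is 16 bits wide on the target platform).
typedef unsigned short py_unicode16;

// True when a buffer of CodeUnit has the same element size as wchar_t,
// i.e. when a zero-copy extract<const wchar_t*> would even be conceivable.
template <typename CodeUnit>
bool same_width_as_wchar_t()
{
    return sizeof(CodeUnit) == sizeof(wchar_t);
}

// Copies 16-bit code units into a std::wstring, widening each one.
// Surrogate pairs are copied as-is and not combined; this is only a sketch.
std::wstring widen_copy(const py_unicode16* data, std::size_t length)
{
    assert(data != 0 || length == 0);
    std::wstring result;
    result.reserve(length);
    for (std::size_t i = 0; i < length; ++i)
        result.push_back(static_cast<wchar_t>(data[i]));
    return result;
}
```

On Windows I would expect same_width_as_wchar_t<py_unicode16>() to return true, 
and on a typical Linux system false, which is the situation in the table above.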

Unfortunately, extract<const wchar_t*> seems to be problematic to implement in 
a portable manner because of these size differences.  I have identified the 
following options:

1) Don't support extract<const wchar_t*> at all. There are no portability 
problems, but we have reduced functionality and break the symmetry between 
boost::python::str and boost::python::unicode behaviour.

2) Only support extract<const wchar_t*> on platforms where sizeof(wchar_t) == 
sizeof(Py_UNICODE), so that PyUnicode_AsUnicode can be used to return a pointer 
to Python's internal buffer.  This has the API usability advantage of being 
symmetrical with how extract<const char*> works in boost.python today on 
platforms that support it. However, it makes writing portable client code 
awkward. This is what my current implementation does, and it's broken on Linux.

3) Implement extract<const wchar_t*> such that it always copies the data from 
the Py_UNICODE buffer into a new wchar_t buffer using PyUnicode_AsWideChar 
under the hood.  The caller is then responsible for managing the lifetime of 
the buffer using delete [] or boost::shared_array.  This is how 
extract<std::wstring> is implemented, and that works without difficulty.  
However, this breaks the symmetry with extract<const char*> in a non-obvious 
way that would need to be prominently documented.  I suggest this approach 
would be likely to lead to quite leaky usage of the API by unwary clients, 
especially when porting code to Unicode strings.

4) #ifdef between (2) and (3) above depending on whether sizeof(wchar_t) == 
sizeof(Py_UNICODE).  Combines all the bad characteristics of the above.

There may, of course, be other options.
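
The ownership problem with option (3) can be sketched without Boost.Python. In 
the following, code_unit16 stands in for a 16-bit Py_UNICODE, and both function 
names are hypothetical. The first function behaves the way option (3) 
describes: it returns a freshly allocated buffer that the caller must 
delete []. The second wraps the result straight away; std::unique_ptr is used 
only to keep the sketch free of dependencies, and boost::shared_array would 
play the same role.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a 16-bit Py_UNICODE code unit.
typedef unsigned short code_unit16;

// Mimics the copying behaviour of option (3): allocates a new,
// null-terminated wchar_t buffer with new[]. The caller owns the result
// and must release it with delete [], which is easy to forget.
wchar_t* copy_to_new_wchar_buffer(const code_unit16* data, std::size_t length)
{
    assert(data != 0 || length == 0);
    wchar_t* buffer = new wchar_t[length + 1];
    for (std::size_t i = 0; i < length; ++i)
        buffer[i] = static_cast<wchar_t>(data[i]);
    buffer[length] = L'\0';
    return buffer;
}

// Takes ownership immediately so that delete [] runs automatically
// when the returned smart pointer goes out of scope.
std::unique_ptr<wchar_t[]> copy_to_owned_wchar_buffer(const code_unit16* data,
                                                      std::size_t length)
{
    return std::unique_ptr<wchar_t[]>(copy_to_new_wchar_buffer(data, length));
}
```

A client that calls only the first function and forgets the delete [] leaks the 
buffer on every call, which is the leaky usage I am worried about.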

If the data needs to be copied into a new buffer of wchar_t, the lifetime of 
which needs to be managed by the client, that pretty much describes the raison 
d'être of std::wstring, so my current preference is for option (1). If we did 
this, we'd still be able to construct boost::python::unicode instances from 
const wchar_t*, but would only be able to extract them as std::wstring.   I'm 
open to persuasion about the right way forward...
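
With option (1), client code that still needs a const wchar_t* can get one from 
the extracted std::wstring for as long as that string is alive. A small sketch, 
in which legacy_count_chars is a hypothetical function standing in for any 
existing API that takes const wchar_t*, and the std::wstring argument stands in 
for the result of extract<std::wstring>:

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical legacy API that expects a null-terminated wide string.
std::size_t legacy_count_chars(const wchar_t* text)
{
    assert(text != 0);
    return std::wcslen(text);
}

// The std::wstring owns the copied characters; c_str() lends out a pointer
// that stays valid until the string is modified or destroyed.
std::size_t call_legacy_api(const std::wstring& extracted)
{
    return legacy_count_chars(extracted.c_str());
}
```

The pointer never has to be freed by hand, because the std::wstring releases 
its storage when it goes out of scope.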

Thanks in advance for any comments or suggestions, and also to the people who 
have expressed interest in these patches off list.

Regards,

Rob Smallshire
Roxar Software Solutions



