2010/9/29 Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand>:
> In message <mailman.1132.1285714474.29448.python-l...@python.org>, Brendan
> Miller wrote:
>
>> It seems that characters not in the ascii subset of UTF-8 are
>> discarded by c_char_p during the conversion ...
>
> Not a chance.
>
>> ... or at least they don't print out when I go to print the string.
>
> So it seems there’s a problem on the printing side. What happens when you
> construct a UTF-8-encoded string directly in Python and try printing it the
> same way?

Doing this seems to confirm something is broken in ctypes w.r.t. UTF-8...

If I enter:
str = "日本語のテスト"

Then:
print str
日本語のテスト

However, when I create a string buffer, pass it into my C++ code, and
write the same UTF-8 string into it, Python seems to discard pretty
much all the text. The same code works for pure ASCII strings.

Python code:
from ctypes import POINTER, c_char, c_long, c_void_p, create_string_buffer

_std_string_size = _lib_mbxclient.std_string_size
_std_string_size.restype = c_long
_std_string_size.argtypes = [c_void_p]

_std_string_copy = _lib_mbxclient.std_string_copy
_std_string_copy.restype = None
_std_string_copy.argtypes = [c_void_p, POINTER(c_char)]

# This function works for ASCII, but breaks on strings with UTF-8!
def std_string_to_string(str_ptr):
    buf = create_string_buffer(_std_string_size(str_ptr))
    _std_string_copy(str_ptr, buf)
    return buf.raw

C++ code:

#include <algorithm>  // std::copy
#include <string>

using std::string;

extern "C"
long std_string_size(string* str)
{
        return str->size();
}

extern "C"
void std_string_copy(string* str, char* buf)
{
        std::copy(str->begin(), str->end(), buf);
}
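
One way to narrow this down might be to take the shared library out of the picture and check whether a ctypes string buffer keeps the UTF-8 bytes on its own. The sketch below is only that: roundtrip_through_buffer is a hypothetical helper, and ctypes.memmove stands in for std_string_copy (an assumption, so the check runs without _lib_mbxclient). If both checks print True, the bytes survive the buffer, and the loss is more likely on the C++ side or in how the result is printed.

```python
# -*- coding: utf-8 -*-
import ctypes


def roundtrip_through_buffer(data):
    """Copy raw bytes into a ctypes string buffer and read them back.

    Hypothetical helper: ctypes.memmove stands in for the C++
    std_string_copy call so this runs without the shared library.
    """
    buf = ctypes.create_string_buffer(len(data))
    ctypes.memmove(buf, data, len(data))
    return buf.raw


text = u"日本語のテスト"
encoded = text.encode("utf-8")   # 7 characters, 3 bytes each in UTF-8
copied = roundtrip_through_buffer(encoded)

# Compare byte-for-byte, then decode explicitly before comparing text.
print(copied == encoded)
print(copied.decode("utf-8") == text)
```

Comparing the raw bytes first, and decoding explicitly afterwards, keeps the question of what the buffer holds separate from the question of how the terminal renders it.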
-- 
http://mail.python.org/mailman/listinfo/python-list
