2010/9/29 Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand>:
> In message <mailman.1132.1285714474.29448.python-l...@python.org>, Brendan
> Miller wrote:
>
>> It seems that characters not in the ascii subset of UTF-8 are
>> discarded by c_char_p during the conversion ...
>
> Not a chance.
>
>> ... or at least they don't print out when I go to print the string.
>
> So it seems there’s a problem on the printing side. What happens when you
> construct a UTF-8-encoded string directly in Python and try printing it the
> same way?
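The check suggested in the quoted reply can be sketched as follows. This is a hypothetical Python 3 version (the thread itself uses Python 2's print statement). It builds the string directly in Python, compares its character count with its UTF-8 byte count, and confirms that encoding and decoding lose nothing, all before any ctypes call is involved.

```python
# Python 3 sketch (the thread uses Python 2): build the test string directly
# in Python and inspect it before ctypes is involved at all.
text = "日本語のテスト"              # 7 characters
encoded = text.encode("utf-8")       # the UTF-8 bytes a C++ std::string would hold

print(len(text), len(encoded))       # 7 21 (each of these characters is 3 bytes)
print(encoded)                       # ASCII-safe view of the raw bytes

# Decoding recovers the original text, so the encoding step itself is lossless.
assert encoded.decode("utf-8") == text
```

If this prints correctly but the ctypes path does not, the difference lies somewhere between the C++ copy and the final print.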
Doing this seems to confirm that something is broken in ctypes with respect to UTF-8. If I enter:

    str = "日本語のテスト"

and then:

    print str

I get:

    日本語のテスト

However, when I create a string buffer, pass it into my C++ code, and write the same UTF-8 string into it, Python seems to discard nearly all of the text. The same code works for pure ASCII strings.

Python code:

    from ctypes import c_long, c_void_p, c_char, POINTER, create_string_buffer

    # _lib_mbxclient: the ctypes handle to the C++ library (loading not shown)
    _std_string_size = _lib_mbxclient.std_string_size
    _std_string_size.restype = c_long
    _std_string_size.argtypes = [c_void_p]

    _std_string_copy = _lib_mbxclient.std_string_copy
    _std_string_copy.restype = None
    _std_string_copy.argtypes = [c_void_p, POINTER(c_char)]

    # This function works for ASCII, but breaks on strings with UTF-8!
    def std_string_to_string(str_ptr):
        # Allocate exactly size() bytes, let C++ fill them, return every byte.
        buf = create_string_buffer(_std_string_size(str_ptr))
        _std_string_copy(str_ptr, buf)
        return buf.raw

C++ code:

    #include <algorithm>
    #include <string>

    using std::string;

    extern "C" long std_string_size(string* str)
    {
        return str->size();
    }

    extern "C" void std_string_copy(string* str, char* buf)
    {
        std::copy(str->begin(), str->end(), buf);
    }
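To test the ctypes side in isolation, the buffer round trip can be simulated without the C++ library. This is a hypothetical Python 3 sketch: the helper name copy_like_std_string_copy is invented here, and ctypes.memmove stands in for the std::copy call. It sizes the buffer the same way as std_string_to_string (exactly as many bytes as the string, no terminator) and reads the result back with .raw.

```python
import ctypes

def copy_like_std_string_copy(data: bytes) -> bytes:
    """Hypothetical stand-in for std_string_size + std_string_copy + .raw."""
    # Same sizing as the original: exactly len(data) bytes, no NUL terminator.
    buf = ctypes.create_string_buffer(len(data))
    # memmove plays the role of std::copy(str->begin(), str->end(), buf).
    ctypes.memmove(buf, data, len(data))
    # .raw returns every byte of the buffer, not just up to the first NUL.
    return buf.raw

original = "日本語のテスト".encode("utf-8")
copied = copy_like_std_string_copy(original)

assert copied == original                        # non-ASCII bytes survive too
print(copied.decode("utf-8") == "日本語のテスト")  # True
```

In this simulation every byte comes back intact. UTF-8 never produces a NUL byte for non-ASCII characters, so even .value would not truncate this string; comparing the bytes returned by std_string_to_string against text.encode("utf-8") may help narrow down where the text actually goes missing.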