Brendan Miller <catph...@catphive.net> writes:

> 2010/9/29 Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand>:
>> In message <mailman.1132.1285714474.29448.python-l...@python.org>, Brendan
>> Miller wrote:
>>
>>> It seems that characters not in the ascii subset of UTF-8 are
>>> discarded by c_char_p during the conversion ...
>>
>> Not a chance.
>>
>>> ... or at least they don't print out when I go to print the string.
>>
>> So it seems there’s a problem on the printing side. What happens when you
>> construct a UTF-8-encoded string directly in Python and try printing it the
>> same way?
>
> Doing this seems to confirm something is broken in ctypes w.r.t. UTF-8...
>
> if I enter:
> str = "日本語のテスト"

What is this? Which encoding does your editor use to produce this
byte string?

If you want to be sure you have the right encoding, you need to do this:

 - put a "coding: utf-8" declaration (or actually whatever encoding your
   editor uses) in the first or second line of the source file
 - use unicode literals. Those are the funny little strings with a "u" in
   front of them. They will be *decoded* using the declared encoding.
 - when passing this to C, explicitly *encode* with utf-8 first.

Diez
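
A minimal sketch of those three steps, assuming the source file really is
saved as UTF-8. It wraps the bytes in ctypes.c_char_p directly instead of
calling a real C function, and the variable names (text, encoded, c_str,
round_tripped) are only illustrative:

```python
# -*- coding: utf-8 -*-
# The declaration above must match the encoding your editor actually saves in.
import ctypes

# Unicode literal: the interpreter decodes it using the declared encoding.
text = u"日本語のテスト"

# Explicitly encode to UTF-8 bytes before handing the data to C.
encoded = text.encode("utf-8")

# c_char_p holds a pointer to the NUL-terminated bytes; nothing is dropped.
c_str = ctypes.c_char_p(encoded)

# Reading the value back yields the same bytes; decode to get text again.
round_tripped = c_str.value.decode("utf-8")
```

If round_tripped compares equal to text, the bytes went through c_char_p
intact, and any missing characters would have to come from the printing
side (for example a terminal that does not expect UTF-8).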