Re: utf-8 and ctypes

Diez B. Roggisch Thu, 30 Sep 2010 02:09:28 -0700

Brendan Miller <[email protected]> writes:

> 2010/9/29 Lawrence D'Oliveiro <[email protected]_zealand>:
>> In message <[email protected]>, Brendan
>> Miller wrote:
>>
>>> It seems that characters not in the ascii subset of UTF-8 are
>>> discarded by c_char_p during the conversion ...
>>
>> Not a chance.
>>
>>> ... or at least they don't print out when I go to print the string.
>>
>> So it seems there’s a problem on the printing side. What happens when you
>> construct a UTF-8-encoded string directly in Python and try printing it the
>> same way?
>
> Doing this seems to confirm something is broken in ctypes w.r.t. UTF-8...
>
> if I enter:
> str = "日本語のテスト"


What is this? Which encoding is used by your editor to produce this
byte-string?

If you want to be sure you have the right encoding, you need to do this:

 - put a coding: utf-8 comment (or actually whatever your editor uses) in
   the first or second line of the source file
 - use unicode literals. Those are the funny little strings with a "u" in
   front of them. They will be *decoded* using the declared encoding.
 - when passing this to C, explicitly *encode* with utf-8 first.
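
Put together, the three steps might look roughly like the sketch below.
It assumes the file really is saved as UTF-8, and it only wraps the bytes
in a c_char_p instead of calling a real C function, so the variable names
are made up for illustration. The point is that the encoded bytes go
through c_char_p unchanged, and nothing outside ASCII gets discarded:

```python
# -*- coding: utf-8 -*-
# The coding line above only counts if it is the first or second line of the file.
import ctypes

# Unicode literal: decoded using the declared source encoding.
text = u"日本語のテスト"

# Explicitly encode to UTF-8 before handing anything to C.
encoded = text.encode("utf-8")

# c_char_p holds a pointer to the NUL-terminated bytes; this is what a
# C function declared as taking char* would be given.
c_str = ctypes.c_char_p(encoded)

# Reading the value back yields the same bytes, which decode to the same text.
round_tripped = c_str.value.decode("utf-8")
```

If the bytes that come back from C look fine but the printed output does
not, the problem is more likely the terminal's encoding than ctypes.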

Diez
-- 
http://mail.python.org/mailman/listinfo/python-list
