On 8/25/2013 1:57 PM, David M. Cotter wrote:
I'm sorry this is so confusing; let me try to restate the problem as clearly 
as I can.

I have a C++ program with very well tested unicode support.  All logging is 
done in utf8.  I have conversion routines that work flawlessly, so I can assure 
you there is nothing wrong with logging and unicode support in the underlying 
program.

I am embedding python 2.7 into the program, and extending python with routines 
in my C++ program.

If you want 'well-tested' (correct) unicode support from Python, use 3.3. Unicode in 2.x is somewhat buggy and definitely flaky. The first fix was to make unicode *the* text type, in 3.0. The second was to redesign the internals in 3.3. It is possible that 2.7 is too broken for what you want to do.

I have a script, encoded in utf8, and *marked* as utf8 with this line:
     # -*- coding: utf-8 -*-

In that script, I have inline unicode text.

The example scripts that you posted pictures of do *not* have unicode text. They have bytestring literals with (encoded) non-ascii chars inside them. This is not a great idea. I am not sure what bytes you end up with. Apparently, not what you expect.

To make them 'unicode text', you must prefix the literals with 'u'. Didn't someone say this before?
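
As a minimal sketch of the difference (the sample word is an assumption, not taken from the posted scripts), compare a unicode literal with the bytes that an unprefixed literal would normally hold in a utf-8 source file under 2.7:

```python
# -*- coding: utf-8 -*-
# Illustrative only: "caf\u00e9" is a made-up sample string.

text = u"caf\u00e9"           # unicode text: 4 code points, the last is U+00E9
raw = text.encode("utf-8")    # utf-8 bytes, as a plain '...' literal would
                              # normally contain in a utf-8 encoded 2.7 script

print(len(text))              # counts characters
print(len(raw))               # counts bytes; the non-ascii char takes two
print(raw.decode("utf-8") == text)   # decoding with the right codec recovers the text
```

The point is that only `text` is 'unicode text'; `raw` is just bytes, and whoever receives it has to guess the encoding.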

When I pass that text to my C++ program, the Python interpreter decides that 
these bytes are macRoman, and handily "converts" them to unicode.  To 
compensate, I must "convert" these "macRoman" characters encoded as utf8, back 
to macRoman, then "interpret" them as utf8.  In this way I can recover the 
original unicode.
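
The round trip described in the paragraph above can be reproduced in pure Python. This is only a sketch under the assumption that utf-8 bytes are being decoded with the MacRoman codec somewhere along the way; the sample string is made up, and `mac_roman` is the standard-library codec name:

```python
# Illustrative only: reproduces the symptom, not the embedding code itself.

original = u"caf\u00e9"                      # the intended unicode text
utf8_bytes = original.encode("utf-8")        # what a utf-8 script really contains

# Misreading the utf-8 bytes as MacRoman yields mojibake:
# byte 0xC3 becomes U+221A and byte 0xA9 becomes U+00A9.
misread = utf8_bytes.decode("mac_roman")

# The compensation: encode back to MacRoman bytes, then decode as utf-8.
recovered = misread.encode("mac_roman").decode("utf-8")

print(misread != original)    # the misread text is not the original
print(recovered == original)  # the double conversion undoes the damage
```

If the compensation works as in this sketch, the bytes leaving the script are valid utf-8 and the wrong codec is being applied when they are turned into unicode.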

When I return a unicode string back to python, I must do the reverse so that 
Python gets back what it expects.

This is not related to printing or sys.stdout; it does happen with that too, 
but focusing on that is a red herring.  Let's focus on just passing a string 
into C++ then back out.

This would all actually make sense IF my script were marked as being "macRoman" 
even though I entered UTF-8 characters, but that is not the case.

Let's prove my statements.  Here is the script, *interpreted* as MacRoman:
http://karaoke.kjams.com/screenshots/bugs/python_unicode/script_as_macroman.png

Why are you posting pictures of code, instead of the (runnable) code itself, as you did with C code?

--
Terry Jan Reedy

