On 8/25/2013 1:57 PM, David M. Cotter wrote:
I'm sorry this is so confusing. Let me try to restate the problem as clearly
as I can.
I have a C++ program, with very well tested unicode support. All logging is
done in utf8. I have conversion routines that work flawlessly, so i can assure
you there is nothing wrong with logging and unicode support in the underlying
program.
I am embedding python 2.7 into the program, and extending python with routines
in my C++ program.
If you want 'well-tested' (correct) unicode support from Python, use
3.3. Unicode in 2.x is somewhat buggy and definitely flaky. The first
fix was to make unicode *the* text type, in 3.0. The second was to
redesign the internals in 3.3. It is possible that 2.7 is too broken for
what you want to do.
I have a script, encoded in utf8, and *marked* as utf8 with this line:
# -*- coding: utf-8 -*-
In that script, i have inline unicode text.
The example scripts that you posted pictures of do *not* have unicode
text. They have bytestring literals with (encoded) non-ascii chars
inside them. This is not a great idea. I am not sure what bytes you end
up with. Apparently, not what you expect.
To make them 'unicode text', you must prepend the literals with 'u'.
Didn't someone say this before?
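
A minimal sketch of the difference, written so that it also runs on Python
3.3+ (where the u prefix is accepted again). The sample word is an assumption
of mine, not taken from the posted scripts. In 2.7 with a utf-8 coding line, a
plain '...' literal holds the encoded bytes; the first value below simulates
that with an explicit encode.

```python
# -*- coding: utf-8 -*-
# Illustrative only: "caf\u00e9" is an assumed sample string.

# Roughly what a plain 'café' literal holds in a utf-8 encoded 2.7 source
# file: the encoded bytes (simulated here by encoding explicitly).
byte_literal = u"caf\u00e9".encode("utf-8")

# What u'café' holds: unicode text, one item per code point.
text_literal = u"caf\u00e9"

print(len(byte_literal))   # 5, because the accented letter takes two bytes
print(len(text_literal))   # 4 code points

# The bytes only become text again once they are decoded with the right codec.
print(byte_literal.decode("utf-8") == text_literal)   # True
```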
When I pass that text to my C++ program, the Python interpreter decides that
these bytes are macRoman, and handily "converts" them to unicode. To
compensate, I must "convert" these "macRoman" characters encoded as utf8, back
to macRoman, then "interpret" them as utf8. In this way I can recover the
original unicode.
When I return a unicode string back to Python, I must do the reverse so that
Python gets back what it expects.
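
The round trip described above can be sketched in Python itself, assuming the
misinterpretation really is "utf-8 bytes decoded as macRoman" as stated. The
function names and the sample string are hypothetical, chosen only for
illustration; mac_roman is the name of the standard library codec.

```python
# -*- coding: utf-8 -*-

def misread_as_macroman(text):
    """Simulate the reported symptom: utf-8 bytes decoded as macRoman."""
    return text.encode("utf-8").decode("mac_roman")

def undo_macroman_misread(garbled):
    """Reverse it: back to the macRoman bytes, then interpret them as utf-8."""
    return garbled.encode("mac_roman").decode("utf-8")

original = u"caf\u00e9"                 # assumed sample text
garbled = misread_as_macroman(original)

print(garbled != original)                          # True: text was mangled
print(undo_macroman_misread(garbled) == original)   # True: original recovered
```

Needing such a workaround usually means the bytes were never decoded as utf-8
in the first place, which is consistent with the literals being bytestrings
rather than unicode text.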
This is not related to printing or sys.stdout. It does happen with that too,
but focusing on that is a red herring. Let's focus on just passing a string
into C++ then back out.
This would all actually make sense IF my script was marked as being "macRoman"
even though I entered UTF8 characters, but that is not the case.
Let's prove my statements. Here is the script, *interpreted* as MacRoman:
http://karaoke.kjams.com/screenshots/bugs/python_unicode/script_as_macroman.png
Why are you posting pictures of code, instead of the (runnable) code
itself, as you did with C code?
--
Terry Jan Reedy