> I see you are using Python 2

correct

> Firstly, in Python 2, the compiler assumes that the source code is encoded in
> ASCII

gar, i must have been looking at doc for v3, as i thought it was all assumed to be utf8

> # -*- coding: utf-8 -*-

okay, did that, still no change

> you need to use u" ... " delimiters for Unicode, otherwise the results you
> get are completely arbitrary and depend on the encoding of your terminal.

okay, well, i'm on a mac, and not using "terminal" at all. but if i were, it would be utf8. but it's still not flying :(

> For example, if I set my terminal encoding to IBM-850

okay how do you even do that? this is not an interactive session, this is embedded python, within a C++ app, so there's no terminal.

but that is a good question: all the docs say "default encoding" everywhere (as in "If string is a Unicode object, this function computes the default encoding of string and operates on that"), but fail to specify just HOW i can set the default encoding. if i could just say "hey, default encoding is utf8", i think i'd be done?

> So change the line of code to:
> print u"frøânçïé"

okay, sure... but i get the exact same results

> Those two changes ought to fix the problem, but if they don't, try setting
> your terminal encoding to UTF-8 as well

well, i'm not sure what you mean by that. i don't have a terminal here. i'm logging to a utf8 log file (when i print)

> but what it *actually* prints is this:
>
> print "frøânçïé"
> --> fr√∏√¢n√ß√Ø√©

>It's hard to say what *exactly* is happening here, because you don't explain
>how the python print statement somehow gets into your C++ Log code. Do I guess
>right that it catches stdout?

yes, i'm redirecting stdout to my own custom print class, and then from that function i call into my embedded C++ print function

>If so, then what I expect is happening is that Python has read in the source
>code of
>print "~~~~~"
>with ~~~~~ as a bunch of junk bytes, and then your terminal is displaying
>those junk bytes according to whatever encoding it happens to be using.
>Since you are seeing this:
>fr√∏√¢n√ß√Ø√©
>my guess is that you're using a Mac, and the encoding is set to the MacRoman
>encoding. Am I close?

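The MacRoman guess quoted above can be checked from Python itself. This is only a minimal sketch: it takes the UTF-8 bytes of the test string and decodes them with Python's mac_roman codec. The variable names are made up for illustration, and the string is written with \u escapes so the snippet does not depend on the encoding of its own source file.

```python
# Sketch only: reproduce the garbling by reading UTF-8 bytes as MacRoman.
# All names here are illustrative, not taken from the original app.

# The test string from the thread, spelled with escapes (8 characters).
original = u"fr\u00f8\u00e2n\u00e7\u00ef\u00e9"

# What a UTF-8 log sink should receive: 13 bytes, two per accented letter.
utf8_bytes = original.encode("utf-8")

# What a reader that assumes MacRoman makes of those same bytes.
garbled = utf8_bytes.decode("mac_roman")

# Undoing the wrong decode recovers the text, so the bytes were valid UTF-8.
recovered = garbled.encode("mac_roman").decode("utf-8")
```

If garbled matches what shows up in the log, that would suggest the bytes leaving Python are already UTF-8, and that the MacRoman step happens wherever those bytes are later turned back into text.
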
you hit the nail on the head there, i think. using that as a hint, i took this text "fr√∏√¢n√ß√Ø√©" and pasted that into a "macRoman" document, then *reinterpreted* it as UTF8, and voilà: "frøânçïé"

so, it seems that i AM getting my utf8 bytes, but somewhere they're getting interpreted as macRoman. huh? where is macRoman specified, and how do i change that to utf8? i think that's the missing golden ticket

-- 
http://mail.python.org/mailman/listinfo/python-list