On Fri, 23 Aug 2013 13:49:23 -0700, David M. Cotter wrote:

> note everything works great if i use Ascii, but:
>
> in my utf8-encoded script i have this:
>
>> print "frøânçïé"

I see you are using Python 2, in which case there are probably two or
three errors being made here.

Firstly, in Python 2, the compiler assumes that the source code is
encoded in ASCII, or more precisely ASCII plus arbitrary bytes. Since
your source code is *actually* UTF-8, the bytes in the file are:

    70 72 69 6E 74 20 22 66 72 C3 B8 C3 A2 6E C3 A7 C3 AF C3 A9 22

But Python doesn't know the file is encoded in UTF-8; it thinks it is
reading ASCII plus junk, so when it reads the file it parses those
bytes into a line of code:

    print "~~~~~"

where the ~~~~~ stands for the 13 bytes between the quotes, ten of
which are non-ASCII junk as far as Python is concerned. So that's the
first problem to fix. You can fix it by adding an encoding cookie at
the beginning of your module, in the first or second line:

    # -*- coding: utf-8 -*-

The second problem is that even once you've fixed the source encoding,
you're still not dealing with a proper Unicode string. In Python 2, you
need to use u" ... " delimiters for Unicode, otherwise the results you
get are completely arbitrary and depend on the encoding of your
terminal. For example, if I set my terminal encoding to IBM-850, I get:

    fr°Ônþ´Ú

from those bytes. If I set it to South European ISO-8859-3 I get this:

    frĝânçïé

Clearly not what I intended. So change the line of code to:

    print u"frøânçïé"

Those two changes ought to fix the problem, but if they don't, try
setting your terminal encoding to UTF-8 as well and see if that helps.

[...]

> but what it *actually* prints is this:
>
>> print "frøânçïé"
> --> fr√∏√¢n√ß√Ø√©

It's hard to say what *exactly* is happening here, because you don't
explain how the output of the Python print statement gets into your C++
Log code. Am I right in guessing that it captures stdout?

If so, then what I expect is happening is that Python has read in the
source code of

    print "~~~~~"

with ~~~~~ as a bunch of junk bytes, and then your terminal is
displaying those junk bytes according to whatever encoding it happens
to be using.

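If you would rather test that theory than take it on trust, here is a
rough sketch (not your actual code path). It takes the UTF-8 bytes of
the string, decodes them under a few candidate encodings, and reports
which ones reproduce the text in your log. The CANDIDATES shortlist and
the matching_encodings helper are names made up for illustration, and
the sketch assumes the bytes reach the display unchanged.

```python
# -*- coding: utf-8 -*-
# Sketch only; written to run on both Python 2.7 and Python 3.
import binascii

text = u"frøânçïé"            # a real Unicode string, 8 characters
raw = text.encode("utf-8")    # the 13 bytes a UTF-8 source file stores

# Hex dump of those bytes, for comparison with the dump of the file.
hex_dump = binascii.hexlify(raw).decode("ascii").upper()

# What the log showed, copied from the report.
reported = u"fr√∏√¢n√ß√Ø√©"

# Hypothetical shortlist of encodings a terminal or logger might use.
CANDIDATES = ["utf-8", "latin-1", "cp850", "mac_roman"]


def matching_encodings(data, seen, candidates=CANDIDATES):
    """Return the candidate encodings under which `data` displays as `seen`."""
    matches = []
    for name in candidates:
        try:
            shown = data.decode(name)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in that encoding
        if shown == seen:
            matches.append(name)
    return matches


suspects = matching_encodings(raw, reported)
```

With the shortlist above, the only encoding that should survive in
suspects is mac_roman. Extend CANDIDATES if your setup might be using
something else.
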
Since you are seeing this:

    fr√∏√¢n√ß√Ø√©

my guess is that you're using a Mac, and the encoding is set to the
MacRoman encoding. Am I close?

To summarise:

* Add an encoding cookie, to tell Python to use UTF-8 when parsing your
  source file.

* Use a Unicode string u"frøânçïé".

* Consider setting your terminal to use UTF-8, otherwise it may not be
  able to print all the characters you would like.

* You may need to change the way data gets into your C++ Log function.
  If it expects bytes, you may need to use u"...".encode('utf-8')
  rather than just u"...". But since I don't understand how data is
  getting into your Log function, I can't be sure about this.

I think that is everything. Does that fix your problem?

--
Steven
--
http://mail.python.org/mailman/listinfo/python-list