On Sat, Aug 24, 2013 at 9:47 AM, David M. Cotter <m...@davecotter.com> wrote:
>
> > What _are_ you using?
> i have scripts in a file, that i am invoking into my embedded python within a
> C++ program. there is no terminal involved. the "print" statement has been
> redirected (via sys.stdout) to my custom print class, which does not specify
> "encoding", so i tried the suggestion above to set it:
>
> static const char *s_RedirectScript =
>     "import " kEmbeddedModuleName "\n"
>     "import sys\n"
>     "\n"
>     "class CustomPrintClass:\n"
>     "    def write(self, stuff):\n"
>     "        " kEmbeddedModuleName "." kCustomPrint "(stuff)\n"
>     "class CustomErrClass:\n"
>     "    def write(self, stuff):\n"
>     "        " kEmbeddedModuleName "." kCustomErr "(stuff)\n"
>     "sys.stdout = CustomPrintClass()\n"
>     "sys.stderr = CustomErrClass()\n"
>     "sys.stdout.encoding = 'UTF-8'\n"
>     "sys.stderr.encoding = 'UTF-8'\n";
>
> but it didn't help.
>
> I'm still getting back a string that is a utf-8 string of characters that, if
> converted to "macRoman" and then interpreted as UTF8, shows the original,
> correct string. who is specifying macRoman, and where, and how do i tell
> whoever that is that i really *really* want utf8?
> --
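The round trip described above can be reproduced in plain Python, outside the embedded setup. A minimal sketch (the string "café" is just an illustrative example): if UTF-8 bytes are mistakenly decoded as MacRoman you get mojibake, and re-encoding that mojibake as MacRoman then decoding as UTF-8 recovers the original text — which is exactly the symptom of a viewer assuming MacRoman.

```python
# The symptom: correct UTF-8 bytes, displayed by something assuming MacRoman.
original = "café"
utf8_bytes = original.encode("utf-8")      # b'caf\xc3\xa9'

# A MacRoman-assuming viewer renders those bytes as:
garbled = utf8_bytes.decode("mac_roman")   # 'caf√©'

# "Converted to macRoman and then interpreted as UTF8"
# recovers the original, correct string:
recovered = garbled.encode("mac_roman").decode("utf-8")
assert recovered == original
```

That the damage is perfectly reversible is the giveaway: the bytes themselves were always valid UTF-8, and only the final display step chose the wrong encoding.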
If you're running this from a C++ program, then you aren't getting back characters; you're getting back bytes. If you treat them as UTF-8, they'll work properly. The only thing wrong is the text editor you're using to open the file afterwards: since you aren't specifying an encoding, it's assuming MacRoman. You can try putting the UTF-8 BOM (it's not really a BOM) at the front of the file; the bytes 0xEF 0xBB 0xBF are used by some editors to identify a file as UTF-8.
--
http://mail.python.org/mailman/listinfo/python-list
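A minimal sketch of prepending that signature from Python (the file name "output.txt" is hypothetical; Python's codecs module exposes the three bytes as BOM_UTF8):

```python
import codecs

# The UTF-8 signature is exactly the three bytes 0xEF 0xBB 0xBF.
assert codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# Writing it at the front of the file lets BOM-aware editors
# identify the contents as UTF-8 instead of guessing MacRoman.
text = "café résumé"
with open("output.txt", "wb") as f:
    f.write(codecs.BOM_UTF8)
    f.write(text.encode("utf-8"))
```

Note it is "not really a BOM" because UTF-8 has no byte order to mark; here it serves purely as an encoding signature, and some tools (notably many Unix utilities) do not expect it.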