On Thu, Sep 22, 2016 at 10:27 PM, Peter Otten <__pete...@web.de> wrote:
> When the encoding used for the file and the encoding used by the terminal
> differ the output of non-ascii characters gets messed up. Example script:
>
> # -*- coding: iso-8859-15 -*-
>
> print "first unicode:"
> print u"Schöön"
>
> print "then bytes:"
> print "Schöön"
>
> When I dump that in my UTF-8 terminal all "ö"s are lost because it gets the
> invalid byte sequence b"\xf6" rather than the required b"\xc3\xb6":
>
> $ cat demo.py
> # -*- coding: iso-8859-15 -*-
>
> print "first unicode:"
> print u"Sch��n"
>
> print "then bytes:"
> print "Sch��n"
>
> But when I run the code:
>
> $ python demo.py
> first unicode:
> Schöön
> then bytes:
> Sch��n
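
To put the two byte sequences from the quoted message side by side, here's a rough sketch. The variable names are mine, and the "replace" error handler is only a stand-in for whatever a UTF-8 terminal does with bytes it can't decode; a real terminal may render them differently. The text is written with escapes so the snippet doesn't depend on its own file encoding.

```python
# The text from the quoted script, written with escapes (U+00F6 is o-umlaut).
text = u"Sch\xf6\xf6n"

# Bytes as stored in a file saved as ISO-8859-15.
latin9 = text.encode("iso-8859-15")

# Bytes a UTF-8 terminal needs in order to show the same text.
utf8 = text.encode("utf-8")

print(repr(latin9))
print(repr(utf8))

# Assumption: model the terminal as a UTF-8 decoder that substitutes
# U+FFFD for each byte it cannot decode.
as_seen = latin9.decode("utf-8", "replace")
print(repr(as_seen))
```

The Unicode string is the same in both cases; only the encoded bytes differ, and that is the part a byte string hard-codes.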

What this really means is that you (almost certainly) shouldn't be storing non-ASCII text in byte strings. Most stuff will "just work" if you're using a Unicode string (obviously cat doesn't acknowledge the coding cookie, but Python itself does, as do a number of editors), and of course, you can avoid all the u"..." prefixes by going to Py3.

Trying to use text in byte strings is extremely encoding-dependent, and thus dangerous. Sure, it'll generally work for ASCII... but only because you're highly likely to have your terminal set to an ASCII-compatible encoding. If you pick something else... you're in for a whole new world of fun. Acres of entertainment.

ChrisA