On Sun, 30 Aug 2009 02:36:49 +0000, Steven D'Aprano wrote:

>>> So long as your terminal has a sensible encoding, and you have a good
>>> quality font, you should be able to print any string you can create.
>>
>> UTF-8 isn't a particularly sensible encoding for terminals.
>
> Did I mention UTF-8?
>
> Out of curiosity, why do you say that UTF-8 isn't sensible for terminals?

I don't think I've ever seen a terminal (whether an emulator running on
a PC or a hardware terminal) which supports anything like the entire
Unicode repertoire, along with right-to-left writing, complex scripts,
etc. Even support for double-width characters is uncommon.

If your terminal can't handle anything outside of ISO-8859-1, there
isn't any advantage to using UTF-8, and some disadvantages; e.g. a
typical Unix tty driver will delete the last *byte* from the input
buffer when you press backspace (Linux 2.6.* has the IUTF8 flag, but
this is non-standard).

Historically, terminal I/O has tended to revolve around unibyte
encodings, with everything except the endpoints being
encoding-agnostic. Anything which falls outside of that is a dog's
breakfast; it's no coincidence that the word for "messed-up text"
(arising from an encoding mismatch) was borrowed from Japanese
(mojibake).

Life is simpler if you can use a unibyte encoding. Apart from anything
else, the failure modes tend to be harmless. E.g. you get the wrong
glyph rather than two glyphs where you expected one. On a 7-bit
channel, you get the wrong printable character rather than a control
character (this is why ISO-8859-* reserves \x80-\x9F as control codes
rather than using them as printable characters).

>> And "Unicode font" is an oxymoron. You can merge a whole bunch of fonts
>> together and stuff them into a TTF file; that doesn't make them "a
>> font", though.
>
> I never mentioned "Unicode font" either. In any case, there's no reason
> why a skillful designer can't make a single font which covers the entire
> Unicode range in a consistent style.

Consistency between unrelated scripts is neither realistic nor
desirable. E.g. Latin fonts tend to use uniform stroke widths unless
they're specifically designed to look like handwriting, whereas Han
fonts tend to prefer variable-width strokes which reflect the
direction.

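To make the backspace and failure-mode points above concrete, here is a
minimal Python 3 sketch. It only simulates the behaviour on byte strings;
it does not talk to a real tty. The helper names `naive_backspace` and
`is_valid_utf8` are hypothetical, and the "driver" is assumed to remove
exactly one byte per backspace.

```python
def naive_backspace(buf: bytes) -> bytes:
    """Drop the last byte, as a byte-oriented line editor would (assumption)."""
    return buf[:-1]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


text = "caf\u00e9"                     # "cafe" with an acute accent on the e

latin1 = text.encode("iso-8859-1")     # one byte per character
utf8 = text.encode("utf-8")            # the accented e takes two bytes

# Unibyte encoding: one backspace removes exactly one character.
latin1_after = naive_backspace(latin1)

# UTF-8: one backspace removes half a character and leaves a dangling
# lead byte, so the buffer is no longer valid UTF-8.
utf8_after = naive_backspace(utf8)

# Encoding mismatch between two unibyte encodings: still one glyph,
# just the wrong one (currency sign versus euro sign).
as_latin1 = b"\xa4".decode("iso-8859-1")
as_latin9 = b"\xa4".decode("iso-8859-15")

# UTF-8 bytes shown by a Latin-1 terminal: two glyphs where one was expected.
mojibake = "\u00e9".encode("utf-8").decode("iso-8859-1")

if __name__ == "__main__":
    print(latin1_after, utf8_after, is_valid_utf8(utf8_after))
    print(len(as_latin1), len(as_latin9), len(mojibake))
```

With IUTF8 set on Linux the tty driver is supposed to erase whole
characters instead, which is the case the sketch deliberately leaves out.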
>> The main advantage of using Unicode internally is that you can associate
>> encodings with the specific points where data needs to be converted
>> to/from bytes, rather than having to carry the encoding details around
>> the program.
>
> Surely the main advantage of Unicode is that it gives you a full and
> consistent range of characters not limited to the 128 characters provided
> by ASCII?

Nothing stops you from using other encodings, or from using multiple
encodings. But using multiple encodings means keeping track of the
encodings. This isn't impossible, and it may produce better results
(e.g. no information loss from Han unification), but it can be a lot
more work.
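
As a rough illustration of converting only at the boundaries, the Python 3
sketch below decodes each input with its own encoding on the way in, works
on `str` internally, and encodes once on the way out. The function names
`read_text` and `write_text` and the sample data are made up for the
example; the alternative described above would be to pass `(bytes,
encoding)` pairs through the whole program instead.

```python
def read_text(raw: bytes, encoding: str) -> str:
    """Input boundary: the only place that needs to know the source encoding."""
    return raw.decode(encoding)


def write_text(text: str, encoding: str) -> bytes:
    """Output boundary: unencodable characters are replaced with '?'."""
    return text.encode(encoding, errors="replace")


# Two sources delivering the same word in different encodings (sample data).
incoming = [
    (b"caf\xe9", "iso-8859-1"),
    (b"caf\xc3\xa9", "utf-8"),
]

# After decoding, nothing downstream has to track where the data came from.
decoded = [read_text(raw, enc) for raw, enc in incoming]
combined = " / ".join(decoded)

utf8_out = write_text(combined, "utf-8")
ascii_out = write_text(combined, "ascii")   # lossy: accented e becomes '?'

if __name__ == "__main__":
    print(decoded[0] == decoded[1])
    print(utf8_out)
    print(ascii_out)
```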