On Fri, Mar 11, 2016 at 1:22 AM, BartC <b...@freeuk.com> wrote:
>>
>> 1) Unicode support, intrinsic to the language, is crucial, even if
>> BartC refuses to recognize this. Anything released beyond the confines
>> of his personal workspace will need full Unicode support, otherwise it
>> is a problem to the rest of the world, and should be destroyed with
>> fire. Thanks.
>
>
> I don't agree. If I distribute some text in the form of a series of ASCII
> byte values (eg. classic TXT format, with either kind of line separator),
> then that same data can be directly interpreted as UTF-8.

What you call "classic TXT format" is still an encoding, which means
you're acknowledging the difference between characters and bytes -
that's the first step. But you have to be certain that you are
interpreting it as UTF-8, in which case ASCII ceases to be
significant, and what you've done is declare that your file consists
of a stream of UTF-8-encoded Unicode characters, divided into lines
with either U+000D U+000A or just U+000A. That's a nice clear encoding
definition.

And the difference between characters and bytes is only the first step
(albeit the biggest and most important step). You _need_ to make sure
that you're thinking about text as text, and that means being aware of
RTL vs LTR, combining characters, case conversions, collations, etc,
etc, etc, all in terms of Unicode rather than as eight-bit or
seven-bit characters.

(For example, a naïve MUD client might assume that one byte is one
character is 8 pixels of width. I know this, because some years ago I
wrote one exactly like that (well, the figure "8" came from measuring
the current font, but other than at font changes, it was fixed). An
intelligent Unicode-aware MUD client has to not only cope with
variable width, but also characters that don't have any width at all,
and those that use the same space as their base character, and those
that are placed to the left of the preceding character.)

You can't ignore this, although you might be able to leave full
support for later - but it's a bug until you do.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
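
To make the byte / character / width distinction above concrete, here is a minimal Python 3 sketch. It is not from the original post: the helper name display_width is hypothetical, and its rules (skip combining marks and format characters, count East Asian wide characters as two columns) are a rough assumption for illustration only. It does not handle RTL text, emoji sequences, or full grapheme clustering, all of which a real client would need.

```python
import unicodedata

def display_width(text):
    """Rough column count for text (hypothetical helper, heuristic only)."""
    width = 0
    for ch in text:
        # Combining marks and format characters take no column of their own.
        if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # East Asian Wide/Fullwidth characters usually occupy two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

# Pure ASCII bytes decode identically as ASCII and as UTF-8.
ascii_bytes = b"hello\r\nworld\n"
assert ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# 'i' followed by U+0308 COMBINING DIAERESIS: bytes, characters and
# columns all give different counts.
word = "nai\u0308ve"
print(len(word.encode("utf-8")))  # 7 bytes
print(len(word))                  # 6 code points
print(display_width(word))        # 5 columns under this heuristic
```

The three numbers differ for the same string, which is why "one byte is one character is one cell" breaks as soon as the input is anything other than plain ASCII.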